Upsala J Med Sci 92: 193-203, 1987

Prevalence of Sleep Apnea Syndrome - Estimation by Two Stage Sampling

Thorarinn Gislason¹ and Adam Taube²

¹Department of Lung Medicine, Akademiska Sjukhuset and ²Department of Statistics, Uppsala University, Uppsala, Sweden

This article describes, step by step, the methodological and statistical considerations made in planning an epidemiological survey of the prevalence of the sleep apnea syndrome (SAS) in the municipality of Uppsala in Sweden. It was decided to investigate men 30 to 69 years old. The investigation had to be confined to 60 subjects, since all-night polysomnographic studies are required for an unequivocal diagnosis of SAS.

Initially, taking a simple random sample (SRS) was considered. Statistical calculations showed that for prevalences between 1-3% this would lead to totally unacceptable results.

A postal questionnaire, sent to the total population of 35 779 men in this age group, was then considered. Depending on their replies, the men would be divided into low-risk and high-risk strata of SAS, and optimal numbers would then be called from each group for polysomnographic studies. This also proved impossible: the lowest possible standard error was still too large, and the samples would contain unacceptably few cases of SAS.

We therefore decided to concentrate on the high-risk stratum, obtaining an estimated lower limit of the prevalence. For economic reasons, we could not send a questionnaire to all 35 779 individuals. Instead, we based the investigation on an SRS of 4 000 men, post-stratified into a high-risk and a low-risk group. From the high-risk group, 60 men were then selected for polysomnographic studies.

INTRODUCTION

More often than not, medical research reports deal with results obtained from a group of individuals who, by some means, have been selected from a larger population. The reasons for choosing a particular sampling strategy or selection procedure are rarely stated. By tradition, it seems, planning discussions are not considered desirable in written reports. The statistical considerations in the planning of this survey led to quite a different procedure than we had anticipated. We therefore think that a step-by-step review of these considerations might interest researchers in similar situations.

The occurrence of sleep apnea with concomitant oxygen desaturation has been reported frequently in the last two decades (5, 9, 14). Loud and disturbing snoring, often associated with excessive daytime sleepiness, is the main clinical feature. Numerous reports have also described sleep apnea in combination with various diseases (9, 14). The sleep apnea syndrome (SAS) has been defined as the occurrence of at least thirty apneic episodes during seven hours of sleep (9). A whole-night polysomnographic study is necessary to confirm the diagnosis (9, 10).

Table 1 - Population, sample size and criteria in previous studies on the prevalence of SAS

REFERENCE              POPULATION                               SAMPLE SIZE (n) AND CRITERIA
Bixler et al (1)       Volunteers, students, technical staff    n = 100; without complaints of
                       and friends                              sleep disorders
Block et al (2)        Medical and nursing staff + patients     n = 49; no breathing complaints
Kreis et al (11)       Patients from a general medical          n = 26; not medically unstable,
                       service                                  demented or expected to be
                                                                discharged within 3 days
Franceschi et al (7)   All patients (N = 2518) admitted to      n = 87; selected on the basis of
                       S. Raffaele hospital during 1 year       questionnaires and clinical data
Lavie (12)             Industrial workers (N = 1502)            n = 78; males, selected on the
                                                                basis of questionnaires
Carskadon et al (3)    Elderly volunteers from non-medical      n = 40; without serious medical
                       sources                                  disorders; not complaining
                                                                spontaneously of sleep problems

As is shown in Table 1, previous investigations into the prevalence of SAS have been based on

- presumably healthy and selected populations (1, 2),
- in-patient populations (7, 11),
- a healthy, working male population (12), or
- apparently healthy, elderly individuals (3).

Both the designs and the results of these studies have shown great variation (Table 1). We notice that none of the investigated groups constitutes a cross section of a general population. Instead, they consist of special categories of individuals who happened to be available for investigation at the time. Only two studies (7, 12) have defined selection procedures for the individuals, i.e. drew them from specified background populations. Lavie's study (12) is the only one in which statistical considerations have been presented.

The aim of this paper is to describe the methodological background of an epidemiological study of the prevalence of SAS in a defined population (8). One of our goals was to estimate the prevalence of SAS with reasonable precision. Another was to investigate a comparatively large group of patients with the disease. As will be shown, when resources are limited, these two goals are difficult to reach in one and the same study. We thus had to lower our ambitions. Rather than estimating the prevalence of the disease itself, we worked out a procedure for estimating a lower limit of the prevalence.

1. LIMITATIONS OF THE POPULATION AND NUMBER OF INVESTIGATED PERSONS

In almost all studies on SAS, this syndrome has been found to be much more common among men than among women. It also occurs more frequently in older age groups (3, 6, 14). We chose to study only men aged 30-69 years. The survey was to be carried out in the municipality of Uppsala, Sweden (total population 150 579). At the time the study began, i.e. 1984, the municipality had 35 779 men aged 30-69 years. We want to estimate the prevalence of SAS in this group, i.e.

$$P = \frac{\text{No. of SAS cases}}{\text{No. of individuals in the population (at the time of the study)}}$$

For economic and technical reasons, the clinical part of the investigation had to be limited from the beginning to 60 subjects, since whole-night polysomnographic studies are required for diagnosis. The methodological problem lay in selecting the 60 persons to be investigated so that the results would be as statistically useful as possible.

2. IS A SIMPLE RANDOM SAMPLE (SRS) SATISFACTORY?

Suppose we draw an SRS of $n$ individuals from a population. The parameter $P$ is then estimated by

$$p_{SRS} = \frac{\text{No. of SAS cases in sample}}{\text{No. of individuals in sample}}$$

If the population is large,* the estimate $p_{SRS}$ of $P$ has the standard error (SE)

$$\mathrm{SE}(p_{SRS}) = \sqrt{\frac{P(1-P)}{n}}$$

It is thus possible to calculate the magnitude of the standard error for different values of $n$ and $P$. If we assume that the prevalence of SAS is between one and three percent, the relation between the number of men studied, $n$, and $\mathrm{SE}(p_{SRS})$ has the shape displayed in Figure 1.

Suppose, for example, that the true value of $P$ is 0.03 and we require $\mathrm{SE}(p_{SRS}) \le 0.01$. Then $n \ge 0.03 \cdot 0.97 / 0.01^2 = 291$, i.e. roughly 300. If instead $P = 0.01$ and we require $\mathrm{SE}(p_{SRS}) \le 0.0033$, we must have $n \ge 0.01 \cdot 0.99 / 0.0033^2 \approx 910$. If SRS is used, we will therefore need between 300 and 900 observations.

If only 60 men are selected by SRS, the sample may not contain a single case of SAS. The probability of this is $(1-P)^{60}$, which for $P = 0.02$ is about 0.30. Suppose none of the 60 polysomnographic investigations revealed a case of SAS. Solving $(1-P)^{60} = 0.05$ would then give $P = 0.049$ as an upper 95% confidence limit for the prevalence $P$. This would not permit any conclusion to be drawn, and the appearance of one or two cases of SAS would only increase the confusion. Thus this simple procedure was found to be unfeasible for prevalences of SAS lying between 1% and 3%.

[Figure 1 - The standard error of $p_{SRS}$ as a function of the number of observations, $n$ (0 to 1 000).]

* Since the population investigated is large and the number of observations very small, the so-called finite population correction is neglected in all formulae (4).

In order to obtain a reasonable degree of precision within the limited economic and practical frame, further information about the population had to be used in the design.

3. STRATIFICATION BY MEANS OF SCREENING QUESTIONNAIRES

We considered the possibility of mailing a questionnaire concerning sleep complaints to all 35 779 men in the population. Before a sample was selected, the population could then be divided into, say, two strata. One would be a small stratum in which SAS was highly suspected; the other a larger one with presumably very few cases of SAS. The system of notation is presented in Table 2.

Table 2 - Notations concerning population and sample

                   POPULATION                     SAMPLE
STRATUM            Number of     Prevalence       Number of     Proportion
                   individuals   of SAS           individuals   of SAS
Low-risk           N_L           P_L              n_L           p_L
High-risk          N_H           P_H              n_H           p_H
TOTAL              N             P                n             -

The prevalence of SAS, the parameter $P$, can be written as a weighted average

$$P = \frac{N_L}{N} P_L + \frac{N_H}{N} P_H$$

On the basis of the two samples, with $n_L$ and $n_H$ observations respectively, this parameter is estimated by means of the estimator

$$p_{strat} = \frac{N_L}{N} p_L + \frac{N_H}{N} p_H$$

with the standard error

$$\mathrm{SE}(p_{strat}) = \sqrt{\left(\frac{N_L}{N}\right)^2 \frac{P_L(1-P_L)}{n_L} + \left(\frac{N_H}{N}\right)^2 \frac{P_H(1-P_H)}{n_H}}$$

It is well known (4) that $\mathrm{SE}(p_{strat})$ is at a minimum when

$$\frac{n_L}{n_H} = \frac{N_L\sqrt{P_L(1-P_L)}}{N_H\sqrt{P_H(1-P_H)}}$$

It can also be shown that $\mathrm{SE}(p_{strat}) < \mathrm{SE}(p_{SRS})$ for a range of allocations $(n_L, n_H)$ around this optimum. Even if $P_H$ and $P_L$ are unknown, $n_H$ and $n_L$ can quite likely be chosen so that stratified sampling gives a better estimate than SRS.

Let us study in more detail how this approach works in a hypothetical example (which numerically is very similar to our real situation).
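Before turning to the example, the formulas of Sections 2 and 3 can be collected in a small Python sketch. It assumes the large-population approximations used in the text (no finite population correction), and the function names are illustrative only. `optimal_allocation` implements the minimum-SE condition quoted from Cochran (4), commonly called Neyman allocation. It returns real-valued sample sizes, which must still be rounded in practice.

```python
import math

# Large-population formulas of Sections 2 and 3 (finite population correction ignored).


def se_srs(P: float, n: int) -> float:
    """Standard error of p_SRS for a simple random sample of size n."""
    return math.sqrt(P * (1 - P) / n)


def n_for_se(P: float, se_max: float) -> int:
    """Smallest n for which se_srs(P, n) <= se_max."""
    return math.ceil(P * (1 - P) / se_max ** 2)


def prob_no_cases(P: float, n: int) -> float:
    """Probability that an SRS of n individuals contains no SAS case."""
    return (1 - P) ** n


def upper_limit_if_no_cases(n: int, alpha: float = 0.05) -> float:
    """Upper 1 - alpha confidence limit for P after 0 cases in n: solves (1 - P)**n = alpha."""
    return 1 - alpha ** (1 / n)


def se_strat(N_L, N_H, P_L, P_H, n_L, n_H) -> float:
    """Standard error of p_strat = (N_L/N) p_L + (N_H/N) p_H."""
    N = N_L + N_H
    var = ((N_L / N) ** 2 * P_L * (1 - P_L) / n_L
           + (N_H / N) ** 2 * P_H * (1 - P_H) / n_H)
    return math.sqrt(var)


def optimal_allocation(N_L, N_H, P_L, P_H, n):
    """Real-valued (n_L, n_H) minimising se_strat subject to n_L + n_H = n."""
    w_L = N_L * math.sqrt(P_L * (1 - P_L))
    w_H = N_H * math.sqrt(P_H * (1 - P_H))
    n_L = n * w_L / (w_L + w_H)
    return n_L, n - n_L


if __name__ == "__main__":
    print(n_for_se(0.03, 0.01), n_for_se(0.01, 0.0033))
    print(round(prob_no_cases(0.02, 60), 3), round(upper_limit_if_no_cases(60), 4))
```

With these functions, `n_for_se(0.01, 0.0033)` returns 910 and `prob_no_cases(0.02, 60)` is about 0.30, in line with the figures in Section 2.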
We assume that $N = 32\,000$, $N_L = 28\,000$, $N_H = 4\,000$, $P_L = 0.005$ and $P_H = 0.125$, which means that $P = 0.02$. This example might perhaps be considered somewhat optimistic concerning the screening efficiency.

[Figure 2 - The standard error of $p_{strat}$ for different sizes of $n_L$ and $n_H$, with $P = 0.02$ and $n_L + n_H = 60$.]

Figure 2 shows $\mathrm{SE}(p_{strat})$ as a function of the allocation of observations to the two strata. $\mathrm{SE}(p_{strat})$ is approximately equal to $\mathrm{SE}(p_{SRS})$ at $(n_L = 13;\ n_H = 47)$ and at $(n_L = 53;\ n_H = 7)$. For all allocations between these two points, with $n_H = 60 - n_L$, stratification gives a smaller standard error than $\mathrm{SE}(p_{SRS})$. The best possible choice, i.e. the one for which $\mathrm{SE}(p_{strat})$ is lowest ($n_L = 36$, $n_H = 24$), gives, however, only a modest decrease (27%) in the standard error (Figure 2). Furthermore, even this allocation, with the hypothetical prevalences stated, will yield an unacceptably small number of SAS cases for further study ($36 \cdot 0.005 + 24 \cdot 0.125 \approx 3.2$, i.e. 3 to 4). Thus, within the economic frame of 60 observations, we had to abandon our ambition of estimating the parameter $P$ by sampling from both strata.

Suppose it had been economically possible to make, say, 250 polysomnographic studies in this population. The above approach would then have given $\mathrm{SE}(p_{strat}) = 0.0066$ for $n_L = n_H = 125$. Among these, only some 16 SAS cases could be expected. An SRS of $n = 250$ observations would have given $\mathrm{SE}(p_{SRS}) = 0.0089$. To halve $\mathrm{SE}(p_{strat})$, we would have had to perform 1 000 polysomnographic studies. This would give $\mathrm{SE}(p_{strat}) = 0.0033$, as against $\mathrm{SE}(p_{SRS}) = 0.0044$.

4. A LIMITED STUDY OF THE HIGH-RISK GROUP ONLY

Since we have

$$P = \frac{N_L}{N} P_L + \frac{N_H}{N} P_H,$$

it must be true that

$$P \geq \frac{N_H}{N} P_H,$$

with an equality sign for the case when the stratification is 100% effective, in the sense that all SAS cases are to be found in the high-risk stratum.
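The figures of the hypothetical example in Section 3 can be reproduced with a short Python script. It is a sketch under the same large-population formulas. It scans every integer allocation with $n_L + n_H = 60$ and at least one observation per stratum. For the 1 000-study comparison it assumes equal allocation ($n_L = n_H = 500$), as the text states for the 250-study case; the source does not give the 1 000-study split. The script finds stratification better than SRS for $n_L$ from 14 to 53 and the optimum at $n_L = 36$, $n_H = 24$. The computed SE reduction there is about 26%, close to the 27% read from Figure 2.

```python
import math

# Hypothetical example of Section 3 (not study data); large-population formulas.
N_L, N_H = 28_000, 4_000
P_L, P_H = 0.005, 0.125
N = N_L + N_H
P = (N_L * P_L + N_H * P_H) / N          # weighted average prevalence


def se_srs(n: int) -> float:
    """SE of the simple-random-sample estimate with n observations."""
    return math.sqrt(P * (1 - P) / n)


def se_strat(n_L: int, n_H: int) -> float:
    """SE of p_strat with n_L and n_H observations in the two strata."""
    return math.sqrt((N_L / N) ** 2 * P_L * (1 - P_L) / n_L
                     + (N_H / N) ** 2 * P_H * (1 - P_H) / n_H)


n = 60
allocations = range(1, n)                # n_L = 1..59, n_H = n - n_L
better = [n_L for n_L in allocations if se_strat(n_L, n - n_L) < se_srs(n)]
best_L = min(allocations, key=lambda n_L: se_strat(n_L, n - n_L))
best_H = n - best_L
expected_cases = best_L * P_L + best_H * P_H
reduction = 1 - se_strat(best_L, best_H) / se_srs(n)

if __name__ == "__main__":
    print(f"P = {P:.3f}, SE_SRS(60) = {se_srs(n):.4f}")
    # the qualifying allocations form one contiguous run, so first/last suffice
    print(f"stratification beats SRS for n_L = {better[0]}..{better[-1]}")
    print(f"optimum n_L = {best_L}, n_H = {best_H}: SE = {se_strat(best_L, best_H):.4f}, "
          f"reduction = {reduction:.1%}, expected SAS cases = {expected_cases:.2f}")
    # equal allocation assumed for the larger studies
    print(f"n = 250:  SE_strat = {se_strat(125, 125):.4f}, SE_SRS = {se_srs(250):.4f}")
    print(f"n = 1000: SE_strat = {se_strat(500, 500):.4f}, SE_SRS = {se_srs(1000):.4f}")
    print(f"lower bound N_H * P_H / N = {N_H * P_H / N:.4f}")
```

The last line evaluates the bound $N_H P_H / N$ of Section 4, which is about 0.0156 here, below the true $P = 0.02$.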
We will therefore concentrate our resources on estimating the expression $\frac{N_H}{N} P_H$ alone, which means that we draw all our observations from the high-risk stratum. Thus the estimator

$$p_u = \frac{N_H}{N} p_H$$

estimates a lower limit for the parameter $P$. The standard error of this estimator is

$$\mathrm{SE}(p_u) = \frac{N_H}{N}\sqrt{\frac{P_H(1-P_H)}{n_H}}$$

Suppose the screening procedure that separates the subjects into a high-risk and a low-risk stratum is a test with sensitivity $s$ (13), i.e. $s = N_H P_H / (N P)$ is the share of all SAS cases that fall in the high-risk stratum. It is then clear that

$$E(p_u) = \frac{N_H}{N} P_H = s \cdot P$$

Thus if, for example, the sensitivity is 100% (i.e. all SAS cases are in the high-risk stratum), our estimator $p_u$ will estimate the true prevalence $P$.

In the hypothetical example above, the selected prevalence was $P = 0.02$. The expected value of $p_u$ will be somewhat lower, namely

$$\frac{N_H}{N} P_H = \frac{4\,000}{32\,000} \cdot 0.125 = 0.0156$$

The estimator $p_u$ will have the standard error $\mathrm{SE}(p_u) = 0.0056$. The total number of SAS cases in the population was assumed to be $32\,000 \cdot 0.02 = 640$. Of these, $4\,000 \cdot 0.125 = 500$ are in the high-risk stratum, so the sensitivity of the screening procedure is $500/640 = 0.78$. We could therefore also write

$$s \cdot P = \frac{500}{640} \cdot 0.02 = 0.0156$$

By ignoring the low-risk stratum, we will thus be estimating a lower limit, which in this example is about 78% of the true SAS prevalence.

5. A DOUBLE-SAMPLING PROCEDURE

Since each questionnaire cost about 1.50 US dollars, it was considered too expensive to send one to each of the 35 779 men in the population. It was therefore decided to first take a large SRS of $n' = 4\,000$ men, who would receive a postal questionnaire. This sample would subsequently be stratified (so-called post-stratification) (4) on the basis of the answers given. The result would be a high-risk group and a low-risk group with $n'_H$ and $n'_L$ persons, respectively. The proportion $N_H/N$ can thus be estimated by $n'_H/n'$. From the $n'_H$ individuals in the high-risk group, we then take a sub-sample of $n = 60$ persons to obtain an estimate $p_H$ of the SAS prevalence in the high-risk stratum.
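The two-phase design just described, together with the lower-limit estimator of Section 4, can be sketched in Python. This is a minimal illustration under stated assumptions:

- Both phases behave as independent binomial proportions, with no finite population correction.
- The function names are illustrative.
- The counts in the last usage line (500 of 4 000 men classed high-risk, 8 SAS cases among 60) are hypothetical and are not data from the study.

The estimate multiplies the phase-one high-risk fraction $n'_H/n'$ by the phase-two prevalence $p_H$. `product_variance` checks how much the randomness of $n'_H/n'$ adds to the variance. For the hypothetical example ($N_H/N = P_H = 0.125$, $n' = 4\,000$, $n = 60$), the SE with a known weight is about 0.0053. It rises by less than 1% when the weight is estimated, which is consistent with analysing the data as if the stratification had been made a priori. Note that 0.0053 is slightly below the 0.0056 quoted in Section 4.

```python
import math


def lower_limit_estimate(n_prime: int, n_prime_H: int, n: int, cases_H: int):
    """Two-phase estimate of the lower limit of P and its approximate SE.

    Phase 1: questionnaire SRS of n_prime men, n_prime_H of them classed high-risk.
    Phase 2: n men sub-sampled from the high-risk group, cases_H of them with SAS.
    The SE treats n_prime_H / n_prime as fixed, i.e. as if stratified a priori.
    """
    w_H = n_prime_H / n_prime            # estimates N_H / N
    p_H = cases_H / n                    # estimates P_H
    p_u = w_H * p_H
    se = w_H * math.sqrt(p_H * (1 - p_H) / n)
    return p_u, se


def product_variance(W_H: float, P_H: float, n_prime: int, n: int) -> float:
    """Variance of w_H * p_H when both factors are independent binomial proportions.

    Uses Var(XY) = E[X]^2 Var(Y) + E[Y]^2 Var(X) + Var(X) Var(Y) for independent X, Y.
    """
    var_w = W_H * (1 - W_H) / n_prime
    var_p = P_H * (1 - P_H) / n
    return W_H ** 2 * var_p + P_H ** 2 * var_w + var_w * var_p


def sensitivity(N_H: int, P_H: float, N: int, P: float) -> float:
    """Share of all SAS cases lying in the high-risk stratum, s = N_H P_H / (N P)."""
    return N_H * P_H / (N * P)


if __name__ == "__main__":
    # Hypothetical example of Sections 3-4, not study data.
    s = sensitivity(4_000, 0.125, 32_000, 0.02)
    print(f"s = {s:.2f}, E(p_u) = s * P = {s * 0.02:.4f}")
    se_fixed = 0.125 * math.sqrt(0.125 * 0.875 / 60)
    se_random = math.sqrt(product_variance(0.125, 0.125, 4_000, 60))
    print(f"SE with known weight {se_fixed:.5f}, with estimated weight {se_random:.5f}")
    # Illustrative counts only: 500 of 4 000 classed high-risk, 8 SAS cases among 60.
    print(lower_limit_estimate(4_000, 500, 60, 8))
```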
We thus get the estimator

$$p''_u = \frac{n'_H}{n'} p_H$$

Since $n'$ would be as large as 4 000, the random variation in the ratio $n'_H/n'$ will be of negligible order of magnitude. The standard error of the estimated lower limit will therefore be

$$\mathrm{SE}(p''_u) \approx \frac{n'_H}{n'}\sqrt{\frac{p_H(1-p_H)}{n}}$$

This means that the data will be analysed as if the stratification had been made a priori.

CONCLUSION

The methodological examples given above clearly show the importance of analysing in advance the method to be used in epidemiological studies. Samples that are too small cannot elucidate the question of the prevalence of SAS because of their statistical and methodological limitations. In our case, financial limitations made it necessary to look for the minimal frequency of SAS. We did so by investigating only those persons whose answers to sleep questionnaires made them most suspected of having this condition.

This work was supported by grants from the National Swedish Association against Heart and Chest Diseases, Stockholm.

REFERENCES

1. Bixler, E.O., Kales, A., Soldatos, C.R. & Bueno, A.V.: Sleep apneic activity in a normal population. Res Com in Chem Path and Pharm 36(1):141-152, 1982.
2. Block, A.J., Boysen, P.G., Wynne, J.W., et al.: Sleep apnea, hypopnea and oxygen desaturation in normal subjects - A strong male dominance. N Engl J Med 300:513-517, 1979.
3. Carskadon, M.A. & Dement, W.C.: Respiration during sleep in the aged human. J Gerontol 36:420-423, 1981.
4. Cochran, W.G.: Sampling techniques. 3rd ed. Chapters 2 and 5. Wiley, New York, 1977.
5. Coleman, R.M., Roffwarg, H.P., Kennedy, S.J., et al.: Sleep-wake disorders based on a polysomnographic diagnosis - A national cooperative study. JAMA 247:997-1003, 1982.
6. Coleman, R.M., Miles, L.E., Guilleminault, C., et al.: Sleep-wake disorders in the elderly. Am Ger Soc 19(7):289-296, 1981.
7. Franceschi, M., Zamproni, P., Crippa, D. & Smirne, S.: Excessive daytime sleepiness: A 1-year study in an unselected inpatient population. Sleep 5(3):239-247, 1982.
8. Gislason, T., et al.: Prevalence of sleep apnea syndrome among Swedish men - An epidemiological study. (In manuscript).
9. Guilleminault, C., Cummiskey, J. & Dement, W.C.: Sleep apnea syndromes: Recent advances. Adv Intern Med 26:347-374, 1980.
10. Haponik, E.F., Smith, P.L. & Meyers, D.A.: Evaluation of sleep-disordered breathing - Is polysomnography necessary? Am J Med 77:671-677, 1984.
11. Kreis, P., Kripke, D.F. & Ancoli-Israel, S.: Sleep apnea: A prospective study. The Western Journal of Medicine 139:171-173, 1983.
12. Lavie, P.: Incidence of sleep apnea in a presumably healthy working population. Sleep 6(4):312-318, 1983.
13. Lilienfeld, M. & Lilienfeld, E.: Foundation of epidemiology. 2nd ed. Chapter 6, p 151. Oxford University Press, 1980.
14. Weitzman, E.D.: Sleep and its disorders. Ann Rev Neurosci 4:381-417, 1981.

Address for reprints: Thorarinn Gislason, Department of Lung Medicine, Akademiska Sjukhuset, S-751 85 Uppsala, Sweden