User Note - Sample Design
NHIS Sample Design
The National Health Interview Survey (NHIS) is a household, face-to-face health survey of approximately 87,500 people in 35,000 households each year. The NHIS sample is designed to be representative of the civilian, non-institutionalized population living in the United States. By definition, the NHIS excludes residents of long-term care facilities, persons in correctional facilities, and U.S. nationals living abroad. Households composed entirely of active-duty Armed Forces personnel are also excluded from the NHIS. The NHIS has been fielded annually since 1957, making it the longest-running national health survey in the United States. Data collection is carried out continuously throughout the year, producing nationally representative samples each quarter and attenuating any potential seasonal biases.
The NHIS is a complex, multistage probability sample that incorporates stratification and clustering. The current sample design selects clusters of households and non-institutional group quarters (such as college dormitories) that are nested within primary sampling units (PSUs). A PSU can consist of a county, a small group of adjacent counties, or a metropolitan statistical area. At intervals of approximately 10 years, the NHIS undergoes a sample redesign. Each new design period represents either an update to the previous design or a complete redesign. While each design period incorporates new information based on the most recent decennial census data, new geographic designations (MSA definitions), or new sampling procedures to improve efficiency, the general design of the NHIS has remained the same over time.
The current design period, implemented in 2016, resembles the 2006-2015 design period with some notable exceptions. First, the current design period eliminates most oversampling by race and ethnicity. Previously, selected race and ethnic groups were oversampled at the household level (blacks in 1985-2015, Hispanics in 1995-2015, and Asians in 2006-2015). The oversampling of black, Hispanic, and Asian sample adults aged 65 and older has been retained under the current sample design. Second, the procedure used to select Primary Sampling Units (PSUs), described below, has changed. Third, the source of addresses from which the sample is drawn has changed, after remaining the same for the past three sample design periods. In these previous design periods, addresses were sampled from lists of addresses generated by field listing operations. Under the current sample design, the primary source of addresses is a periodically updated commercial address list, supplemented with address lists created by a limited field listing operation carried out only in select areas. Additionally, there is a separate sampling mechanism for college dormitories. The sharp increase in 2016 in the number of sampled housing units characterized as "student quarters in college dormitory" (refer to LIVINGQTR) is likely associated with the introduction of the separate sample listing for college dormitories. Per correspondence with NCHS staff, however, this sampling mechanism had a low response rate and, as a consequence, was discontinued at the end of 2017.
The first stage of sampling involves dividing the U.S. into approximately 1,700 geographically defined PSUs. Under the current sampling design, large clusters of addresses across all of the PSUs within a state are sampled. The survey is then fielded in the PSUs associated with the sampled addresses. As in earlier design periods, large metropolitan areas are selected with certainty into the sample and are called self-representing PSUs. If a state contains both self-representing and non-self-representing PSUs, address clusters from each type of PSU are sampled independently. All households are then assigned a quarter (of the year) for interview and are subsequently distributed across the 3 months of each quarter. All eligible members of these sampled housing units are invited to participate in the basic survey interview. Some household members additionally participate in supplemental interviews, either through random selection or based on answers to questions in the basic/core interview.
The sampling procedure described above differs from the one used in the two preceding design periods: in the 1997-2005 and 2006-2015 design periods, PSUs were grouped into strata using social and demographic characteristics of the area. Depending on the year, one or more PSUs were sampled per stratum, with the probability of selection for each PSU being proportional to its population size (PPS) within strata. In the second stage of sampling, a selection of geographic area segments was sampled from within each PSU. These segments were then subdivided into clusters, each of which contained a small number (approximately 4-9) of housing units.
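The probability-proportional-to-size (PPS) selection of PSUs within strata can be illustrated with a minimal Python sketch. The PSU names, strata, and population counts below are hypothetical, and the sketch draws exactly one PSU per stratum; it is not the NCHS selection procedure, which sampled one or more PSUs per stratum depending on the year.

```python
import random

# Hypothetical PSUs as (name, stratum, population); none of these are NCHS values.
PSUS = [
    ("psu_a", "stratum_1", 50_000),
    ("psu_b", "stratum_1", 150_000),
    ("psu_c", "stratum_2", 80_000),
    ("psu_d", "stratum_2", 20_000),
    ("psu_e", "stratum_2", 100_000),
]


def selection_probabilities(psus):
    """Return each PSU's chance of selection in a single PPS draw within its stratum."""
    stratum_totals = {}
    for _, stratum, population in psus:
        stratum_totals[stratum] = stratum_totals.get(stratum, 0) + population
    return {
        name: population / stratum_totals[stratum]
        for name, stratum, population in psus
    }


def draw_one_psu_per_stratum(psus, seed=0):
    """Draw one PSU per stratum, with probability proportional to population size."""
    rng = random.Random(seed)
    by_stratum = {}
    for name, stratum, population in psus:
        by_stratum.setdefault(stratum, []).append((name, population))

    selected = {}
    for stratum, members in by_stratum.items():
        names = [name for name, _ in members]
        sizes = [population for _, population in members]
        # random.choices weights each candidate by its population size.
        selected[stratum] = rng.choices(names, weights=sizes, k=1)[0]
    return selected


if __name__ == "__main__":
    print(selection_probabilities(PSUS))
    print(draw_one_psu_per_stratum(PSUS, seed=1))
```

In this hypothetical setup, psu_b holds three quarters of the population of stratum_1, so it has a 0.75 chance of being the PSU drawn for that stratum.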
Along with sample redesign, the NHIS underwent a major survey redesign in 1997. One feature of this survey redesign merits emphasis: the random selection of a single adult and a single child to answer a battery of additional questions. While sampling one adult was a common practice in earlier supplements, earlier waves of the core questionnaire collected full information on all household members. After 1997, interviewers continued to collect information on the household, socio-demographic characteristics, and basic indicators of health status, disability, and utilization of health care services for all persons. However, to reduce interview length and biases from proxy reporting, a new sampling scheme for the Basic Module questionnaire was adopted, in which more extensive information was collected on one randomly selected sample adult and one sample child from each family. These questions appear in the sample adult and sample child files in the original NHIS public use files. In IPUMS NHIS, for variables based on questions asked of sample adults and/or sample children, the universe statement in the variable description refers to "sample adults" and/or "sample children" rather than "persons."
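The selection of one sample adult and one sample child per family can be sketched as follows. This is a simplified illustration, not the NCHS procedure: the family roster is hypothetical, the age cutoff of 18 is an assumption of the sketch, and every eligible person is given an equal chance of selection.

```python
import random

ADULT_AGE = 18  # assumed cutoff between sample children and sample adults


def select_sample_persons(family, rng):
    """Pick one adult and one child at random from a family roster.

    `family` is a list of dicts with at least an "age" key (hypothetical layout).
    Returns None for a role when the family has no eligible member.
    """
    adults = [person for person in family if person["age"] >= ADULT_AGE]
    children = [person for person in family if person["age"] < ADULT_AGE]
    return {
        "sample_adult": rng.choice(adults) if adults else None,
        "sample_child": rng.choice(children) if children else None,
    }


if __name__ == "__main__":
    roster = [
        {"person": 1, "age": 42},
        {"person": 2, "age": 40},
        {"person": 3, "age": 12},
        {"person": 4, "age": 7},
    ]
    print(select_sample_persons(roster, random.Random(2024)))
```

Questions asked only of the selected persons are the ones whose universe statements refer to "sample adults" or "sample children"; all four roster members would still contribute person-level information.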
Data collected during the 1985-2015 period contain oversamples of selected racial and ethnic groups. Beginning with the 1985 sample redesign, the NHIS included an oversample of the black population to increase the reliability of estimates for this group. In 1995, the NHIS also implemented an oversample of Hispanics. From 2006 to 2015, the NHIS also included an oversample of the Asian population. Due to this oversampling, many more blacks, Hispanics, and Asians were interviewed than would have been if the sample had been exactly proportional to the U.S. population. Thus, each person from these oversampled groups represented a smaller number of individuals than did other persons in the sample. The use of SAMPLING WEIGHTS, discussed in another user note, corrects for this oversampling to yield representative population estimates. The household-level mechanism to oversample black, Hispanic, and Asian populations was discontinued with the new NHIS sample design implemented in 2016.
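A small worked example may help show why weights are needed when a group is oversampled. The two-group population and all counts below are hypothetical, and the weight used here is simply the number of people each sampled person represents (population size divided by sample size); it is a sketch of the idea, not a description of how the NHIS weights are constructed.

```python
# Hypothetical two-group population; none of these figures are NHIS values.
# Each tuple: (group name, population size, sample size, sampled persons with the condition)
GROUPS = [
    ("oversampled", 1_000, 100, 30),  # sampled at a rate of 1 in 10
    ("other", 9_000, 450, 45),        # sampled at a rate of 1 in 20
]


def base_weights(groups):
    """Number of people in the population represented by each sampled person."""
    return {name: population / sample_n for name, population, sample_n, _ in groups}


def unweighted_prevalence(groups):
    """Share of sampled persons with the condition, ignoring the design."""
    cases = sum(group[3] for group in groups)
    sample_n = sum(group[2] for group in groups)
    return cases / sample_n


def weighted_prevalence(groups):
    """Share with the condition after weighting each person by population / sample size."""
    weighted_cases = 0.0
    weighted_total = 0.0
    for _, population, sample_n, cases in groups:
        weight = population / sample_n
        weighted_cases += weight * cases
        weighted_total += weight * sample_n
    return weighted_cases / weighted_total


if __name__ == "__main__":
    print(base_weights(GROUPS))
    print(unweighted_prevalence(GROUPS))
    print(weighted_prevalence(GROUPS))
```

In this example each person in the oversampled group represents 10 people while each other person represents 20. The unweighted prevalence is about 0.136, which overstates the hypothetical population value because the higher-prevalence group is overrepresented in the sample; the weighted prevalence is 0.12, matching the population value of (300 + 900) / 10,000.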
For additional information on each of the NHIS redesigns, users can access original NCHS documentation through links provided below.
National Center for Health Statistics. (1975). Health Interview Survey Procedure 1957-1974. Vital Health Stat, 1(11).
National Center for Health Statistics. (1985). The National Health Interview Survey Design, 1973-84, and Procedures, 1975-83. Vital Health Stat, 1(18).
National Center for Health Statistics. (1989). Design and estimation for the National Health Interview Survey, 1985-94. Vital Health Stat, 2(110).
National Center for Health Statistics. (1999). National Health Interview Survey: Research for the 1995-2004 redesign. Vital Health Stat, 2(126).
National Center for Health Statistics. (2000). Design and estimation for the National Health Interview Survey, 1995-2004. Vital Health Stat, 2(130).
National Center for Health Statistics. (2014). Design and Estimation for the National Health Interview Survey, 2006-2015. Vital Health Stat, 2(165).
National Center for Health Statistics. (2017). Survey Description, National Health Interview Survey, 2016. Hyattsville, MD.