User Note - Sample Design

NHIS Sample Design

The National Health Interview Survey (NHIS) is a household, face-to-face health survey of approximately 36,000 people in 35,000 households each year. The NHIS sample is designed to be representative of the civilian, non-institutionalized population living in the 50 states and the District of Columbia at the time of the survey. By definition, the NHIS excludes residents of long-term care facilities, persons in correctional facilities, and U.S. nationals living abroad. Households composed entirely of active-duty Armed Forces personnel are also excluded. The NHIS has been fielded annually since 1957, making it the longest-running national health survey in the United States. Data collection is carried out continuously throughout the year, producing nationally representative samples each month and attenuating any potential seasonal biases.

The NHIS is a complex, multistage probability sample that incorporates stratification and clustering. The current sample design selects clusters of households and non-institutional group quarters (such as college dormitories) that are nested within primary sampling units (PSUs). A PSU can consist of a county, a small group of adjacent counties, or a metropolitan statistical area. At intervals of approximately 10 years, the NHIS undergoes a sample redesign. Each new design period represents either an update to the previous design or a complete redesign. While each design period incorporates new information based on the most recent decennial census data, new geographic designations (MSA definitions), or new sampling procedures to improve efficiency, the general design of the NHIS has remained the same over time.

The current design period, implemented in 2016, resembles the 2006-2015 design period with some notable exceptions. First, the current design period eliminates most oversampling by race and ethnicity. Previously, selected race and ethnic groups were oversampled at the household level (blacks in 1985-2015, Hispanics in 1995-2015, and Asians in 2006-2015). The oversampling of black, Hispanic, and Asian adults aged 65 and older as the sample adult briefly continued after the household-level oversampling stopped, but was discontinued after 2018. Second, the procedure used to select PSUs, described below, has changed. Third, the source of addresses from which the sample is drawn has changed, after remaining the same for the previous three sample design periods. In those design periods, addresses were sampled from lists generated by field listing operations. Under the current sample design, the primary source of addresses is a periodically updated commercial address list (producing the "unit frame"), supplemented with address lists created by a limited field listing operation carried out only in select areas (producing a non-overlapping "area frame"). Approximately 11% of counties in the sample were part of the area frame generated by the field listing operation. Additionally, a separate sampling mechanism for college dormitories was briefly introduced. The sharp increase in 2016 in the number of sampled housing units characterized as "student quarters in college dormitory" (refer to LIVINGQTR) is likely associated with the introduction of this separate sample listing for college dormitories. Per correspondence with NCHS staff, however, this sampling mechanism had a low response rate and, as a consequence, was discontinued at the end of 2017; college students residing in dormitories were instead eligible to be sampled at their primary, non-college residence via the household roster. Last, unlike previous sample designs, the current design samples address clusters located within PSUs, and the selected clusters determine which geographic areas are included in the sample. Previously, geographic areas were sampled directly.

The NHIS sampling involves four major steps. First, the U.S. is divided into 1,689 geographically defined PSUs. PSUs are defined as mostly geographically contiguous counties, county equivalents, or groups of counties that do not cross state boundaries. The second step involves, for some states, dividing the PSUs into two strata defined by population density, generally urban and rural counties. All other states contain only one stratum. The third step involves defining clusters of approximately 2,500 addresses within each stratum, where each address cluster is located entirely within one of the originally defined 1,689 PSUs. Fourth, a specific number of address clusters in each stratum is systematically selected for the NHIS sample.
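The fourth step, systematic selection of address clusters within each stratum, can be illustrated with a minimal sketch. It shows a generic textbook systematic sample (random start, fixed interval) and is not the actual NCHS selection procedure, which this note does not describe in detail. The stratum names, cluster IDs, and per-stratum sample sizes are hypothetical values chosen for illustration.

```python
import random


def systematic_sample(units, n, rng):
    """Pick n units from an ordered list using a random start and a fixed interval."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    interval = len(units) / n          # sampling interval (may be fractional)
    start = rng.random() * interval    # random start within the first interval
    last = len(units) - 1
    # min() guards against a floating-point edge case at the end of the list.
    return [units[min(int(start + i * interval), last)] for i in range(n)]


# Hypothetical strata, each holding an ordered list of address-cluster IDs.
strata = {
    "stratum_1": [f"S1-C{i:03d}" for i in range(40)],
    "stratum_2": [f"S2-C{i:03d}" for i in range(25)],
}
# Hypothetical number of clusters to draw from each stratum.
clusters_to_select = {"stratum_1": 8, "stratum_2": 5}

rng = random.Random(2016)  # fixed seed so the sketch is reproducible
selected = {
    name: systematic_sample(clusters, clusters_to_select[name], rng)
    for name, clusters in strata.items()
}
```

Because the clusters are taken at a fixed interval through the ordered list, the selected clusters are spread across the whole stratum rather than concentrated in one part of it.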

The sampling procedure described above differs from that of the most recent previous design periods. In the 1997-2005 and 2006-2015 design periods, PSUs were grouped into strata using social and demographic characteristics of the area. Depending on the year, one or more PSUs were sampled per stratum, with each PSU's probability of selection proportional to its population size (PPS) within the stratum. In the second stage of sampling, a selection of geographic area segments was sampled from within each PSU. These segments were then subdivided into clusters, each of which contained a small number (approximately 4-9) of housing units.
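To make the idea of selection with probability proportional to size concrete, the sketch below draws a single PSU from one stratum, with each PSU's chance of selection equal to its share of the stratum population. It is a simplified illustration rather than the NCHS procedure: the earlier designs sampled one or more PSUs per stratum, whereas this sketch draws only one, and the PSU names and population counts are hypothetical.

```python
import random


def select_psu_pps(psu_populations, rng):
    """Draw one PSU with probability proportional to its population size."""
    names = list(psu_populations)
    sizes = [psu_populations[name] for name in names]
    total = sum(sizes)
    if total <= 0:
        raise ValueError("stratum must have a positive total population")
    chosen = rng.choices(names, weights=sizes, k=1)[0]
    # The selection probability is the PSU's share of the stratum population.
    return chosen, psu_populations[chosen] / total


# Hypothetical stratum: PSU names mapped to population counts.
stratum = {"PSU_A": 500_000, "PSU_B": 300_000, "PSU_C": 200_000}

rng = random.Random(2006)  # fixed seed so the sketch is reproducible
psu, selection_prob = select_psu_pps(stratum, rng)
```

In this hypothetical stratum, PSU_A would be selected with probability 0.5, PSU_B with 0.3, and PSU_C with 0.2. The returned selection probability is the kind of quantity that feeds into a sampling weight, since weights are built from the inverse of selection probabilities.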

Along with the sample redesigns, the NHIS underwent major survey redesigns in 1997 and 2019. The 1997 redesign introduced the random selection of a single adult and a single child to answer a battery of additional questions. While sampling one adult was a common practice in earlier supplements, earlier waves of the core questionnaire collected full information on all household members. After the 1997 redesign, interviewers continued to collect information on the household, socio-demographic characteristics, and basic indicators of health status, disability, and utilization of health care services for all persons. However, to reduce interview length and biases from proxy reporting, a new sampling scheme for the Basic Module questionnaire was adopted, in which more extensive information was collected on one randomly selected sample adult and one sample child from each family. These questions appear in the sample adult and sample child files in the original NHIS public use files. In IPUMS NHIS, for variables based on questions asked of sample adults and/or sample children, the universe statement in the variable description refers to "sample adults" and/or "sample children" rather than "persons."

The 2019 redesign eliminated data collection on all household members, limiting NHIS data collection to one randomly selected sample adult and one sample child from each household. Information about a limited set of family, spousal or partner characteristics (for sample adults), and parental characteristics (for sample children) was made available on the sample adult and sample child files, although individual-level information about other family members is no longer available after 2018.

The NCHS continued to field the NHIS throughout 2020 with several notable changes to data collection to accommodate COVID-related nonresponse. The NCHS fielded four separate NHIS study designs during 2020: 1) normal, in-person data collection in quarter 1; 2) telephone-only data collection in quarter 2; 3) telephone-first data collection in quarters 3 and 4; and 4) the introduction in August 2020 of a longitudinal sample while maintaining the partial original 2020 sample. In light of changing response rates in 2020, the NCHS introduced a subsample of adults, the "longitudinal" sample, who had previously participated in the NHIS with known representativeness and nearly complete telephone contact information, to inform the weighting and estimation techniques used to produce official 2020 estimates. There are no sample children in the longitudinal sample. The NCHS reduced the number of original 2020 sample adults and sample children to accommodate re-interviewing the longitudinal sample members. As a result, the number of sample children in 2020 is much smaller than in 2019. The "partial" sample refers to the individuals in the original 2020 sample, including sample adults and sample children. For more information, please see the user note on COVID-related NHIS study design changes and COVID-related content.

Oversamples

Data collected during the 1985-2015 period contain oversamples of selected racial and ethnic groups. Beginning with the 1985 sample redesign, the NHIS included an oversample of the black population to increase the reliability of estimates for this group. In 1995, the NHIS also implemented an oversample of Hispanics. From 2006 to 2015, the NHIS also included an oversample of the Asian population. Due to this oversampling, many more blacks, Hispanics, and Asians were interviewed than would have been if the sample were exactly proportional to the U.S. population. Thus, each person from these oversampled groups represents a smaller number of individuals than do other persons in the sample. The use of SAMPLING WEIGHTS, discussed in another user note, corrects for this oversampling to yield representative population estimates. The household-level mechanism to oversample black, Hispanic, and Asian populations was discontinued with the new NHIS sample design implemented in 2016.
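The effect of weighting on an oversampled group can be seen in a small worked example. The sketch below uses an invented population of 1,000 people (900 in group A, 100 in group B) from which 10 people per group are sampled, so that group B is oversampled. Each person's weight is the number of population members they represent. All group labels, counts, and outcome values are hypothetical. In real analyses the weights come from the NHIS data files, not from a calculation like this.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values, using one weight per value."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical sample records: (group, has_condition, weight).
# Group A: 10 sampled out of 900, so each person represents 90 people.
# Group B: 10 sampled out of 100 (oversampled), so each represents 10 people.
sample = (
    [("A", 1, 90.0)] * 2 + [("A", 0, 90.0)] * 8
    + [("B", 1, 10.0)] * 5 + [("B", 0, 10.0)] * 5
)

values = [has_condition for _, has_condition, _ in sample]
weights = [weight for _, _, weight in sample]

# Ignoring weights overstates prevalence because group B is overrepresented.
unweighted_prevalence = sum(values) / len(values)
# Applying weights restores each group's share of the population.
weighted_prevalence = weighted_mean(values, weights)
```

Here the unweighted prevalence is 0.35, while the weighted prevalence is 0.23, which matches the value implied by the population shares (0.9 × 0.2 + 0.1 × 0.5).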


Additional Information

For additional information on each of the NHIS redesigns, users can access original NCHS documentation through links provided below.

1969-1974

National Center for Health Statistics. (1975). Health Interview Survey Procedure 1957-1974. Vital Health Stat, 1(11).
http://www.cdc.gov/nchs/data/series/sr_01/sr01_011acc.pdf

1975-1984

National Center for Health Statistics. (1985). The National Health Interview Survey Design, 1973–84, and Procedures, 1975-83. Vital Health Stat, 1(18).
http://www.cdc.gov/nchs/data/series/sr_01/sr01_018acc.pdf

1985-1994

National Center for Health Statistics. (1989). Design and estimation for the National Health Interview Survey, 1985-94. Vital Health Stat, 2(110).
http://www.cdc.gov/nchs/data/series/sr_02/sr02_110.pdf

1995-2005

National Center for Health Statistics. (1999). National Health Interview Survey: Research for the 1995–2004 redesign. Vital Health Stat, 2(126).
http://www.cdc.gov/nchs/data/series/sr_02/sr02_126.pdf

National Center for Health Statistics. (2000). Design and estimation for the National Health Interview Survey, 1995–2004. Vital Health Stat, 2(130).
http://www.cdc.gov/nchs/data/series/sr_02/sr02_130.pdf

2006-2015

National Center for Health Statistics. (2014). Design and Estimation for the National Health Interview Survey, 2006-2015. Vital Health Stat, 2(165).
http://www.cdc.gov/nchs/data/series/sr_02/sr02_165.pdf

2016-present

National Center for Health Statistics. (2017). Survey Description, National Health Interview Survey, 2016. Hyattsville, MD.
https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2016/srvydesc.pdf

National Center for Health Statistics. (2020). Survey Description, National Health Interview Survey, 2019. Hyattsville, MD.
https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2019/srvydesc.pdf
