User Note - Sampling Weights

Use of Sampling Weights with IPUMS NHIS

The National Health Interview Survey (NHIS) is a complex, multistage probability sample that incorporates stratification, clustering, and oversampling of some subpopulations (e.g., Black, Hispanic, and Asian) in some years. Because of the complex sampling design of the NHIS, users of IPUMS NHIS data must make use of sampling weights to produce representative estimates. While appropriate use of sampling weights will produce correct point estimates, statistical techniques that account for the complex sample design are also necessary to produce correct standard errors and statistical tests, so analysts are advised to review the user note on VARIANCE ESTIMATION as well.

2019-forward;
Before 2019;
Weight variables;
Adjusting Sampling Weights When Pooling Multiple Years of Data;
Combining Sampling Weights When a Variable is Located in Different Files across Years;
Additional Information.

2019-forward

The approach for creating NHIS sampling weights changed significantly in 2019; large declines in response rates, from approximately 90% to 60% or less, required the introduction of more sophisticated adjustment techniques to better correct for nonresponse. In addition, the 2024 NHIS has an oversample (approximately a 25 percent increase in sampled households) of non-metropolitan areas.

The sampling scheme still provides the basis for deriving "base weights" for households. These base weights are not shared publicly and reflect households' probability of selection, such that their sum approximates the total number of households in the United States.

These household base weights are then adjusted for nonresponse for Sample Adult and Sample Child respondents. In 2019 and 2020, nonresponse adjustment parameters were derived from multilevel models predicting response using selected predictors from survey paradata as well as the Census Planning Database and Area Health Resources Files. Starting in 2021, nonresponse adjustments were derived from classification tree algorithms such as the Recursive Partitioning for Modeling Survey Data (rpms) algorithm from the R package RPMS (used in 2021 and 2022) and the Conditional Inference Tree (ctree) algorithm from the R package PARTYKIT (used starting in 2023). When applying these algorithms, base weights are adjusted using the inverse of the response rate within the terminal nodes of the resulting classification trees.
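As a rough illustration of the last step only, the Stata sketch below applies an inverse response rate adjustment within terminal nodes. It is not NCHS's procedure: the data are toy values, the variable names (node, basewt, responded, nradjwt) are hypothetical, and weighting the response rate by the base weight is our assumption.

```stata
* Toy data: five sampled households in two terminal nodes (all values hypothetical)
clear
input node basewt responded
1 100 1
1 100 0
2 200 1
2 200 1
2 200 0
end

* Response rate within each node (weighting the rate by basewt is an assumption)
bysort node: egen double wt_all  = total(basewt)
bysort node: egen double wt_resp = total(basewt * responded)
gen double resprate = wt_resp / wt_all

* Respondents take on the weight of nonrespondents in their node;
* nonrespondents are left with a missing adjusted weight
gen double nradjwt = basewt / resprate if responded == 1

list node basewt responded resprate nradjwt, sepby(node)
```

In this sketch the adjusted weights of respondents in each node sum to the base weights of all sampled households in that node, which is the purpose of the adjustment.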

Following nonresponse adjustments, weights are then calibrated using iterative proportional raking along a set of selected dimensions. This set of dimensions has varied slightly over time but revolves around key demographic and geographic parameters including age by sex and age by race and ethnicity (obtained from the U.S. Census Bureau's population projections), as well as educational attainment and Census division by Metropolitan Statistical Area (MSA) status (mostly obtained from the American Community Survey (ACS) one-year estimates). Raking calibrates the weights to one dimension at a time, cycling through the dimensions until the weighted marginals approximately match the population control totals.
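To make the raking loop concrete, here is a minimal Stata sketch with two toy dimensions. The data, the control totals (45/55 for sex, 50/50 for age group), the variable names, and the fixed count of 25 iterations are all hypothetical; the actual NHIS calibration uses more dimensions and real population controls.

```stata
* Toy respondent file: two binary raking dimensions (all values hypothetical)
clear
input sex agegrp wt
0 0 10
0 1 20
1 0 30
1 1 40
end

* Hypothetical control totals: sex 0/1 = 45/55, agegrp 0/1 = 50/50
gen double rakewt = wt
forvalues iter = 1/25 {
    * Calibrate to the sex margin
    bysort sex: egen double margin = total(rakewt)
    replace rakewt = rakewt * cond(sex == 0, 45, 55) / margin
    drop margin
    * Calibrate to the age-group margin
    bysort agegrp: egen double margin = total(rakewt)
    replace rakewt = rakewt * cond(agegrp == 0, 50, 50) / margin
    drop margin
}

* Both sets of weighted marginals should now be close to the control totals
tabstat rakewt, by(sex) statistics(sum)
tabstat rakewt, by(agegrp) statistics(sum)
```

Each pass matches one margin exactly and slightly disturbs the other, which is why the steps are repeated until both are approximately satisfied.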

Overall, NHIS documentation specifies that "[t]hese changes to the nonresponse adjustment approach and the calibration methods have the potential to impact comparisons of the weighted survey estimates over time."

Before 2019

Before 2019, sampling weights were constructed so that each unit (survey respondent, family, or household) could be inflated or expanded to represent other individuals, families, or households in the United States. There were four general components to the NHIS sampling weights. First, the sampling weight represents the inverse probability of unit selection into the sample. The probability of selection is the product of the selection probabilities at each stage of sampling. (See the user note on SAMPLE DESIGN for further details.) Second, the probability of selection is then adjusted for household nonresponse. These first two steps determine the household weight. For person-level weights, the third component is referred to as a "first stage ratio adjustment." This is used to correct potential bias due to sample under-coverage, by applying a ratio adjustment to each weight based on a race/MSA-residence classification. The fourth component is a post-stratification adjustment for age, race, and sex using quarterly Census Bureau population control totals.

Years	First stage ratio adjustment
1969-1974	6 color-residence classes
1975-1984	12 color-residence classes within region
1985-1994	16 race-residence-region classes
1995-2005	24 race-ethnicity-residence-region classes
2006-2015	32 race-ethnicity-residence-region classes
2016-2018	No first-stage ratio adjustment

Years	Second stage (post-stratification) adjustment
1969-1994	60 age/sex/race categories
1995-2005	88 age/race/ethnicity/sex categories
2006-2018	100 age/race/ethnicity/sex categories

Weight Variables

Sample Person Weight

SAMPWEIGHT is an IPUMS-constructed variable that harmonizes the Final Annual Sample Adult and Sample Child Weights in the original NHIS public use files for 1997 forward. SAMPWEIGHT also contains the sampling weight for a subset of the pre-1997 survey supplements that followed a sampling scheme in which sample persons (one randomly selected person per household, often restricted to either persons 18+ or persons < 18), rather than all persons, were selected for certain survey supplements. SAMPWEIGHT represents the inverse probability of selection into a sample adult/child supplement, adjusted for nonresponse with additional post-stratification (1997-2018) or raking (2019-forward) adjustments using the Census Bureau's population control totals.

Person Weight

PERWEIGHT is an IPUMS-constructed variable that harmonizes the Final Annual Weight in the original NHIS public use files for 2018 and earlier samples. This weight should be used for analyses at the person level, for variables in which information was collected on all persons. PERWEIGHT represents the inverse probability of selection into the sample, adjusted for nonresponse with post-stratification adjustments for age, race/ethnicity, and sex using the Census Bureau's population control totals. For each year, the sum of these weights is equal to that year’s civilian, non-institutionalized U.S. population.

Family Weight

FWEIGHT is an IPUMS-constructed variable that harmonizes the Final Annual Family Weight in the original NHIS public use files for 2018 and earlier samples. Because no Census control totals for the number of civilian, non-institutionalized families exist, this weight is equal to the final person weight of the family member with the smallest post-stratification adjustment. For analyses using the family as the unit of analysis (e.g., how many families could not afford to eat balanced meals in the past 30 days?), researchers should use the family weight, FWEIGHT.

Household Weight

HHWEIGHT is an IPUMS-constructed variable that harmonizes the Final Annual Household Weight in the original NHIS public use files for 2018 and earlier samples. For analyses using the household as the unit of analysis (e.g., how many households contained a person who needed help with activities of daily living?), researchers should use the household weight, HHWEIGHT. Beginning in 1997, vacant housing units and households that could not be interviewed due to resident absence or refusal to participate have a value of zero for HHWEIGHT.

Longitudinal Sample Weight

LONGWEIGHT applies to the persons included in the 2020 longitudinal sample. The longitudinal sample includes those sample adults who previously responded to the 2019 NHIS and were re-contacted to complete the 2020 NHIS. According to the 2020 Survey Description, LONGWEIGHT should be used to evaluate individual-level changes among the same adults before and during the COVID-19 pandemic. Please see the user note on COVID-related changes to the NHIS for more information on the longitudinal sample.

Partial Sample Weight

PARTWEIGHT applies to the persons included in the 2020 partial sample, the group of sample adults and sample children included in the original 2020 sample. The 2020 Survey Description advises that PARTWEIGHT should be used with the 2020 data when pooling 2019 and 2020 data to increase sample size; SAMPWEIGHT should be used for the 2019 data. Otherwise, SAMPWEIGHT should be used to produce official estimates for 2020 and to compare estimates between 2019 and 2020. The partial sample does not include sample adults from the longitudinal sample.

Mortality Weights

MORTWT is an IPUMS-recode variable that represents the NCHS-created sample weights that "account for ineligible status due to insufficient identifying information for linkage" in the original public use NHIS Linked Mortality files. MORTWT should be used when analyzing mortality variables in conjunction with variables originally included in the NHIS person files. To analyze mortality variables in conjunction with variables originally included in the NHIS sample adult files, researchers should instead use MORTWTSA. Linked public use mortality variables are available only for NHIS respondents who were at least 18 years old at the time of the survey.

Supplemental Weights

The SUPPXWT series (i.e., SUPP1WT, SUPP2WT, SUPP3WT) are IPUMS-constructed variables that harmonize the Final Annual Weight in selected supplements of the original NHIS public use files. For analyses using variables that are located in different supplements across the years, researchers should review the variable description for the appropriate sampling weights for each year. In some cases, researchers will need to create a new sampling weight by combining different weights from different years.

Condition Weights

The CONDWTX series (i.e., CONDWT1, CONDWT2, CONDWT3, CONDWT4, CONDWT5, and CONDWT6) and PARALWT and DIABWT are IPUMS-constructed variables that harmonize chronic condition prevalence factors for person-level variables constructed from the 1978 to 1996 condition records. To analyze variables constructed from the many-to-one condition records, researchers should review the variable description to determine the appropriate condition weight to use with analyses.

Adjusting Sampling Weights When Pooling Multiple Years of Data

The sampling weights in the IPUMS NHIS represent annual inflation factors. In other words, for each individual, the person weight reflects the number of people that individual survey respondent represents in the total U.S. non-institutionalized population for a given year. Thus, if the analyst chooses to use multiple years of data, the sampling weight needs to be adjusted. For example, imagine that an analyst wants to use data from 1990-1999, pooling 10 years of data. The sampling weights need to be adjusted so that the total sample will represent the U.S. population (on average) for the 10-year period. The simplest adjustment method is to divide the weight by the number of years of data pooled (i.e., divide PERWEIGHT by 10 in this example). Other, more sophisticated methods of adjustment are available, if the analyst is so inclined. However, it is not clear that these methods perform substantially better.
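The simple adjustment can be sketched in Stata as follows. This assumes that the loaded IPUMS NHIS extract contains only the samples being pooled (e.g., 1990-1999) and that PERWEIGHT is the appropriate weight for the analysis; the name perweight_pooled is our own.

```stata
* Count the distinct survey years in the extract
quietly levelsof year, local(years)
local nyears : word count `years'

* Divide the annual weight by the number of pooled years
gen double perweight_pooled = perweight / `nyears'
```

Counting the years from the data, rather than typing 10, keeps the divisor correct if samples are later added to or removed from the extract.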

TAKE NOTE: Special Considerations when Pooling Data

Change to Sampling Weight Methodology Implemented in 2019. The process of generating sampling weights changed sharply from the approach employed in 2018 and earlier years. Because of this marked change, which was accompanied by a major redesign of the NHIS questionnaire and data collection approach, it is not possible to know whether any changes detected between 2019 and earlier years are due to changes in the sampling weights, the questionnaire or data collection redesign, or reflect actual change in the phenomena under study. Results of a test conducted in 2018-19 by NCHS (the Bridge Test) indicate that differences in prevalence estimates between pre-2019 and 2019-forward years of data are likely influenced by the 2019 redesign. Based on the results of the Bridge Test, IPUMS NHIS recommends that users not compare trends in the pre-2019 data with trends in the 2019-forward data. NCHS has signaled that they plan to release additional evaluation results as more 2019-forward data become available. We will update our guidance based on the findings of any such evaluations.
Extra adjustments needed when pooling 2019 and 2020 samples. To improve adjustment of the 2020 sampling weights for nonresponse, NCHS re-contacted selected 2019 NHIS sample adults to complete the 2020 NHIS interview between August and December of 2020. This longitudinal sample, also known as the 2020 followback sample, comprises 10,415 sample adults and can be analyzed as a one-time longitudinal panel with observations, spaced one year apart, that take place before and during the COVID-19 pandemic. Because both the 2019 and the 2020 samples contain these 10,415 sample adults, however, special measures must be taken when pooling the 2019 and 2020 samples to increase sample size. NCHS advises adjusting both the sample and the sampling weight in this case. First, drop any 2020 sample adult records with a value of zero on the partial sample weight (PARTWEIGHT). This will retain only the 2019 observations of longitudinal sample members in the pooled sample. Second, use PARTWEIGHT rather than SAMPWEIGHT for sample adults in the 2020 sample. Note that sample children were not included in the 2019-2020 longitudinal sample, so the adjustment described above does not need to be made for pooled analyses of sample children in the 2019-2020 samples. We provide sample code (in Stata) to make this adjustment for an IPUMS NHIS extract containing both the 2019 and 2020 samples:
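```
* Drop 2020 sample adult records outside the partial sample (PARTWEIGHT of zero)
drop if age > 17 & partweight == 0 & year == 2020
* Build the pooled weight: SAMPWEIGHT for 2019 and for 2020 sample children
gen double pooled_weight = .
replace pooled_weight = sampweight if year == 2019 | (year == 2020 & age < 18)
* PARTWEIGHT for 2020 sample adults
replace pooled_weight = partweight if year == 2020 & age > 17 & age != .
```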
Make any other adjustments to pooled_weight for analyses of pooled data as described in previous sections. For more information about COVID-19 impacts on NHIS data collection, please see our user note.
Sampling weight adjustments needed when analyzing COVID-19 data. Because much of the COVID-related content available in the 2020-2021 samples is not available for all calendar quarters of data collection in those years, analysts must adjust the annual sampling weights to account for partial year coverage to correctly produce estimates based on the 2020-2021 COVID data. For example, the NHIS collected data about COVID-19 beginning in calendar quarter 3 of 2020, adult COVID-19 vaccination information beginning in calendar quarter 2 of 2021, and COVID-19 vaccination for children aged 12-17 beginning in calendar quarter 3 of 2021. Depending on analytical goals, NCHS outlines several different approaches for correctly producing estimates based on the 2020-2021 COVID-19 information (based on guidance found in the 2020 and 2021 NHIS survey descriptions). Below, we outline the approach for correctly producing population estimates based on partial year data. For other use cases, such as pooling together all available calendar quarters in which COVID-19 information was collected or producing semi-annual trend estimates, please refer to the section on "Analyzing 2021 NHIS" in the 2021 NHIS survey description (starting on p. 41).

To produce correct population estimates based on measures collected for only part of the year, analysts will need to adjust the annual sampling weights included in the 2020 and 2021 data. All of the COVID-19 content available for the 2020 NHIS was collected only in calendar quarters 3 and 4, and some of the COVID-19 content available for the 2021 NHIS, such as COVID-19 vaccination for adults and COVID-19 vaccination for children ages 12-17, was collected only in some calendar quarters of 2021. To adjust the sampling weights, analysts should first create an interim sampling weight that sets the value of the annual sampling weight (SAMPWEIGHT) to zero for all calendar quarters in which the COVID-19 data were not collected by NHIS. They should then multiply the interim sampling weight by the number of calendar quarters in the year (4) divided by the number of calendar quarters in which the data were collected, which inflates the weight to cover a full calendar year. For example, the COVID-19 vaccination measure for sample adults was collected for calendar quarters 2, 3, and 4 in 2021, so the multiplier would be 4/3; similarly, the COVID-19 vaccination measure for children ages 12-17 was collected for calendar quarters 3 and 4, so the multiplier would be 4/2, doubling the interim weight. To illustrate for adults in 2021 (in Stata):

```
gen covid_interimwt = sampweight
replace covid_interimwt = 0 if intervwqtr == 1
gen covidwt = covid_interimwt*4/3
```
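
The corresponding adjustment for the vaccination measure for children ages 12-17 in 2021 might look like the following sketch. It assumes a 2021 extract and reuses the variable names from the adult illustration; covid_child_interimwt and covid_childwt are our own names.

```stata
* Start from the annual sample child weight
gen double covid_child_interimwt = sampweight
* Vaccination content for ages 12-17 was not collected in quarters 1 and 2 of 2021
replace covid_child_interimwt = 0 if inlist(intervwqtr, 1, 2)
* Two of four quarters were covered, so inflate by 4/2
gen double covid_childwt = covid_child_interimwt * 4/2
```

Restrict the analysis to children ages 12-17 when using this weight, since the measure was collected only for that age group.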

Combining Sampling Weights When a Variable is Located in Different Files across Years

In some cases, a variable of interest may be located in different original NHIS files with different sampling schemes across the years. For example, the IPUMS NHIS variable PAPEVER indicates whether a woman ever had a Pap test. For the years 1982, 1992, and 2002, the variable comes from three different files: the 1982 Preventive Care supplement, the 1992 Cancer Control supplement, and the 2002 Sample Adult section. Accordingly, the appropriate sampling weights for the variable in these years are PERWEIGHT, SUPP2WT, and SAMPWEIGHT, respectively. For analysis, these weights will need to be combined in a new variable. Researchers should generate a new weight, perhaps called PAPWEIGHT, such that PAPWEIGHT = PERWEIGHT if year = 1982; PAPWEIGHT = SUPP2WT if year = 1992; and PAPWEIGHT = SAMPWEIGHT if year = 2002.
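
In Stata, the PAPWEIGHT example could be written roughly as follows, assuming an extract that contains the 1982, 1992, and 2002 samples along with PERWEIGHT, SUPP2WT, and SAMPWEIGHT:

```stata
* Combined weight for PAPEVER: use the weight that matches each year's source file
gen double papweight = .
replace papweight = perweight  if year == 1982
replace papweight = supp2wt    if year == 1992
replace papweight = sampweight if year == 2002
```

Records from any other year keep a missing value for papweight, so they drop out of weighted analyses rather than being given an inappropriate weight.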

Additional Information

For additional information on the construction of weights within each of the NHIS redesigns, users can access original NCHS documentation through links provided below.

1969-1974

National Center for Health Statistics. (1975). Health Interview Survey Procedure 1957-1974. Vital Health Stat, 1(11).
http://www.cdc.gov/nchs/data/series/sr_01/sr01_011acc.pdf

1975-1984

National Center for Health Statistics. (1985). The National Health Interview Survey Design, 1973-84, and Procedures, 1975-83. Vital Health Stat, 1(18).
http://www.cdc.gov/nchs/data/series/sr_01/sr01_018acc.pdf

1985-1994

National Center for Health Statistics. (1989). Design and Estimation for the National Health Interview Survey, 1985-94. Vital Health Stat, 2(110).
http://www.cdc.gov/nchs/data/series/sr_02/sr02_110.pdf

1995-2005

National Center for Health Statistics. (2000). Design and Estimation for the National Health Interview Survey, 1995-2004. Vital Health Stat, 2(130).
http://www.cdc.gov/nchs/data/series/sr_02/sr02_130.pdf

Mortality 1986-2004

National Center for Health Statistics. (2010). National Health Interview Survey (1986-2004) Linked Mortality Files. Analytic Guidelines
http://www.cdc.gov/nchs/data/datalinkage/nhis_mort_analytic_guidelines.pdf

Updated Mortality, 1986-2009

National Center for Health Statistics. Office of Analysis and Epidemiology. Analytic Guidelines for NCHS 2011 Linked Mortality Files, August, 2013. Hyattsville, Maryland.
http://www.cdc.gov/nchs/data/datalinkage/2011_linked_mortality_analytic_guidelines.pdf

2006-2015

National Center for Health Statistics. (2014). Design and Estimation for the National Health Interview Survey, 2006-2015. Vital Health Stat, 2(165).
http://www.cdc.gov/nchs/data/series/sr_02/sr02_165.pdf

2016-2018

National Center for Health Statistics. (2017). Survey Description, National Health Interview Survey, 2016. Hyattsville, MD.
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2016/srvydesc.pdf

2019-present

Bramlett MD, Dahlhamer JM, Bose J, and Blumberg SJ. New procedures for nonresponse adjustments to the 2019 National Health Interview Survey sampling weights. Published September, 2020.

National Center for Health Statistics. Survey Description, National Health Interview Survey, 2019. Hyattsville, MD. 2020.
https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2019/srvydesc-508.pdf

National Center for Health Statistics. Survey Description, National Health Interview Survey, 2020. Hyattsville, MD. 2021.
https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2020/srvydesc-508.pdf

National Center for Health Statistics. Survey Description, National Health Interview Survey, 2021. Hyattsville, MD. 2022.
https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2021/srvydesc-508.pdf

National Center for Health Statistics. Survey Description, National Health Interview Survey, 2022. Hyattsville, MD. 2023.
https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2022/srvydesc-508.pdf

National Center for Health Statistics. Survey Description, National Health Interview Survey, 2023. Hyattsville, MD. 2024.
https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2023/srvydesc-508.pdf

National Center for Health Statistics. Survey Description, National Health Interview Survey, 2024. Hyattsville, MD. 2025.
https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2024/srvydesc-508.pdf