User Note - Sampling Weights
Use of Sampling Weights with IPUMS NHIS
The National Health Interview Survey (NHIS) is a complex, multistage probability sample that incorporates stratification, clustering, and oversampling of some subpopulations (e.g., Black, Hispanic, and Asian) in some years. Because of the complex sampling design of the NHIS, users of IPUMS NHIS data must make use of sampling weights to produce representative estimates. While appropriate use of sampling weights will produce correct point estimates, statistical techniques that account for the complex sample design are also necessary to produce correct standard errors and statistical tests, so analysts are advised to review the user note on VARIANCE ESTIMATION as well.
Sampling weights are constructed so that each unit (survey respondent, family, or household) can be inflated or expanded to represent other individuals, families, or households in the United States. There are four general components to the NHIS sampling weights. First, the sampling weight represents the inverse probability of unit selection into the sample. The probability of selection is the cross-product of probabilities at each stage of sampling. (See the user note on SAMPLE DESIGN for further details.) Second, the probability of selection is then adjusted for household non-response. These first two steps determine the household weight. For person-level weights, the third component is referred to as a "first stage ratio adjustment." This is used to correct potential bias due to sample under-coverage, by applying a ratio adjustment to each weight based on a race/MSA-residence classification. The fourth component is a post-stratification adjustment for age, race, and sex using quarterly Census Bureau population control totals.
|Years|First stage ratio adjustment|
|1969-1974|6 color-residence classes|
|1975-1984|12 color-residence classes within region|
|1985-1994|16 race-residence-region classes|
|1995-2005|24 race-ethnicity-residence-region classes|
|2006-2015|32 race-ethnicity-residence-region classes|
|2016-present|No first-stage ratio adjustment|

|Years|Second stage (post-stratification) adjustment|
|1969-1994|60 age/sex/race categories|
|1995-2005|88 age/race/ethnicity/sex categories|
|2006-present|100 age/race/ethnicity/sex categories|
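The four weight components described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the NCHS procedure: it assumes that each adjustment enters as a simple multiplicative factor, and the function names, stage probabilities, and adjustment factors are all hypothetical.

```python
from math import prod


def base_weight(stage_probabilities):
    """Inverse of the overall selection probability.

    The overall probability is taken as the product of the selection
    probabilities at each sampling stage.
    """
    return 1.0 / prod(stage_probabilities)


def final_person_weight(stage_probabilities, nonresponse_adj,
                        first_stage_ratio_adj, poststrat_adj):
    """Combine the four components, assuming multiplicative adjustments."""
    # Components 1 and 2: inverse selection probability, adjusted for
    # household non-response. Together these give the household weight.
    household_weight = base_weight(stage_probabilities) * nonresponse_adj
    # Components 3 and 4: first stage ratio adjustment and
    # post-stratification adjustment (person-level weights only).
    return household_weight * first_stage_ratio_adj * poststrat_adj


# Hypothetical values: two sampling stages and three adjustment factors.
w = final_person_weight([0.01, 0.05], nonresponse_adj=1.10,
                        first_stage_ratio_adj=1.02, poststrat_adj=0.97)
print(round(w, 2))
```

Under these made-up inputs, a respondent selected with overall probability 0.0005 starts from a base weight of 2,000 before the adjustments are applied.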
PERWEIGHT is an IPUMS-constructed variable that harmonizes the Final Annual Weight in the original NHIS public use files. This weight should be used for analyses at the person level, for variables in which information was collected on all persons. PERWEIGHT represents the inverse probability of selection into the sample, adjusted for non-response with post-stratification adjustments for age, race/ethnicity, and sex using the Census Bureau's population control totals. For each year, the sum of these weights is equal to that year’s civilian, non-institutionalized U.S. population.
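As a rough illustration of how a person weight enters a point estimate, the sketch below computes a weighted population total and a weighted proportion. The three records, their PERWEIGHT values, and the `has_condition` indicator are hypothetical, and the sketch yields point estimates only; it does not produce correct standard errors.

```python
# Hypothetical person records; has_condition is a made-up 0/1 indicator.
records = [
    {"PERWEIGHT": 3000.0, "has_condition": 1},
    {"PERWEIGHT": 5000.0, "has_condition": 0},
    {"PERWEIGHT": 2000.0, "has_condition": 1},
]

# The sum of the weights estimates the size of the represented population.
population_estimate = sum(r["PERWEIGHT"] for r in records)

# Weighted count of persons with the condition.
weighted_cases = sum(r["PERWEIGHT"] * r["has_condition"] for r in records)

# Weighted proportion versus the naive unweighted proportion.
weighted_share = weighted_cases / population_estimate
unweighted_share = sum(r["has_condition"] for r in records) / len(records)

print(population_estimate, weighted_share, unweighted_share)
```

In this toy example the unweighted share (two of three records) overstates the weighted share (one half), because the respondent without the condition represents more people.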
Sample Person Weight
SAMPWEIGHT is an IPUMS-constructed variable that harmonizes the Final Annual Sample Adult and Sample Child Weights in the original NHIS public use files for 1997 forward. SAMPWEIGHT also contains the sampling weight for a subset of the pre-1997 survey supplements that followed a sampling scheme in which sample persons (one randomly selected person per household, often restricted to either persons 18+ or persons < 18), rather than all persons, were selected for certain survey supplements. SAMPWEIGHT represents the inverse probability of selection into a sample adult/child supplement, adjusted for non-response with additional post-stratification adjustments using the Census Bureau's population control totals.
FWEIGHT is an IPUMS-constructed variable that harmonizes the Final Annual Family Weight in the original NHIS public use files. Because no Census control totals for the number of civilian, non-institutionalized families exist, this weight is equal to the final person weight of the family member with the smallest post-stratification adjustment. For analyses using the family as the unit of analysis (e.g., how many families could not afford to eat balanced meals in the past 30 days?), researchers should use the family weight, FWEIGHT.
HHWEIGHT is an IPUMS-constructed variable that harmonizes the Final Annual Household Weight in the original NHIS public use files. For analyses using the household as the unit of analysis (e.g., how many households contained a person who needed help with activities of daily living?), researchers should use the household weight, HHWEIGHT. Beginning in 1997, vacant housing units and households that could not be interviewed due to resident absence or refusal to participate have a value of zero for HHWEIGHT.
MORTWT is an IPUMS-recode variable that represents the NCHS-created sample weights that "account for ineligible status due to insufficient identifying information for linkage" in the original public use NHIS Linked Mortality files. MORTWT should be used when analyzing mortality variables in conjunction with variables originally included in the NHIS person files. To analyze mortality variables in conjunction with variables originally included in the NHIS sample adult files, researchers should instead use MORTWTSA. Linked public use mortality variables are available only for NHIS respondents who were at least 18 years old at the time of the survey.
The SUPPxWT series (i.e., SUPP1WT, SUPP2WT, SUPP3WT) are IPUMS-constructed variables that harmonize the Final Annual Weight in selected supplements of the original NHIS public use files. For analyses using variables that are located in different supplements across the years, researchers should review the variable description for the appropriate sampling weights for each year. In some cases, researchers will need to create a new sampling weight by combining different weights from different years.
The CONDWTX series (i.e., CONDWT1, CONDWT2, CONDWT3, CONDWT4, CONDWT5, and CONDWT6) and PARALWT and DIABWT are IPUMS-constructed variables that harmonize chronic condition prevalence factors for person-level variables constructed from the 1978 to 1996 condition records. To analyze variables constructed from the many-to-one condition records, researchers should review the variable description to determine the appropriate condition weight to use with analyses.
Adjusting Sampling Weights When Pooling Multiple Years of Data
The sampling weights in the IPUMS NHIS represent annual inflation factors. In other words, for each individual, the person weight reflects the number of people that the individual survey respondent represents in the total U.S. non-institutionalized population for a given year. Thus, if the analyst pools multiple years of data, the sampling weight needs to be adjusted; otherwise, the weights would sum to roughly the population multiplied by the number of years pooled. For example, imagine that an analyst wants to use data from 1990-1999, pooling 10 years of data. The sampling weights need to be adjusted so that the total sample will represent the U.S. population (on average) for the 10-year period. The simplest adjustment method is to divide the weight by the number of years of data pooled (i.e., divide PERWEIGHT by 10 in this example). Other, more sophisticated methods of adjustment are available, but it is not clear that these methods perform substantially better.
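The simple adjustment described above might look like the following minimal sketch. The records, the `pooled_weight` helper, and the `PERWEIGHT_POOLED` name are hypothetical; only PERWEIGHT and the divide-by-10 rule come from the example in the text.

```python
def pooled_weight(annual_weight, n_years):
    """Divide an annual weight by the number of years of data pooled."""
    if n_years < 1:
        raise ValueError("n_years must be at least 1")
    return annual_weight / n_years


# Hypothetical person records drawn from a 1990-1999 pooled extract.
records = [
    {"YEAR": 1990, "PERWEIGHT": 2500.0},
    {"YEAR": 1995, "PERWEIGHT": 3100.0},
    {"YEAR": 1999, "PERWEIGHT": 2800.0},
]

YEARS_POOLED = 10  # 1990 through 1999 inclusive

for r in records:
    # Keep the original weight and store the adjusted one separately.
    r["PERWEIGHT_POOLED"] = pooled_weight(r["PERWEIGHT"], YEARS_POOLED)

print([r["PERWEIGHT_POOLED"] for r in records])
```

Storing the adjusted weight in a new field, rather than overwriting PERWEIGHT, keeps the annual weight available for single-year analyses.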
Combining Sampling Weights When a Variable is Located in Different Files across Years
In some cases, a variable of interest may be located in different original NHIS files with different sampling schemes across the years. For example, the IPUMS NHIS variable PAPEVER indicates whether a woman ever had a Pap test. For the years 1982, 1992, and 2002, the variable comes from three different files: the 1982 Preventive Care supplement, the 1992 Cancer Control supplement, and the 2002 Sample Adult section. Accordingly, the sampling weights for the variable in those years are PERWEIGHT, SUPP2WT, and SAMPWEIGHT, respectively. For analysis, these weights will need to be combined in a new variable. Researchers should generate a new weight, perhaps called PAPWEIGHT, such that PAPWEIGHT = PERWEIGHT if year = 1982; PAPWEIGHT = SUPP2WT if year = 1992; and PAPWEIGHT = SAMPWEIGHT if year = 2002.
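The PAPWEIGHT rule above can be sketched as a year-to-weight lookup. The records and their weight values are hypothetical, as are the `pap_weight` helper and the choice to return `None` for years outside the three covered by the example.

```python
# Which source weight applies in each year, per the PAPEVER example.
WEIGHT_BY_YEAR = {1982: "PERWEIGHT", 1992: "SUPP2WT", 2002: "SAMPWEIGHT"}


def pap_weight(record):
    """Return the weight to use with PAPEVER for this record's year."""
    source = WEIGHT_BY_YEAR.get(record["YEAR"])
    if source is None:
        # Year not covered by the example; leave the combined weight missing.
        return None
    return record[source]


# Hypothetical records; None marks a weight not available for that record.
records = [
    {"YEAR": 1982, "PERWEIGHT": 4200.0, "SUPP2WT": None, "SAMPWEIGHT": None},
    {"YEAR": 1992, "PERWEIGHT": 4000.0, "SUPP2WT": 3900.0, "SAMPWEIGHT": None},
    {"YEAR": 2002, "PERWEIGHT": 4100.0, "SUPP2WT": None, "SAMPWEIGHT": 5100.0},
    {"YEAR": 1997, "PERWEIGHT": 4300.0, "SUPP2WT": None, "SAMPWEIGHT": 4800.0},
]

for r in records:
    r["PAPWEIGHT"] = pap_weight(r)

print([r["PAPWEIGHT"] for r in records])
```

Keeping the mapping in one dictionary makes it easy to check each year against the variable description before extending the analysis to additional years.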
For additional information on the construction of weights within each of the NHIS redesigns, users can access original NCHS documentation through links provided below.
National Center for Health Statistics. (1975). Health Interview Survey Procedure 1957-1974. Vital Health Stat, 1(11).
National Center for Health Statistics. (1985). The National Health Interview Survey Design, 1973-84, and Procedures, 1975-83. Vital Health Stat, 1(18).
National Center for Health Statistics. (1989). Design and Estimation for the National Health Interview Survey, 1985-94. Vital Health Stat, 2(110).
National Center for Health Statistics. (2000). Design and Estimation for the National Health Interview Survey, 1995-2004. Vital Health Stat, 2(130).
National Center for Health Statistics. (2010). National Health Interview Survey (1986-2004) Linked Mortality Files. Analytic Guidelines
Updated Mortality, 1986-2009
National Center for Health Statistics. Office of Analysis and Epidemiology. Analytic Guidelines for NCHS 2011 Linked Mortality Files, August, 2013. Hyattsville, Maryland.
National Center for Health Statistics. (2014). Design and Estimation for the National Health Interview Survey, 2006-2015. Vital Health Stat, 2(165).
National Center for Health Statistics. (2017). Survey Description, National Health Interview Survey, 2016. Hyattsville, MD.