Description
INCIMP1 contains imputed values that replace missing data in the original variable INCFAM97ON2, a recoded measure of total combined family income (from all sources) in the previous calendar year. The companion imputation flag variable IMPYFAMFLAG1 indicates whether each value of INCIMP1 was reported or imputed.
Related Variables and Sources of Additional Information
INCIMP1 is the first of five variables that contain imputed values for total family income. It was created as part of a set of variables that provide complete (i.e., without missing values) data on family income.
One purpose of NHIS data is to study the relationship between income and health and to monitor health and health care for persons at different income levels. However, as the technical documentation "Multiple Imputation of Family Income and Personal Earnings in the National Health Interview Survey: Methods and Examples" describes, nonresponse rates are high for the questions on total family income and on personal earnings from employment in the previous calendar year. For more information on the imputation methodology, see EMPSTATIMP1.
Before using the imputed income and earnings variables, researchers are strongly advised to read the NCHS documentation on imputed income, such as the "2018 Imputed Family/Personal Earnings Files" documentation. This documentation cautions that each of the five datasets must be merged with other data from the survey so that each one forms a single completed dataset. For IPUMS NHIS users, the imputed income files have already been merged with the other data for each survey year from 1997 through the current year, as part of adding these imputed income files and variables to the IPUMS NHIS database.
The NCHS documentation for the imputed income files directs that each of the five versions of an imputed income variable be analyzed separately, using methods and software appropriate for complex survey data (for example, SAS-callable SUDAAN or SAS-callable IVEware).
Only then can estimates and standard errors be combined using the combining rules described in the aforementioned document on "Multiple Imputation of Family Income and Personal Earnings in the National Health Interview Survey." The 2018 imputed income file documentation further warns:
Examples of correct data analyses and additional information about the procedures used to create the imputed data are provided in the technical documentation referred to above.
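As a rough illustration of that workflow, the sketch below pools five per-imputation results (for example, a mean estimated separately with INCIMP1 through INCIMP5) using Rubin's rules for multiple imputation. It assumes that the combining rules in the NCHS technical document are the standard Rubin's rules, and it uses the classic large-sample degrees-of-freedom formula without a small-sample adjustment. The function name `combine_rubin` is hypothetical, and the per-imputation estimates and design-based standard errors must still come from survey-appropriate software as described above. The NCHS documentation remains the authoritative source for the formulas.

```python
import math
from statistics import mean, variance


def combine_rubin(estimates, std_errors):
    """Pool results from m completed datasets with Rubin's rules (sketch).

    estimates: one point estimate per imputation (e.g. from INCIMP1..INCIMP5)
    std_errors: the matching design-based standard errors
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations with matching standard errors")
    q_bar = mean(estimates)                        # pooled point estimate
    u_bar = mean([se ** 2 for se in std_errors])   # average within-imputation variance
    b = variance(estimates)                        # between-imputation variance (m - 1 denominator)
    total_var = u_bar + (1 + 1 / m) * b            # total variance
    if b == 0:
        df = math.inf                              # imputations agree exactly
    else:
        # Classic large-sample degrees of freedom (no small-sample adjustment).
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, math.sqrt(total_var), df
```

For instance, `combine_rubin([10, 12, 11, 13, 9], [1.0] * 5)` (made-up numbers) gives a pooled estimate of 11 with a standard error of 2. The pooled standard error is larger than any single-imputation standard error because it adds the between-imputation variance, which is exactly what analyzing only one imputed variable would ignore.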
Comparability
The comparability over time of INCIMP1 (and of all the INCIMP variables) is somewhat limited by changes in the recoded income categories. From 1997 to 2006, there were 11 family income brackets, with a top code of $75,000 and over. From 2007 forward, there were 21 family income brackets, with a top code of $100,000 and over.
To maximize comparability across years despite these changes, the IPUMS NHIS variable INCIMP1 employs composite coding, in which the first digit identifies broad groups that are consistent across years and the second digit provides additional detail present only in some years.
Consider, for example, the grouped income categories covering the range from $25,000 to $34,999. For 1997 to 2006, the original NHIS public use files provide a single category for the entire range $25,000-$34,999 (code 10 in the IPUMS NHIS database). For 2007 forward, the NHIS public use files provide two separate categories: $25,000-$29,999 (IPUMS NHIS code 11) and $30,000-$34,999 (IPUMS NHIS code 12). Under the composite coding system, all three codes share the first digit 1, so researchers who want comparability from 1997 forward can combine them. Researchers interested only in data for 2007 forward can take advantage of the full detail by distinguishing between code 11 ($25,000-$29,999) and code 12 ($30,000-$34,999).
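Because the first digit carries the cross-year grouping, collapsing to comparable categories is simple integer arithmetic. The sketch below assumes that the relevant INCIMP1 codes are two-digit integers whose first digit is the broad group, so the broad group is the code divided by 10 with the remainder discarded. The helper name `broad_income_group` and the sample records are hypothetical; any codes outside that two-digit pattern should be checked against the IPUMS NHIS codebook and handled separately.

```python
from collections import Counter


def broad_income_group(code: int) -> int:
    """Return the cross-year broad group (first digit) of a composite code.

    Assumes a two-digit code: first digit = group comparable across years,
    second digit = detail available only in some years.
    """
    if not 10 <= code <= 99:
        raise ValueError(f"expected a two-digit composite code, got {code}")
    return code // 10


# Hypothetical (survey year, INCIMP1 code) pairs, for illustration only:
# code 10 comes from the 1997-2006 scheme, codes 11 and 12 from 2007 forward.
records = [(2005, 10), (2010, 11), (2012, 12)]

# Pool all three records into the same comparable $25,000-$34,999 group.
pooled = Counter(broad_income_group(code) for _, code in records)
```

Here all three records fall into broad group 1, matching the $25,000-$34,999 range in the example above; analyses restricted to 2007 forward would simply keep the full two-digit code instead.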
The editing of reported income also changed.
From 1997 to 2004, some respondents reported a family income of "$2." Because the number of such responses rose unexpectedly, and presumably erroneously (2.25% of respondents in 2004 versus 0.15% of respondents in 2002, the most proximate year), an edit check that triggers on very high or very low income amounts was introduced to the survey instrument toward the end of 2005. In 2004 and 2005, all "$2" responses to the exact family income question were subject to income imputation. Beginning in 2006, once the edit check for very high and very low income reports was in place, "$2" responses to total family income were retained and were not imputed.
Changes in response categories and in the methods used to probe for family income information limit the comparability of the original INCFAM97ON2 variable. By extension, these same changes affect the level of detail in the information used to impute family income.
From 1997 to 2006, respondents were asked an open-ended question about their family income. Those who refused to answer or said "I don't know" were asked two follow-up questions: 1) whether the figure was above or below $20,000; and 2) which of 44 grouped income categories matched their family income.
Because response rates for these follow-up questions were low, a new series of income follow-up questions using an unfolding bracket methodology was introduced in 2007. If the respondent did not provide an exact income amount, the unfolding bracket method asked a series of closed-ended income range questions (e.g., "Is it less than $50,000?"). The questions were constructed so that each successive question established a narrower range for the family's income. As a result, the information on which income is imputed has a somewhat different level of detail from 2007 forward.
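The narrowing logic of unfolding brackets resembles a binary search over income cut points. The sketch below is illustrative only: the cut points in `HYPOTHETICAL_THRESHOLDS` and the function name `unfolding_brackets` are invented for this example and are not the NHIS instrument's actual questions or thresholds (the $50,000 example above simply happens to be the first question asked here). It assumes a nonnegative income and stops narrowing as soon as the respondent refuses or does not know.

```python
from typing import Callable, Optional, Sequence, Tuple

# True = "yes, less than", False = "no", None = refused / don't know.
Answer = Optional[bool]

# Hypothetical cut points in dollars; NOT the NHIS instrument's values.
HYPOTHETICAL_THRESHOLDS = [10_000, 20_000, 35_000, 50_000, 75_000, 100_000]


def unfolding_brackets(
    thresholds: Sequence[int],
    is_less_than: Callable[[int], Answer],
) -> Tuple[int, Optional[int]]:
    """Narrow an income range with successive "less than T?" questions.

    thresholds: sorted cut points; is_less_than: respondent's answer per cut point.
    Returns (lower, upper) with lower <= income < upper; upper None = no upper bound.
    """
    lo, hi = 0, len(thresholds)    # cut points still worth asking: thresholds[lo:hi]
    lower, upper = 0, None         # assumes income is nonnegative
    while lo < hi:
        mid = (lo + hi) // 2       # ask about the middle remaining cut point
        answer = is_less_than(thresholds[mid])
        if answer is None:         # refusal or "don't know": keep range so far
            break
        if answer:                 # income is below this cut point
            upper = thresholds[mid]
            hi = mid
        else:                      # income is at or above this cut point
            lower = thresholds[mid]
            lo = mid + 1
    return lower, upper
```

With these cut points, a respondent whose family income is $42,000 answers yes to $50,000, then no to $20,000 and no to $35,000, so the function returns the range from $35,000 up to (but not including) $50,000. A respondent who stops answering partway through leaves a wider range, which is one way the level of detail available for imputation can vary.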
Universe
- 1997-2018: All persons.
Weight
- 1997-2018: PERWEIGHT