Body Mass Index, calculated from publicly released height and weight variables


BMICALC is a 4-digit numeric variable with one implied decimal place. That is, a value of 0123 should be interpreted as 12.3. The command files delivered with IPUMS extracts automatically divide BMICALC by 10, so no further adjustment is needed.

0.0 = NIU
996.0 = Not calculable
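As an illustration of the coding scheme above, the following minimal Python sketch applies the implied decimal place and treats the two special codes as missing. The function and constant names (`from_raw`, `to_bmi`, `NIU`, `NOT_CALCULABLE`) are hypothetical and are not part of any IPUMS tool; the sketch assumes the special codes are compared on the already-divided scale (0.0 and 996.0), as listed above.

```python
from typing import Optional

# Special codes from the list above, on the already-divided scale.
NIU = 0.0
NOT_CALCULABLE = 996.0


def from_raw(raw: int) -> float:
    """Apply the one implied decimal place, e.g. raw 0123 -> 12.3.

    Only needed if the extract was read without the IPUMS command files,
    which already perform this division.
    """
    return raw / 10


def to_bmi(value: float) -> Optional[float]:
    """Return a usable BMI, or None for the NIU and not-calculable codes."""
    if value in (NIU, NOT_CALCULABLE):
        return None
    return value


print(to_bmi(from_raw(123)))   # 12.3
print(to_bmi(NOT_CALCULABLE))  # None
```

Excluding both codes before computing means or distributions avoids treating 0.0 and 996.0 as real BMI values.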


BMICALC reports the Body Mass Index, a measure of body fat based on height and weight, as calculated by IPUMS NHIS from the public use file data on height and weight. BMICALC was calculated using the following formula: [Weight in pounds/(Height in inches, squared)] multiplied by 703 and rounded to one digit past the implied decimal point. Individuals not asked their weight and height are coded "000" (for "Not in universe") for BMICALC. Individuals whose weight or height was topcoded, bottomcoded, or (for 1997 forward) suppressed as an outlying value for confidentiality reasons receive a code of "996" for BMICALC.
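The formula above can be sketched in Python as follows. The function name `bmicalc` is hypothetical, and the sketch uses Python's built-in `round`, which may treat exact ties differently from whatever rounding rule IPUMS NHIS applied; that detail is not documented here.

```python
def bmicalc(weight_lb: float, height_in: float) -> float:
    """BMI from pounds and inches: weight / height^2 * 703, to one decimal."""
    if height_in <= 0:
        raise ValueError("height must be positive")
    return round(weight_lb / height_in ** 2 * 703, 1)


# A person reporting 150 pounds and 65 inches:
print(bmicalc(150, 65))  # 25.0
```

Cases with topcoded, bottomcoded, or suppressed inputs should not be passed through this calculation, since BMICALC assigns them the "996" code instead.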

The website of the Centers for Disease Control and Prevention on BMI reports the following "standard weight status categories associated with BMI ranges for adults." BMI below 18.5 is associated with "underweight" weight status; BMI 18.5 to 24.9 is associated with "normal" weight status; BMI 25.0 to 29.9 is associated with "overweight" weight status; BMI 30.0 and above is associated with "obese" weight status.
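A minimal sketch of recoding BMICALC into the four adult weight status categories quoted above is shown here. The function name `weight_status` is hypothetical. The sketch assumes the input is on the already-divided scale with one decimal place, and it returns `None` for the special codes 0.0 (NIU) and 996.0 (Not calculable) rather than assigning them a category.

```python
from typing import Optional


def weight_status(bmi: float) -> Optional[str]:
    """Map an adult BMI value to the CDC weight status category."""
    if bmi in (0.0, 996.0):  # NIU and Not calculable codes
        return None
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:   # 18.5 to 24.9
        return "normal"
    if bmi < 30.0:   # 25.0 to 29.9
        return "overweight"
    return "obese"   # 30.0 and above


for value in (17.9, 24.9, 25.0, 31.2, 996.0):
    print(value, weight_status(value))
```

Because BMICALC carries one decimal place, the cutoffs 18.5, 25.0, and 30.0 partition every valid value without gaps.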


The CDC website also notes, "the correlation between the BMI number and body fatness is fairly strong; however, the correlation varies by sex, race, and age." For example, women tend to have more body fat than men at the same BMI; older people, on average, tend to have more body fat than young adults; and highly trained athletes may have a high BMI because of increased muscularity rather than increased body fatness. The CDC adds, "It is important to remember that BMI is only one factor related to risk for disease," and that other predictors, such as waist circumference and levels of blood pressure and physical activity, are also important.


Changes in the top and bottom codes for the height and weight variables, changes in the universe, changes in the questions used to collect data on height and weight, and the 2019 questionnaire redesign may affect comparability. The formula used to calculate BMICALC is consistent over time. The questions used to collect the input data on height and weight are consistent with a few exceptions. For height, interviewers asked, "How tall are you without shoes?" In 1974 only, the question does not specify "without shoes." For weight, interviewers asked, "How much do you weigh without shoes?" In 1974 only, the question does not specify "without shoes," and in 1976 and 1977, the question specifies "without clothes or shoes." However, the universe of persons for whom these data were collected varies over time. Also, for the period prior to 1997, proxy reporting of height and weight by family members was allowed, while sample adults themselves answered these questions for 1997 forward (except in rare cases where disability precluded self-reporting). Self-reports are generally considered more accurate than proxy reports for such questions.

In addition, the range of calculated values for BMICALC differs across years, in conjunction with differences in topcodes and bottom codes for height and weight in the public use data.


Bottomcoded values for height were 36 inches for 1976-1995, 58 inches for 1996, and 59 inches for 1997 forward. Topcoded values for height were 84 inches for 1976-1981, 98 inches for 1982-1995, 77 inches for 1996, and 76 inches for 1997 forward. Weight bottomcodes were 50 pounds for 1976-1995, 97 pounds for 1996, 99 pounds for 1997-2005, and 100 pounds for 2006. Weight topcodes were 300 pounds in 1976 and 1978, 400 pounds for 1977 and 1979-1981, 501 pounds for 1982-1995, 290 pounds for 1996, 285 pounds for 1997-2005, and 299 pounds for 2006 forward.

The meaning of "topcode" and "bottomcode" also differs for the periods before and after 1997. Prior to 1997, the topcoded categories include all persons who reported that specific value for height/weight as well as all persons who reported a higher value. Similarly, prior to 1997, the bottomcoded categories include all persons who reported that specific value for height/weight as well as all persons who reported a lower value. By contrast, for 1997 forward, the topcoded and bottomcoded categories include only persons who reported that specific value, and all persons reporting, respectively, higher or lower values were grouped together in a single category, with values suppressed to protect the confidentiality of respondents.
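The difference between the two topcoding regimes can be illustrated with a short Python sketch for height. It uses only the height limits stated above; the function names (`height_limits`, `public_use_height`) and the status labels are hypothetical, and the sketch is one reading of the rules described here rather than the procedure actually used to build the public use files.

```python
from typing import Optional, Tuple


def height_limits(year: int) -> Tuple[int, int]:
    """Return (bottomcode, topcode) for height in inches, per the text above."""
    if year < 1976:
        raise ValueError("no height limits documented before 1976")
    if year <= 1981:
        return 36, 84
    if year <= 1995:
        return 36, 98
    if year == 1996:
        return 58, 77
    return 59, 76  # 1997 forward


def public_use_height(year: int, reported: int) -> Tuple[Optional[int], str]:
    """Sketch of how a reported height would appear in the public use data."""
    bottom, top = height_limits(year)
    if year < 1997:
        # Before 1997 the code covers the limit itself and everything beyond it.
        if reported <= bottom:
            return bottom, "bottomcoded"
        if reported >= top:
            return top, "topcoded"
    else:
        # From 1997 the code covers only the limit itself;
        # values beyond it are suppressed.
        if reported < bottom or reported > top:
            return None, "suppressed"
        if reported == bottom:
            return bottom, "bottomcoded"
        if reported == top:
            return top, "topcoded"
    return reported, "as reported"


print(public_use_height(1990, 100))  # (98, 'topcoded')
print(public_use_height(2000, 80))   # (None, 'suppressed')
```

In every case other than "as reported", BMICALC receives the "996" code, which is why the range of calculable values shifts when the limits change.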

Given these variations over time in the range of values for BMICALC, researchers using this variable with an extended time series may wish to recode the values into a smaller number of categories (using, for example, the cutoffs for "underweight," "normal weight," "overweight," and "obese" reported above).

For 1997 forward, the variable BMI provides an alternative measure of the body mass index. The meaning of BMICALC and BMI is basically the same, but the values calculated for an individual may differ somewhat between the two variables. BMI was calculated by the National Center for Health Statistics (NCHS) "using the inhouse version of the height and weight variables, which contain the greater range of height and weight values than are available on the public use file" (according to the Codebooks for 1997 forward).


The formula the NCHS used to calculate BMI was also different: BMI = [Weight(kg)/[Height(m) squared]], rounded to 2 decimal places (using conversion factors of 1 kg = 2.205 pounds and 1 meter (m) = 39.37 inches).
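To show how the two formulas can yield slightly different values for the same person, the following Python sketch applies the NCHS formula with the conversion factors quoted above. The function name `bmi_nchs` is hypothetical, and the sketch takes pounds and inches as inputs purely for comparison with BMICALC; NCHS applied its formula to in-house height and weight values that are not available in the public use file.

```python
LB_PER_KG = 2.205   # 1 kg = 2.205 pounds
IN_PER_M = 39.37    # 1 meter = 39.37 inches


def bmi_nchs(weight_lb: float, height_in: float) -> float:
    """BMI = weight(kg) / height(m)^2, rounded to 2 decimal places."""
    weight_kg = weight_lb / LB_PER_KG
    height_m = height_in / IN_PER_M
    return round(weight_kg / height_m ** 2, 2)


# Same person, 150 pounds and 65 inches, under both formulas:
print(bmi_nchs(150, 65))                 # NCHS-style value, 2 decimals
print(round(150 / 65 ** 2 * 703, 1))     # BMICALC-style value, 1 decimal

# The metric conversion implies a multiplier of about 702.95 rather than 703.
print(IN_PER_M ** 2 / LB_PER_KG)
```

Even with identical inputs, the two results differ in the second decimal place because of the different multiplier and rounding precision; in practice the larger source of difference is the wider range of input values available to NCHS.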

Researchers interested only in data for 1997 forward should use BMI rather than BMICALC. Researchers interested in pre-1997 data must use BMICALC or must calculate the Body Mass Index themselves from the WEIGHT and HEIGHT variables. Researchers interested in data from both before and after 1997 may wish to use BMICALC to maximize comparability over time.

Questionnaire design changes introduced in 2019 limit comparability with earlier years. The NHIS questionnaire was substantially redesigned in 2019 to introduce a different data collection structure and new content. For more information on changes in terminology, universes, and data collection methods beginning in 2019, please see the user note.


  • 1976: Persons age 18+.
  • 1977: Subsampled persons age 20+.
  • 1978-1981: Persons age 17+.
  • 1982-1996: Persons age 18+.
  • 1997-2018: Sample adults age 18+.
  • 2019: Sample adults age 18+.
  • 2020, 2022: Sample children ages 10-17 and sample adults age 18+.
  • 2021: Sample adults age 18+.


  • 1976-2022