Multiple imputation for national public-use datasets and its possible application for gestational age in United States Natality files

Paediatr Perinat Epidemiol. 2007 Sep;21 Suppl 2:97-105. doi: 10.1111/j.1365-3016.2007.00866.x.


Multiple imputation (MI) is a technique that can be used for handling missing data in a public-use dataset. With MI, two or more completed versions of the dataset are created, containing possibly different but reasonable replacements for the missing data. Users analyse the completed datasets separately with standard techniques and then combine the results using simple formulae in a way that allows the extra uncertainty due to missing data to be assessed. An advantage of this approach is that the resulting public-use data can be analysed by a variety of users for a variety of purposes, without each user needing to devise a method to deal with the missing data. A recent example for a large public-use dataset is the MI of the family income and personal earnings variables in the National Health Interview Survey. We propose an approach to utilise MI to handle the problems of missing gestational ages and implausible birthweight-gestational age combinations in national vital statistics datasets. This paper describes MI and gives examples of MI for public-use datasets, summarises methods that have been used for identifying implausible gestational age values on birth records, and combines these ideas by setting forth scenarios for identifying and then imputing missing and implausible gestational age values multiple times. Because missing and implausible gestational age values are not missing completely at random, using multiple imputations and, thus, incorporating both the existing relationships among the variables and the uncertainty added from the imputation, may lead to more valid inferences in some analytical studies than simply excluding birth records with inadequate data.
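The abstract does not spell out the "simple formulae" used to combine results. In standard MI practice they are the combining rules commonly known as Rubin's rules, and the sketch below assumes that is what is meant. It pools a scalar estimate from m completed datasets by averaging the point estimates, averaging the within-imputation variances, and adding an inflated between-imputation variance, which is the part that reflects the extra uncertainty due to missing data. The function name `pool_estimates` and the input numbers are hypothetical and chosen only for illustration; they are not taken from the paper or from any natality file.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Pool m completed-data results with Rubin's combining rules (illustrative helper)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b    # total variance, including missing-data uncertainty

    return {
        "estimate": q_bar,
        "within": u_bar,
        "between": b,
        "total_variance": t,
        "std_error": t ** 0.5,
    }


# Made-up values: one estimate and its squared standard error per completed dataset.
example_estimates = [1.0, 2.0, 3.0]
example_variances = [0.5, 0.5, 0.5]

pooled = pool_estimates(example_estimates, example_variances)
print(pooled)
```

If the imputed values were identical across the completed datasets, the between-imputation term would be zero and the total variance would reduce to the ordinary within-imputation variance. The more the completed-data estimates disagree, the larger the pooled standard error becomes.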

Publication types

  • Evaluation Study

MeSH terms

  • Bias
  • Birth Certificates*
  • Data Collection / standards*
  • Data Interpretation, Statistical
  • Female
  • Gestational Age*
  • Humans
  • Pregnancy
  • United States / epidemiology