Imputation - Main Module (Theme)


A practical problem that nearly always occurs in statistical research is that the collected data suffer from missing values. This problem occurs both for data collected in traditional surveys and for administrative data. It is usually difficult (but not impossible) to use an incomplete data set directly for inference about population parameters, such as totals or means of target variables. For this reason, statisticians often create a complete data set prior to the estimation stage, by replacing the missing values with values estimated from the available data. This process is referred to as imputation.

Several methods are available for imputing the missing values in a data set, including deductive imputation, model-based imputation (including mean, ratio, and regression imputation), and donor imputation (including cold deck, random hot deck, and nearest-neighbour imputation as well as predictive mean matching). Different methods may be useful in different contexts. This module discusses some general aspects of imputation that are not related to a particular method, such as the inclusion or exclusion of a disturbance term in the imputed values, the use of deterministic versus stochastic imputation, and the incorporation of design weights into imputation methods. We also briefly discuss multiple imputation and mass imputation.
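
To make the distinction between deterministic and stochastic imputation concrete, the following minimal Python sketch applies mean imputation, ratio imputation (with and without a disturbance term), and random hot deck imputation to a small made-up data set. The data, the function names, and the choice to draw the disturbance from the observed residuals are assumptions made for illustration only; they are not taken from the module, and design weights are ignored.

```python
import random
from statistics import mean

# Hypothetical toy data: x is an auxiliary variable observed for every unit,
# y is the target variable, with None marking a missing value.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [12.0, None, 33.0, None, 55.0]


def mean_impute(y):
    """Deterministic: replace each missing value by the mean of the observed values."""
    m = mean(v for v in y if v is not None)
    return [m if v is None else v for v in y]


def ratio_impute(y, x, rng=None):
    """Ratio imputation: impute R * x_i, with R = sum(observed y) / sum(matching x).

    Without rng the method is deterministic. With rng, a disturbance term is
    added; here it is drawn at random from the observed residuals, which is
    only one of several possible choices (an assumption of this sketch).
    """
    observed = [(yi, xi) for yi, xi in zip(y, x) if yi is not None]
    r = sum(yi for yi, _ in observed) / sum(xi for _, xi in observed)
    residuals = [yi - r * xi for yi, xi in observed]
    imputed = []
    for yi, xi in zip(y, x):
        if yi is not None:
            imputed.append(yi)  # observed values are never changed
        else:
            disturbance = rng.choice(residuals) if rng is not None else 0.0
            imputed.append(r * xi + disturbance)
    return imputed


def random_hot_deck(y, rng):
    """Stochastic donor imputation: copy the value of a randomly chosen respondent."""
    donors = [v for v in y if v is not None]
    return [rng.choice(donors) if v is None else v for v in y]


rng = random.Random(2024)  # fixed seed so the stochastic results are reproducible
print("mean:             ", mean_impute(y))
print("ratio (determ.):  ", ratio_impute(y, x))
print("ratio (stochastic):", ratio_impute(y, x, rng))
print("random hot deck:  ", random_hot_deck(y, rng))
```

The deterministic variants return the same imputed values on every run, whereas the stochastic variants depend on the random draws. Adding a disturbance term helps the imputed values reflect the spread of the observed data instead of lying exactly on the fitted ratio line.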


To read the entire document, please access the PDF file (link under "Related Documents" on the right-hand side of this page).


Your feedback is appreciated. Please send your remarks, suggestions for improvement, etc. to memobust@cbs.nl.