Statistical Matching Methods (Method)


Statistical matching (SM) methods for microdata aim at integrating two or more data sources that refer to the same target population, in order to derive a single synthetic data set in which all the variables (coming from the different sources) are jointly available. The synthetic data set is the basis for further statistical analysis, e.g., microsimulations. The word synthetic refers to the fact that the records are obtained by integrating the available data sets rather than by directly observing all the variables together. The matching is usually based on the information (variables) common to the available data sources and, when available, on some auxiliary information (a data source containing all the variables of interest, or an estimate of a correlation matrix, contingency table, etc.). When no auxiliary information is available and the matching is performed only on the variables shared by the starting data sources, the results rely on the assumption that the variables not jointly observed are independent of each other given the shared ones (conditional independence).
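
The matching step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure prescribed by the full document: it assumes two toy files, a recipient file observing the shared variable and one target variable, and a donor file observing the shared variable and another target variable. The function name nearest_neighbour_match, the variable names (age, income, spend) and all values are hypothetical. Each recipient record receives the missing variable from the donor record closest on the shared variable; since only the shared variable links the two files, the synthetic data set implicitly relies on conditional independence.

```python
def nearest_neighbour_match(recipients, donors, shared, donated):
    """Return copies of the recipient records with `donated` filled in from
    the donor record closest on the `shared` variables (squared Euclidean
    distance). Hypothetical helper, for illustration only."""
    synthetic = []
    for rec in recipients:
        # Find the donor with the smallest distance on the shared variables.
        best = min(
            donors,
            key=lambda d: sum((rec[v] - d[v]) ** 2 for v in shared),
        )
        merged = dict(rec)               # keep the recipient's own variables
        merged[donated] = best[donated]  # add the variable observed only in the donor file
        synthetic.append(merged)
    return synthetic


# Toy data (made up): file A observes age and income, file B observes age and spend.
file_a = [{"age": 25, "income": 1800}, {"age": 52, "income": 3100}]
file_b = [
    {"age": 24, "spend": 900},
    {"age": 50, "spend": 1500},
    {"age": 70, "spend": 1100},
]

synthetic = nearest_neighbour_match(file_a, file_b, shared=["age"], donated="spend")
for record in synthetic:
    print(record)
```

In the resulting records income and spend appear side by side, although no unit was ever observed on both; any association between them in the synthetic file comes entirely through age.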

The synthetic data set can be derived by applying a parametric approach, a nonparametric approach, or a mixture of the two.
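
As a rough sketch of what a parametric approach could look like, the example below assumes a simple linear model for the variable observed only in the donor file, given one shared variable. The model is fitted on the donor file by ordinary least squares and then used to predict the missing variable for the recipient records. The helper fit_line, the variable names and the data are hypothetical, and the linear model itself is an assumption made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Donor file (made-up values): shared variable x and target variable z.
donor_x = [20, 30, 40, 50]
donor_z = [500, 700, 900, 1100]

# Estimate the model parameters on the donor file.
intercept, slope = fit_line(donor_x, donor_z)

# Recipient file observes only x; z is filled in with the model prediction.
recipient_x = [25, 45]
imputed_z = [intercept + slope * x for x in recipient_x]
print(imputed_z)
```

A mixed procedure would typically combine the two ideas, for instance by using such model predictions to search for the closest donor record and then donating that record's observed value instead of the prediction itself.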




Your feedback is appreciated. Please send your remarks, suggestions for improvement, etc. to memobust@cbs.nl.