Summary
This section explores the problem of data integration in the following context. There are two non-overlapping surveys (the sets of units collected in the two surveys are distinct) that refer to the same target population. The variables of interest for the statistical analyses are observed separately in the two surveys. Because of the nature of the data sets, joint information on these variables cannot be created by linking records through common identifiers. This problem is usually referred to as statistical matching. It is a non-standard problem in statistics, and the earliest approaches relied on naïve methods based on data imputation. Nowadays the complex nature of statistical matching is dealt with differently: by exploring all the models that are compatible with the two sample surveys at hand. This gives rise to “sets” of estimates instead of the more usual “point estimates”. These sets of estimates should not be confused with confidence intervals: they reflect only the fact that joint information on the target variables is missing, not sampling variability.
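To make the idea of a “set” of estimates concrete, the following minimal sketch computes the classical Fréchet bounds for the cells of a joint distribution when only the two marginal distributions are known. It is an illustration under stated assumptions, not necessarily the method used in the full document: the function name `frechet_bounds` and the marginal values are hypothetical, the two target variables are assumed categorical, and no common matching variables are used.

```python
def frechet_bounds(p_y, p_z):
    """Return (lower, upper) Frechet bounds for every joint cell P(Y=i, Z=j).

    p_y, p_z: marginal probabilities of Y and Z, each summing to 1.
    Every joint distribution with these marginals has its cells
    inside these bounds.
    """
    lower = [[max(0.0, py + pz - 1.0) for pz in p_z] for py in p_y]
    upper = [[min(py, pz) for pz in p_z] for py in p_y]
    return lower, upper


# Hypothetical marginals: Y estimated from one survey, Z from the other.
p_y = [0.7, 0.3]
p_z = [0.6, 0.4]

lower, upper = frechet_bounds(p_y, p_z)

for i, py in enumerate(p_y):
    for j, pz in enumerate(p_z):
        print(f"P(Y={i}, Z={j}) lies in [{lower[i][j]:.2f}, {upper[i][j]:.2f}]")
```

With these illustrative marginals, the first cell can lie anywhere between roughly 0.3 and 0.6. The width of that interval does not shrink as the samples grow, which is why it should not be read as a confidence interval.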