WP3 - Framework for the quality evaluation of statistical output based on multiple sources

[no-lexicon]Work Package 3: Framework for the quality evaluation of statistical output based on multiple sources

"The aim of Work Package 3 (WP3) is to produce measures for the quality of the output of multisource statistics. These quality measures primarily focus on the quality dimensions “accuracy” and “coherence” (principles 12 and 14 in the European Statistics Code of Practice). Within WP3 we have carried out a critical literature review of existing and currently proposed quality measures. We have also carried out suitability tests. These suitability tests do not directly refer to the suitability of the quality measures (accuracy, coherence) themselves but rather to the methods (or recipes) used to estimate them in a given situation. If no methods or recipes exist to estimate a quality measure for a given situation, the quality measure cannot (yet) be applied to that situation.

Many different situations can arise when multiple sources are used to produce statistical output, depending on both the nature of the data sources used and the kind of output produced. In order to structure the work within WP3 we have proposed a breakdown into a number of basic data configurations that seem most commonly encountered in practice. A given situation may well involve several basic configurations at the same time. The aim of the basic data configurations is not to classify all possible situations that can occur, but to provide a useful focus and direction for the work to be carried out. Note that the basic data configurations give a simplified view of reality. Nevertheless, many practical situations can be built on these basic configurations, and in our opinion they are a good way to structure the work."

Basic data configurations:

  • Configuration 1: multiple cross-sectional data sources that together provide a complete dataset with full coverage of the target population
  • Configuration 2: same as Configuration 1, but with overlap between different data sources
  • Configuration 2S: special case of Configuration 2 in which one of the data sources consists of sample data
  • Configuration 3: extension of Configuration 2 in which there is also under-coverage of the target population
  • Configuration 4: aggregated data are available besides micro data
  • Configuration 5: only aggregated data overlap with each other and need to be reconciled (complete macro-data counterpart of Configuration 2)
  • Configuration 6: longitudinal data are considered
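
As a rough illustration of how Configurations 1, 2 and 3 differ at the unit level, the sketch below compares the unit identifiers in each source with those of the target population. It is only a sketch under stated assumptions, not part of the WP3 framework: the function name `classify_unit_coverage`, the representation of sources as sets of unit identifiers, and the reading of "overlap" as units shared between sources are all hypothetical choices made for this example.

```python
from itertools import combinations
from typing import Iterable, Optional, Set


def classify_unit_coverage(target: Set[str], sources: Iterable[Set[str]]) -> Optional[str]:
    """Return "1", "2" or "3" for the unit-level pattern, or None if none applies.

    Hypothetical helper: it only looks at which units appear in which source.
    """
    # Assumption: units outside the target population (over-coverage) are ignored.
    in_scope = [set(s) & target for s in sources]
    covered = set().union(*in_scope)
    full_coverage = covered == target
    # Assumption: "overlap" means at least one unit appears in two sources.
    overlap = any(a & b for a, b in combinations(in_scope, 2))

    if full_coverage and not overlap:
        return "1"  # sources jointly cover the population without overlap
    if full_coverage and overlap:
        return "2"  # full coverage, with overlapping sources
    if overlap:
        return "3"  # overlapping sources, but some target units are missing
    return None     # not one of the three patterns sketched here


target = {"u1", "u2", "u3", "u4"}
print(classify_unit_coverage(target, [{"u1", "u2"}, {"u3", "u4"}]))        # 1
print(classify_unit_coverage(target, [{"u1", "u2", "u3"}, {"u3", "u4"}]))  # 2
print(classify_unit_coverage(target, [{"u1", "u2"}, {"u2", "u3"}]))        # 3
```

Configurations 2S and 4 to 6 depend on properties the sketch does not model (sample design, aggregated data, the time dimension), so they are deliberately left out.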

Download the discussion paper (Estimating Classification Error under Edit Restrictions in Combined Survey-Register Data, by Laura Boeschoten, Daniel Oberski and Ton de Waal).

Download WP3 Final report.[/no-lexicon]