Outlier Treatment (Method)


In business surveys, the distribution of variables is often highly skewed, resulting in sample observations that differ substantially from the majority of observations in the sample. The literature refers to these units as outliers.

Outliers can be representative (representing other population units similar in value to the observed outliers) or non-representative (unique in the population). Here we consider only representative outliers, i.e., correct values that represent other units in the population. Since representative outliers affect the variability of standard estimators, such as the Horvitz-Thompson estimator or generalised regression (GREG) estimators, an appropriate way of handling them is required.

The objective of outlier treatment is to bring population estimates as close as possible to the true population parameters. Outlier treatment therefore always involves a trade-off between variance and bias. For small samples, variance is usually the dominant component of the mean squared error (MSE); for large samples, bias dominates.
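The trade-off can be made explicit through the standard decomposition of the mean squared error. The notation below is introduced here for illustration and is not taken from the module: \hat{\theta} denotes an estimator and \theta the population parameter it targets.

```latex
\[
\mathrm{MSE}(\hat{\theta})
  = \mathrm{Var}(\hat{\theta}) + \bigl[\mathrm{Bias}(\hat{\theta})\bigr]^{2},
\qquad
\mathrm{Bias}(\hat{\theta}) = \mathrm{E}(\hat{\theta}) - \theta .
\]
```

An outlier treatment is worthwhile when the reduction it achieves in the variance term exceeds the squared bias it introduces.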

The module describes one frequently applied estimation method for reducing the impact of outlying units: Winsorisation. The general idea of Winsorisation is to modify an outlying observation so that it has less impact on the estimate of a parameter. How well the Winsor estimator resists unusually large residuals depends on the choice of cut-off values, so methods for estimating the robust regression parameters and the bias parameters are needed in order to estimate the cut-off values. The cut-offs are optimal only at the level at which the estimates are produced. The Winsor estimator is easy to implement, but it performs best under models (used for estimating the robust regression parameters) that are only moderately robust. Winsorisation can be applied to a large class of estimators (GREG estimators, model-based regression estimators, ratio estimators) and involves modifying their standard forms. The result is estimates with acceptable bias and a smaller variance than the standard, non-Winsorised estimators. The bias-variance trade-off works in favour of Winsorisation at the low level of estimation, but aggregated Winsorised estimates have large biases, which makes them less precise than standard aggregated estimates.
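The idea of modifying outlying observations can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the estimator defined in the module: the cut-off is a fixed number supplied by the user rather than a value estimated from a robust regression fit and a bias parameter, only large values are treated (one-sided Winsorisation), and the function names and sample data are hypothetical. The two variants follow what the literature commonly calls Type I and Type II Winsorisation.

```python
def winsorise_type1(values, cutoff):
    """Type I: replace every value above the cut-off by the cut-off."""
    return [min(y, cutoff) for y in values]


def winsorise_type2(values, weights, cutoff):
    """Type II: an outlier keeps weight 1 for its own value and
    contributes the cut-off for the remaining (w - 1) units it represents."""
    adjusted = []
    for y, w in zip(values, weights):
        if y > cutoff:
            adjusted.append(y / w + (1.0 - 1.0 / w) * cutoff)
        else:
            adjusted.append(y)
    return adjusted


def weighted_total(values, weights):
    """Weighted (expansion) estimate of a population total."""
    return sum(w * y for y, w in zip(values, weights))


# Hypothetical sample: one unit is far larger than the rest.
values = [10.0, 12.0, 11.0, 200.0]
weights = [5.0, 5.0, 5.0, 5.0]   # assumed design weights
cutoff = 50.0                    # assumed cut-off, not estimated here

standard_total = weighted_total(values, weights)
type1_total = weighted_total(winsorise_type1(values, cutoff), weights)
type2_total = weighted_total(winsorise_type2(values, weights, cutoff), weights)

print(standard_total, type1_total, type2_total)
```

In this toy sample both Winsorised totals are pulled below the standard weighted total, with Type II the less drastic of the two because the outlier still counts at its full value for itself. In practice the cut-off would be chosen to balance the variance reduction against the bias introduced.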



