Users of the scientific use files (SUF) are welcome to post any questions and observations they have on the EU-SILC datasets/variables.
1. How to use the data?
(For example: What is the relationship between the R and P files? How can these files be linked?)
2. How are the data coded?
(For example: What does a particular variable value mean?)
3. Why are the data anonymised and what are the contents of the anonymised files?
(For example: Do the EU-SILC scientific use files contain data at NUTS3 level?)
To post your question, you need to be logged in to CROS (*). Once you're logged in, go to "add new comment" at the bottom of the page, enter your comment, and click on the "Post comment" button.
To post follow-up questions (or replies), please click on the "reply" button for the comment in question.
Tip: check out the answers to the frequently asked questions on the EU-SILC scientific use files to see if your question has already been answered!
(*) If you don't have a CROS account, you can sign up for one for free. If you have trouble doing so, please refer to
https://ec.europa.eu/eurostat/cros/content/eu-login-guideline-account-creation
to learn how to sign up.
Comments
Why does the income in Germany fluctuate unexpectedly 2005-2007?
When downloading data from income distribution by quintile (EU-SILC survey [ilc_di01]), I noticed that the median income in Germany dropped by 4.5% between 2005 and 2006, then jumped by 13% between 2006 and 2007...
There is no notice or flag explaining any methodology break or other reason for this (in my view) unrealistic increase...
Is there any explanation to this?
Reply from DESTATIS on the income 2005-2007
Original reply from DESTATIS
Bei der Betrachtung der EU-SILC Zeitreihen 2005-2007 für Deutschland ist zu beachten, dass sich hierbei um die ersten Jahre einer Panelerhebung handelt und dass die Art des Auswahlverfahrens in diesen Jahren verschieden war. In Deutschland kam bei der Stichprobenauswahl im Rahmen einer Ausnahmeregelung in diesen Erhebungsjahren 2007 eine Kombination aus Quotenstichprobe und Zufallsstichprobe privater Haushalte zur Anwendung. Beim Start der Erhebung 2005 in Deutschland wurden drei von vier Rotationsvierteln aus einer Quotenstichprobe gezogen. Jährlich wurde eines dieser Viertel mit einem zufällig gezogenen Viertel ersetzt, so dass im Jahr 2008 erstmals die gesamte EU-SILC-Stichprobe auf einer Zufallsauswahl beruhte. Erst durch die Umstellung von einer gemischten Quoten/-Zufallstichprobe auf eine reine Zufallstichprobe ab dem Erhebungsjahr 2008 hat sich die Ergebnisqualität von EU-SILC erhöht und die Stabilisierungstendenzen im Zeitverlauf gezeigt.
Translation of the reply from DESTATIS
When looking at the EU-SILC time series 2005-2007 for Germany, it has to be taken into account that these are the first years of a panel survey and that the sample selection procedure differed across these years. Under an exception, sample selection in Germany in these survey years used a combination of a quota sample and a random sample of private households. At the start of the survey in 2005, three out of four rotation quarters were drawn from a quota sample. Each year, one of these quarters was replaced by a randomly drawn quarter, so that in 2008 the entire EU-SILC sample was based on random selection for the first time. Only with the change from a mixed quota/random sample to a purely random sample from survey year 2008 onwards did the quality of the EU-SILC results improve and show a tendency to stabilise over time.
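The rotation described in the reply can be traced year by year. The following is only a minimal sketch of that arithmetic, assuming all four rotation quarters count equally; the variable names are illustrative, and the result is a share of rotation quarters, not of households or persons.

```python
# Sketch of the rotation described in the reply: 3 of 4 rotation quarters
# come from the quota sample in 2005, and one quota quarter is replaced by
# a randomly drawn quarter each year.
TOTAL_QUARTERS = 4
quota_quarters = 3  # stated for the 2005 start of the survey

quota_share = {}
for year in range(2005, 2009):
    quota_share[year] = quota_quarters / TOTAL_QUARTERS
    # One quota quarter is rotated out per year; never drop below zero.
    quota_quarters = max(0, quota_quarters - 1)

for year, share in quota_share.items():
    print(year, f"{share:.0%} of rotation quarters from the quota sample")
```

Under this reading, the composition of the sample changes in every year from 2005 to 2008, which is in line with the reply's point that results only stabilise from 2008 onwards.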
How to combine the four longitudinal individual-level files?
I am currently trying to combine the four individual-level files in the longitudinal version.
I expected that each observation would be identified by year, household, country, and personal id. But this does not seem to be the case in the longitudinal survey. In particular, the household data (h file) systematically contains fewer observations than the household register (d file).
I have tried to make sense of this discrepancy by looking at the documentation, but failed to find any useful information. Do you know which variables should be used to link the files?
Explanation on how to merge longitudinal individual-level files
The register files contain more persons/households than the data files, because the data files contain only those persons/households where the interview with the selected person/household was successful.
It is described in more detail in Doc65:
Files to transmit to Eurostat
The target variables will be sent to EUROSTAT in four different files:
The household register file (D) must contain every selected household, including those where the address could not be contacted or those households that could not be interviewed.
In the other files, records related to a household will only exist if the household has been contacted AND has a completed household interview in the household data file (H) and at least one member has complete data in the personal data file (P). This member must be the selected respondent if this mode of selection is used.
The personal register file (R) must contain a record for every person currently living in the household or temporarily absent. In the longitudinal component it must also contain a record for every person registered in the R-file of the previous year or who has lived in the household for at least three months during the income reference period.
The personal data file (P) must contain a record for every eligible person for whom the information could be completed from interview and/or registers.
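The household-level rules quoted above can be sketched as a small check. This is only an illustration of the wording in Doc65: the class, its field names and the three example households are made up, and the additional condition about the selected respondent is left out.

```python
from dataclasses import dataclass


@dataclass
class SelectedHousehold:
    # All field names are hypothetical and only mirror the wording of the rules.
    contacted: bool
    household_interview_complete: bool
    members_with_complete_personal_data: int


def in_household_register(hh: SelectedHousehold) -> bool:
    # D file: every selected household, whatever the fieldwork outcome.
    return True


def in_other_files(hh: SelectedHousehold) -> bool:
    # Other files: household contacted AND household interview completed
    # AND at least one member with complete personal data.
    return (
        hh.contacted
        and hh.household_interview_complete
        and hh.members_with_complete_personal_data >= 1
    )


# Made-up outcomes for three selected households.
sample = [
    SelectedHousehold(True, True, 2),    # fully interviewed
    SelectedHousehold(True, False, 0),   # contacted, no completed interview
    SelectedHousehold(False, False, 0),  # address could not be contacted
]
d_count = sum(in_household_register(hh) for hh in sample)
h_count = sum(in_other_files(hh) for hh in sample)
print(d_count, h_count)
```

In this toy case all three households appear in the register, but only one has records in the other files, which is the pattern described in the question.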
If you conduct an outer join of the data and register files for the longitudinal version based on
then there will be some missing observations for variables on the data side (P, H), but no missing observations for any of the variables on the register side (R, D), since there will be records coming from the register side without a matching record on the data side.
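As an illustration of this behaviour, here is a minimal sketch of an outer join in plain Python. The key columns (year, country, hh_id), the variables d_status and h_income, and all rows are placeholders invented for the example; they are not the actual EU-SILC variable names or data, and the real linking variables should be taken from the documentation.

```python
# Hypothetical key columns; replace with the documented linking variables.
KEY = ("year", "country", "hh_id")


def outer_join(register_rows, data_rows, key=KEY):
    """Full outer join of two lists of dicts on the given key columns."""
    data_by_key = {tuple(r[c] for c in key): r for r in data_rows}
    data_cols = sorted({c for r in data_rows for c in r if c not in key})
    reg_cols = sorted({c for r in register_rows for c in r if c not in key})
    seen = set()
    merged = []
    for reg in register_rows:
        k = tuple(reg[c] for c in key)
        seen.add(k)
        match = data_by_key.get(k)
        row = dict(reg)
        for c in data_cols:
            # No matching data record: data-side variables stay missing.
            row[c] = match.get(c) if match else None
        row["_merge"] = "both" if match else "register_only"
        merged.append(row)
    for k, dat in data_by_key.items():
        if k not in seen:
            # Data record without a register record (not expected here).
            row = {c: None for c in reg_cols}
            row.update(dat)
            row["_merge"] = "data_only"
            merged.append(row)
    return merged


# Made-up toy rows: three registered households, two with interview data.
d_rows = [
    {"year": 2008, "country": "XX", "hh_id": 1, "d_status": "interviewed"},
    {"year": 2008, "country": "XX", "hh_id": 2, "d_status": "interviewed"},
    {"year": 2008, "country": "XX", "hh_id": 3, "d_status": "not contacted"},
]
h_rows = [
    {"year": 2008, "country": "XX", "hh_id": 1, "h_income": 100},
    {"year": 2008, "country": "XX", "hh_id": 2, "h_income": 200},
]

merged = outer_join(d_rows, h_rows)
for row in merged:
    print(row["hh_id"], row["d_status"], row["h_income"], row["_merge"])
```

The third household keeps its register variable but has no value for the data-side variable. With pandas, `DataFrame.merge(how="outer", indicator=True)` would give a comparable result with a `_merge` column.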