
WP7 Overview

The aim of this pilot is to investigate how a combination of Big Data sources and existing official statistical data can be used to improve current statistics and create new statistics in statistical domains. The work package focuses on three statistical domains: Population, Tourism/border crossings and Agriculture. The WP team will describe the data collection, data linking, data processing and methodological aspects of combining data in these domains. The main challenges ahead are representativeness, linking to other datasets, metadata and international comparability.

Description of the work package

The aim of this work package is to find out how a combination of Big Data sources, administrative data and statistical data may enrich statistical output in the domains ‘Population’, ‘Tourism/border crossings’ and ‘Agriculture’. In many cases a single data source will not suffice for producing official statistics, so different data sources have to be combined. The work package is scientific in nature: in its methodological, quality-related and technical work, the team is required to act with professional independence.

Timing and partners

Start date: 1 February 2016

End date: 28 February 2017

WP7 is carried out by representatives of three ESSnet Big Data partners: GUS (Statistics Poland), which leads WP7, CBS (Statistics Netherlands) and ONS (Office for National Statistics, UK).

Tasks

Task 1. Data availability/Data inventory

1. Identify Big Data sources, taking into account sustainability and availability in several countries

  • Establishing an inventory of these sources by:
    • Brainstorming - a review of potential sources
    • Preparation of a questionnaire with questions about the sources used by the project participants
    • Sending the questionnaire to participants
    • Gathering answers and preparation for analysis
  • Assessment of the possibility of using sources for Big Data analysis in the domains of population, tourism/border crossings, agriculture
  • Build the list of potential sources

2. Identify which results or new products from the source-oriented pilots may contribute to these domains

  • Match the sources from the list of potential sources to the following domains:
    • population;
    • tourism/border crossings;
    • agriculture
  • Preliminary analysis of the possibility of using sources in each domain, including:
    • Consideration of the legal aspects
    • Consideration of availability
    • Preliminary analysis of the methodological aspects
    • Consideration of the quality issues
    • Preparation of initial technical requirements
  • Build the list of exploitable sources for each domain

3. Describe the added value of delivered linkage between these sources to current statistics.

  • Analyze the list of exploitable sources for each domain
  • Prepare the map of linkages between Big Data sources (e.g. which aspect of one data source can be used in several domains)
  • Describe the added value for each domain.
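
The map of linkages described above can be pictured as a lookup from each source to the domains it may serve. The following is a minimal sketch only: the source names (`source_a`, `source_b`, `source_c`) and their domain assignments are hypothetical placeholders, not results of the inventory, and the helper functions are illustrative.

```python
from collections import defaultdict

# Hypothetical inventory: source name -> domains it might contribute to.
# These entries are placeholders, not findings of the work package.
SOURCE_DOMAINS = {
    "source_a": {"population", "tourism/border crossings"},
    "source_b": {"agriculture"},
    "source_c": {"population", "agriculture", "tourism/border crossings"},
}


def sources_by_domain(source_domains):
    """Invert the inventory: domain -> sorted list of candidate sources."""
    by_domain = defaultdict(list)
    for source, domains in source_domains.items():
        for domain in domains:
            by_domain[domain].append(source)
    return {domain: sorted(sources) for domain, sources in by_domain.items()}


def cross_domain_sources(source_domains):
    """Return the sources that are linked to more than one domain."""
    return sorted(s for s, d in source_domains.items() if len(d) > 1)


if __name__ == "__main__":
    for domain, sources in sorted(sources_by_domain(SOURCE_DOMAINS).items()):
        print(domain, "->", sources)
    print("multi-domain:", cross_domain_sources(SOURCE_DOMAINS))
```

Sources returned by `cross_domain_sources` are the ones where a single acquisition effort could add value in several domains at once.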

Task 2. Data feasibility

1. Carry out explorative analyses on two or three Big Data sources in the domain of population, tourism/border crossings or agriculture.

  • Selection of the most valuable Big Data sources for each domain:
    • Evaluation of the legal aspects;
    • Evaluation of availability;
    • Evaluation of methodology;
    • Evaluation of the quality;
    • Evaluation of technical requirements.
  • Analyzing results.
  • Preliminary assessment of the usefulness - developing the assessment factors.

2. Selection and recommendation of two or three Big Data sources for use in the domains of population, tourism/border crossings and agriculture.

  • Preparing the SWOT analysis (positive and negative factors of using several sources)
  • Recommendation of the most important and useful sources.
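
One way to turn the five evaluation aspects of Task 2 into a recommendation is a simple scoring table. The sketch below is an assumption for illustration, not the method used by the work package: the 0 to 5 scale, the equal default weights, the source names and every score are hypothetical.

```python
# The five evaluation aspects listed under Task 2.
CRITERIA = ("legal", "availability", "methodology", "quality", "technical")

# Hypothetical scores on an assumed 0-5 scale (higher is better).
SCORES = {
    "source_a": {"legal": 4, "availability": 3, "methodology": 2, "quality": 3, "technical": 4},
    "source_b": {"legal": 2, "availability": 5, "methodology": 3, "quality": 4, "technical": 3},
    "source_c": {"legal": 1, "availability": 2, "methodology": 2, "quality": 3, "technical": 1},
}


def total_score(source_scores, weights=None):
    """Weighted sum over all criteria; equal weights unless given."""
    if weights is None:
        weights = {criterion: 1.0 for criterion in CRITERIA}
    return sum(weights[c] * source_scores[c] for c in CRITERIA)


def recommend(all_scores, top_n=2, weights=None):
    """Return the top_n source names, best first (ties broken by name)."""
    ranked = sorted(
        all_scores,
        key=lambda name: (-total_score(all_scores[name], weights), name),
    )
    return ranked[:top_n]


if __name__ == "__main__":
    print(recommend(SCORES))
```

A score table of this kind does not replace the SWOT analysis; it only makes the ranking behind a recommendation explicit and repeatable.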

Task 3. Data combination

1. Experimental work (if practical work is not possible, this will consist of theoretical considerations, including consultation with practice, e.g. the Sandbox):

  • Data collection
  • Data preparation
  • Data analysis

2. Describe the practical, technical and methodological aspects of combining Big Data outputs in the statistical system, for example differences in definitions, populations and volatility.

3. Provide first answers on quality issues when combining Big Data with traditional outputs.

4. Provide answers to the question of whether micro-data have to be used when combining Big Data estimates with traditional outputs, or whether data at an aggregated level can be considered.

  • Analysis of advantages and disadvantages of combining data
  • Preparing the list of criteria for combining data
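
The difference between the two options in point 4 can be illustrated with a toy example. This is a sketch under stated assumptions: all records, the field names (`unit_id`, `region`, `nights`) and all values are invented, and the weighted average in `combine_aggregates` is just one simple way of combining totals, not a method prescribed by the work package.

```python
# Invented survey micro-data: one record per statistical unit.
survey = [
    {"unit_id": 1, "region": "north", "nights": 3},
    {"unit_id": 2, "region": "north", "nights": 5},
    {"unit_id": 3, "region": "south", "nights": 2},
]

# Invented Big Data records observed for some of the same units.
big_data = [
    {"unit_id": 1, "nights": 4},
    {"unit_id": 3, "nights": 2},
]


def link_micro(survey_rows, big_rows, key="unit_id"):
    """Micro-level combination: join records on a shared unit identifier."""
    big_index = {row[key]: row for row in big_rows}
    linked = []
    for row in survey_rows:
        match = big_index.get(row[key])
        if match is not None:  # units without a match are left unlinked
            linked.append({
                key: row[key],
                "survey_nights": row["nights"],
                "big_data_nights": match["nights"],
            })
    return linked


def combine_aggregates(survey_total, big_total, survey_weight=0.5):
    """Aggregate-level combination: weighted average of two totals."""
    if not 0.0 <= survey_weight <= 1.0:
        raise ValueError("survey_weight must lie between 0 and 1")
    return survey_weight * survey_total + (1.0 - survey_weight) * big_total


if __name__ == "__main__":
    print(link_micro(survey, big_data))
    print(combine_aggregates(10, 6))
```

Micro-level linkage needs a common identifier and raises access and privacy questions, but it exposes unit-level discrepancies; the aggregate route needs only published totals, at the cost of hiding where the two sources disagree.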

Task 4. Summary plus future perspectives

1. Suggest pilots and domains with successful implementation potential for further elaboration in the second wave of pilots in 2017.

  • Recommendation on legal aspects;
  • Recommendation on availability;
  • Recommendation on methodology;
  • Recommendation on quality;
  • Recommendation on technical requirements.

Deliverables and milestones

Specific grant agreement 1 (SGA-1)

Deliverables

7.1 Report for the domain of Population (month 13), containing basic information on:

  • the data access (with legal and privacy aspects)
  • the data quality issues
  • the methodology (focus also on combining data)
  • the technical aspects

7.2 Report for the domain of Tourism/border crossings, containing basic information as mentioned under 7.1 (month 13)

7.3 Report for the domain of Agriculture, containing basic information as mentioned under 7.1 (month 13)

Milestones

7.4 List of available Big Data sources in the domain(s) (month 8)

7.5 Recommendation for using two or three Big Data sources in the domain(s) (month 12)

7.6 Progress and technical report of internal WP-meeting (month 4)

Specific grant agreement 2 (SGA-2)

To be specified.