WP3 Report 1









One of the work packages in the ESSnet Big Data project is a pilot study on electricity smart meters data. This is work package number three (WP3 Smart meters). WP 3 is carried out by partners from four national statistical institutes: Statistics Austria, Statistics Denmark, Statistics Estonia, and Statistics Sweden. The aim of the pilot study is to demonstrate the use of data from electricity meters, which can be read from a distance and measure electricity consumption at a high frequency, for production of official statistics. This kind of data can be of use for statistics on energy use and production, and it can be relevant also as an additional source for calculating census housing statistics, household costs, or impact on environment.

Some challenges ahead are getting access to data, possible representativity issues, and integration with other datasets (e.g. register data). In the pilot study, theoretical and practical issues will be addressed and topics of data access and processing and linking data, as well as calculation and visualization of statistics, will be covered.

This report is the first report from WP 3 on the potential of using smart meter data for production of official statistics. The first part gives a background covering the electricity market and the availability of smart meters in the partner countries, some examples of experiments with smart meter data in other countries, current methodologies for producing energy statistics, and the potential benefits of using smart meter data in the participating countries. The report further covers access to smart meter data in the participating countries, including legal aspects, and includes a survey of all EU countries on availability and access. Descriptions of the available data and of the data processing, including synthetic data, follow, and the report ends with some concluding remarks.

The second report will address the methodology of producing estimates in three areas i.e. electricity consumption by businesses, electricity consumption by households, and vacant living spaces. A quality assessment of the estimates will also be carried out.

WP 3 is carried out parallel to other work packages in the ESSnet Big Data project. WP 3 will provide input to the methodology work package (WP 8) that will start in the second phase of the project and which will summarize approaches to methodology and quality assessment when dealing with big data sources. Dissemination of WP 3 results is part of the dissemination work package (WP 9).



Electricity market in the partner countries


The Estonian electricity system connects the power stations in Estonia, the network operators and electricity consumers (Figure 1). The largest Estonian producers of electricity and heat energy are the Eesti and Balti power stations, which run on oil shale and are owned by Eesti Energia. They provide over 90% of the electricity produced in Estonia and supply the whole town of Narva with heat. As the transmission system operator, Elering AS will have to make an effort to integrate the domestic market with other markets in the Baltic and Nordic countries. The Estonian electricity system in turn belongs to the larger synchronised united system BRELL, which controls the AC power lines connecting Estonia to the neighbouring countries of Latvia and Russia. EstLink 1 and EstLink 2 are direct current undersea cables between Estonia and Finland and are a prerequisite for an open energy market. Since 1 January 2013, Estonia’s electricity market has been completely open and all customers are eligible consumers.


Figure 1. Elering network map[1]. 



The Danish electricity transmission net is divided into the national and regional net. Both are owned and operated by Energinet.dk.


Figure 1: Transmission net [1]

The Danish electricity market is liberalized in the sense that all consumers can freely change electricity supplier, and suppliers can freely choose which producers to buy from. Power is provided by the electricity suppliers, who sell directly to consumers. The local grid is owned and run by the DSOs (Distribution System Operators), who are paid by the electricity suppliers (also known as the balance responsible parties) to deliver power. The backbone grid is run by EnergiNet, who also runs the Datahub, which records all transactions between consumers, traders and delivery companies [2].

Denmark trades on the Nord Pool electricity market. Nord Pool is owned by the Nordic transmission system operators Statnett SF (Norway), Svenska kraftnät (Sweden), Fingrid Oy (Finland) and Energinet.dk (Denmark) and by the Baltic transmission system operators Elering, Litgrid and AST. Nord Pool AS is licensed to organize and operate a market place for trading power with foreign countries [3]. Nord Pool essentially has two markets, because electricity is difficult to store and it is difficult to predict exactly how much will be demanded and supplied. Traders can therefore use the day-ahead market (Elspot) or the intraday market (Elbas). More than 70 % of the total consumption of electricity is traded on the Elspot market, which primarily handles the very flexible and fluctuating supply. When sudden and more or less unforeseen demand arises, the Elbas market is used for trading. This is where the producers of electricity from coal and gas trade, since their supply is more secure and regular [2] (Energinet.dk, 2013).

Before 2013, all communication between the actors was decentralized. Now the consumer receives only one bill, regardless of whether he has switched electricity supplier or not.


Figure 2: Communication with Datahub [1].

If the consumer has switched electricity supplier, he is charged the grid fee by the electricity supplier, who then pays the DSO. Before the liberalization, a consumer who wished to switch electricity supplier would receive two bills: one from the electricity supplier for the use of electricity, and one from the distribution system operator for the use of the grid [2].


Figure 3: Communication with Datahub [1]

The role of the DSO is to maintain the local grid in a geographically limited area. The DSO collects consumption data from the consumer and reports them to the Datahub, which passes them on to the electricity supplier. The DSO charges grid fees to the electricity supplier, who in turn charges the consumer. The electricity supplier uses the consumption data it gets from the Datahub to charge the consumer the price and tariffs. The electricity supplier buys electricity on Nord Pool, the wholesale market; the trade with consumers takes place on the retail market. Furthermore, the electricity supplier is responsible for communication with the consumer. The supplier collects background information on the consumer (name, address etc.) and delivers this information to the Datahub.


The Austrian energy market has been liberalised since 2001. The consumer can choose the energy provider, but the grid provider is determined by the location of the customer. There are over 100 grid providers and about 130 energy providers on the electricity market in Austria. The grid is overseen by the regulatory body E-Control. About half of the electricity produced in Austria is transported over the grid of Austrian Power Grid AG, which is over 6,700 kilometres long.

Figure 2. Austrian grid, 2011.


In Sweden, the grid operators (net owners) and the electricity trading companies are the two main actors on the market. Since 1996, end customers have been free to choose a trading company, but not the grid operator. The 162 grid operators own the local networks and have a local monopoly on distribution.


Figure 3. Swedish energy market. 

Figure 3 pictures the market. The core grid network is owned, developed, and maintained by Svenska Kraftnät. In total, the grid is about 551 000 km long. The Swedish Energy Markets Inspectorate inspects Svenska Kraftnät and regulates the companies on the energy market. SWEDAC implements EU directives and certifies the quality of smart metering systems. The grid operators are responsible for measuring, calculating, and reporting transmitted electricity hourly, in both the in and out directions. There are up to 5.3 million measuring points on the network. The Swedish Energy Agency is responsible for the official energy statistics.

Smart meters

The global energy market is changing, and power generation tends to move from centralized plants to distributed renewable energy sources. The European Union's Third Energy Package [4] set a goal to further open up the energy market and increase competition between energy providers. These changes require new solutions for building energy grids and measuring consumption. Smart grids are energy networks that use smart meters to monitor energy flows automatically and adjust to changes in energy supply and demand accordingly. A smart grid is "an electricity network that uses digital and other advanced technologies to monitor and manage the transport of electricity from all generation sources to meet the varying electricity demands of end-users" [5].

A smart meter is an electronic device that records consumption of electric energy at regular intervals (an hour or less) and communicates the information at least daily back to the system operator for monitoring and billing. Smart meters enable two-way communication between the meter and the system operator and the central system can determine the interval of recordings. The meters can use a wireless connection, GPRS, power line carrier – transmission over power lines, internet connection, or some other type of communication channel. The aim of smart meters infrastructure is to provide system operators with real-time data on power consumption and allow customers to make informed choices about energy usage based on the price at the time of use. With smart meters and accurate measurements, consumers can adapt their energy use to different energy prices throughout the day, saving money on their energy bills by consuming more energy in lower price periods.

To have functioning and effective retail markets European distribution system operators are evolving towards information hubs which store data from smart electricity meters and regularly record electricity consumption of customers. Such functionality enables customers to smoothly switch between suppliers.

The European Commission has proposed a deployment plan for smart electricity meters in the EU Member States on the basis of economic assessments of long-term costs and benefits [6] and expects a deployment rate of almost 72% to be reached by 2020. Where the roll-out of smart meters is assessed positively, at least 80 % of consumers shall be equipped with intelligent metering systems by 2020 [4] (see details of deployment plans [7]). Each member country is responsible for implementing its own roll-out plan. The roll-out plans differ considerably between member countries [8]; hence the data access approaches will also differ between countries.

An overview of the roll-out plans in participating countries of the WP3 is given in Table 1 (see [9] for more details).

Table 1. Smart meter deployment policies in participating countries.

| Country | Deployment strategy | Metering points in the country | Implementation period | Penetration rate by 2020 | Data refresh rate | Communication technology |
|---|---|---|---|---|---|---|
| Austria | Mandatory roll-out | 5.7 mn. | 2012-2019 | 95% | 15 minutes | Smart meter to data concentrator: 70% PLC and 30% GPRS; data concentrator to Data Management System: 100% fibre optics |
| Denmark | Mandatory roll-out | 3.28 mn. | 2014-2020 | 100% | 15 minutes | |
| Estonia | Mandatory roll-out | 709 000 | 2013-2017 | 100% | 1 hour | PLC 90%, GPRS 10% |
| Sweden | Voluntary | 5.2 mn. | 2003-2009 | 100% | 1 hour | Mix of GPRS, PLC and/or Radio (46%), PLC only (37%), Radio only (17%), GPRS (1%) |


The use of smart meters raises privacy concerns as, depending on the frequency of data collection, significant personal details about the lives and private activities of customers can be revealed[10]. Within the European Union, consumer personal data is protected by the EU's Directive on the processing of personal data[11]. Most smart meters currently being installed worldwide record electricity consumption data hourly, half-hourly or at 15 min intervals. This can provide a strong indication of for example occupancy, but has much less potential to reveal individual appliance use [10].

Examples of smart meter studies

UK smart meter experiments

The Office for National Statistics (ONS) has conducted a trial using smart meter data for estimating occupancy rates in a residential area. The study had access to the half-hourly electricity consumption of 6445 households from 2009 and 2010. In a small training data set, dwellings were manually labelled as either "occupied" or "vacant" for each day of the studied period; this was done for ten households for the 536 days for which data were available. The study tested eight algorithms to see which one best mirrored the result of the manual classification. The best algorithm looked at variation over a 24-hour period, and it classified the households correctly in close to one hundred percent of cases. The algorithm possibly succeeded because the manual labelling primarily looked for low variation as an indicator.[12]
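The idea of classifying a day by the variation in its readings can be illustrated with a minimal sketch. The ONS report does not specify the variation measure or the cut-off in this summary, so the population standard deviation and the threshold of 0.05 kWh used here are assumptions chosen for illustration only, as are the function and variable names.

```python
from statistics import pstdev


def classify_days(readings_by_day, threshold_kwh=0.05):
    """Label each day 'occupied' or 'vacant' from its half-hourly readings.

    Assumption: variation is measured as the population standard deviation
    of the day's readings, and the threshold is a hypothetical value.
    """
    labels = {}
    for day, readings in readings_by_day.items():
        variation = pstdev(readings)
        # Low variation suggests only base load (e.g. a fridge) is running.
        labels[day] = "occupied" if variation > threshold_kwh else "vacant"
    return labels


# Made-up example: a flat day and a day with an evening peak (48 readings each).
example = {
    "2010-01-01": [0.05] * 48,
    "2010-01-02": [0.05] * 36 + [0.60] * 12,
}
labels = classify_days(example)
print(labels)
```

In practice the threshold would have to be tuned against manually labelled days, as was done in the trial.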

ONS also studied data from the Department of Energy and Climate Change (DECC) on annual energy usage from gas and electricity meters in 2011 and 2012. ONS sourced counts of domestic meters for all local authorities and 48 Lower Layer Super Output Areas (LSOAs) in England and Wales. The analysis focused only on the 3.8% to 4.0% of households that used 0-500 kWh of electricity per year. The purpose was to identify vacant properties and second homes or holiday homes. The DECC data set was compared with 2011 Census data and with council tax data. No clear relationships between the two data sets were found. However, for local authorities and small areas like LSOAs, the data showed some relationships that can be useful for future censuses. Future research should source data for LSOAs or smaller areas for the whole country. [13]

Ireland smart meter study

The purpose of the study was to investigate whether smart meter data can be used to identify household composition. The data are well structured and contain unique meter (household) identifiers. The challenges included integrating smart meter data with other data and handling outliers, missing values, and incomplete records. A pre-trial survey of 4174 families was conducted for estimating the composition of the households. A regression model was fitted with 21 explanatory variables and then simplified until only the statistically significant (p<0.1) variables remained. The reduced set was used as input to two neural network (NN) approaches, binomial and multinomial NN. The models were tested on categorizing 16 types of household composition. Results were mixed. The binomial NN model gave the most accurate result in this case study, but it also generated false negatives for certain categories. The multinomial NN model predicted certain categories better than others. The best prediction reached 75% accuracy and the worst 0%.[14]

Canada pilot on smart meter data

Statistics Canada obtained smart meter data for a subset of households for one month. Internal experiments at Statistics Canada were followed by an experiment in the UNECE Big Data project. [15]

A synthetic dataset was generated from a real Canadian dataset. The synthetic dataset had no reference to the real data, and numerical values were altered. The results were not the same as with the real dataset; however, the project team stated that the synthetic data maintained a realistic distribution.[16] The generated dataset was uploaded to the UNECE Sandbox, and experiments on computational speed with different tools (Pig, R, SAS and Pentaho) were carried out. Figure 4 is a visualization of hourly consumption per day for one month (low values of consumption in green and high values of consumption in red).
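The day-by-hour layout behind a visualization such as Figure 4 can be sketched as follows. This is not the code used in the UNECE experiments; the input format (ISO timestamp and kWh pairs), the function name and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def hourly_grid(readings):
    """Sum consumption into one row per day with 24 hourly cells.

    Assumption: each reading is a (ISO 8601 timestamp, kWh) pair; readings
    finer than one hour are added into the hour they fall in.
    """
    grid = defaultdict(lambda: [0.0] * 24)
    for timestamp, kwh in readings:
        moment = datetime.fromisoformat(timestamp)
        grid[moment.date().isoformat()][moment.hour] += kwh
    return dict(grid)


# Made-up readings for two days.
sample = [
    ("2014-03-01T00:00:00", 0.25),
    ("2014-03-01T00:30:00", 0.5),
    ("2014-03-01T18:00:00", 1.5),
    ("2014-03-02T07:00:00", 0.75),
]
grid = hourly_grid(sample)
for day, row in sorted(grid.items()):
    print(day, row)
```

Each row of the resulting grid corresponds to one day and each cell to one hour, which is the structure a heat map would colour from green (low) to red (high).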


Figure 4. Hourly consumption per day, Canadian example.

Current practices for official energy statistics

Users of energy statistics

Energy statistics are used in a wide variety of academic disciplines, as well as government agencies and the private sector. There is a need for more timely and detailed energy statistics, notably for monitoring and guiding policy towards a sustainable green economy. This is a major focus area for the EU. Particularly the inclusion of background information on electricity consumers is pertinent to the discussion about green policy by most actors mentioned above.


Current production of energy statistics

In Estonia, survey data are collected from all enterprises that produce or sell electricity. Data on imports and exports are received from the Foreign Trade Statistics.

For obtaining the household energy consumption data, irregular studies have been conducted over several years. In the intermediate years, the data evaluation is based on the previous survey, on the data from enterprises selling energy and on the data received from the Household Budget Survey. The last survey for household energy consumption was conducted in 2011.

For enterprises, units with at least 50 employees are enumerated completely. A sample is drawn from the smaller enterprises using stratified simple random sampling. The population of enterprises is the statistical profile based on the data of the Ministry of Justice Centre of Register. The statistical profile includes economically active enterprises. The Statistical Profile Service is the holder of the business register for statistical purposes, which is used for the creation of the sampling frame of the target population. Data are collected through eSTAT (the web channel for electronic data submission) and from administrative data sources. eSTAT is also used to monitor the completion of questionnaires. The questionnaires have been designed for completion in eSTAT by the respondents themselves and they include instructions and controls. The questionnaires and information about data submission are available on Statistics Estonia’s website at http://www.stat.ee/andmete-esitamine (in Estonian).

Missing data are imputed with data from the previous period or with the average of the stratum. The first editing and validation process takes place at micro data level. The data editing program is applied to all entered data. All errors are marked and, if necessary, the data are corrected in cooperation with the data providers at the enterprises. The data collected by sample survey are expanded to the whole population. The second editing and validation process takes place after aggregation and grossing up. If unrealistic estimates or very large changes compared to the previous period are discovered, the micro data are checked again.
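The imputation and grossing-up steps can be sketched as below. The report does not state in which order the two imputation rules are tried or which estimator is used for expansion, so this sketch assumes that the previous-period value is preferred over the stratum mean and that a simple stratified expansion estimator (stratum population size divided by stratum sample size, times the sample total) is used. All names and figures are hypothetical.

```python
from statistics import mean


def impute_missing(current, previous, stratum_of):
    """Fill missing values (None) from the previous period, else the stratum mean."""
    reported = {}
    for unit, value in current.items():
        if value is not None:
            reported.setdefault(stratum_of[unit], []).append(value)
    # Assumes every stratum has at least one reported value.
    stratum_mean = {s: mean(values) for s, values in reported.items()}

    completed = {}
    for unit, value in current.items():
        if value is not None:
            completed[unit] = value
        elif previous.get(unit) is not None:
            completed[unit] = previous[unit]  # assumption: previous period first
        else:
            completed[unit] = stratum_mean[stratum_of[unit]]
    return completed


def gross_up(completed, stratum_of, population_size):
    """Expand the sample to the population with weights N_h / n_h per stratum."""
    totals, counts = {}, {}
    for unit, value in completed.items():
        s = stratum_of[unit]
        totals[s] = totals.get(s, 0.0) + value
        counts[s] = counts.get(s, 0) + 1
    return sum(population_size[s] / counts[s] * totals[s] for s in totals)


# Made-up example: large enterprises fully enumerated, small ones sampled.
current = {"a": 100.0, "b": None, "c": 40.0, "d": None}
previous = {"a": 90.0, "b": 80.0, "c": 50.0}
stratum_of = {"a": "large", "b": "large", "c": "small", "d": "small"}
population_size = {"large": 2, "small": 10}

completed = impute_missing(current, previous, stratum_of)
total = gross_up(completed, stratum_of, population_size)
print(completed, total)
```

Unit "b" takes its previous-period value, unit "d" has no history and takes the mean of its stratum, and the fully enumerated stratum receives a weight of one.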

The consumption indicators of the public sector have been calculated using the data obtained from the Ministry of Finance’s database for the expenditure of energy.

The energy statistics of Estonia are available on the web page of Statistics Estonia (http://www.stat.ee/energy).

Expected improvements from smart meter data

  • Improved periodicity of electricity consumption statistics (from annual to quarterly, or even monthly);
  • Identification of household consumers and their electricity consumption and, if possible, use of the data for modelling electricity consumption by end use;
  • Replacement of the current data collection from businesses by online questionnaire with smart meter data, i.e. production of aggregated electricity consumption data according to the requirements of Regulation (EC) No 1099/2008 of the European Parliament and of the Council of 22 October 2008 on energy statistics and to the needs of environmental statistics[17];
  • Use of electricity data as additional data source together with data from Population Register, Estonian Register of Buildings and Address Data System for identifying unoccupied conventional dwellings.
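The first improvement listed above, moving from annual to quarterly or monthly statistics, amounts to summing meter readings over shorter periods. A minimal sketch, assuming readings arrive as (ISO timestamp, kWh) pairs; the function name and sample values are hypothetical:

```python
from collections import defaultdict
from datetime import datetime


def aggregate(readings, period="month"):
    """Sum kWh readings per calendar month or quarter."""
    totals = defaultdict(float)
    for timestamp, kwh in readings:
        moment = datetime.fromisoformat(timestamp)
        if period == "month":
            key = f"{moment.year}-{moment.month:02d}"
        elif period == "quarter":
            key = f"{moment.year}-Q{(moment.month - 1) // 3 + 1}"
        else:
            raise ValueError("period must be 'month' or 'quarter'")
        totals[key] += kwh
    return dict(totals)


# Made-up hourly readings from one metering point.
readings = [
    ("2014-01-15T10:00:00", 1.5),
    ("2014-02-01T00:00:00", 2.0),
    ("2014-04-30T23:00:00", 0.5),
]
monthly = aggregate(readings, "month")
quarterly = aggregate(readings, "quarter")
print(monthly)
print(quarterly)
```

The same grouping could be extended with an economic activity code or a region once meters are linked to registers.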


Current production of energy statistics

In Denmark, energy statistics are produced by four primary actors.

  1. The Danish Energy Agency
  2. The Danish Energy Association (The primary industry body)
  3. Statistics Denmark
  4. The Danish Oil Industry Association

The other actors publish detailed statistics on the energy system as a whole, on consumption, and on prices. The Danish Energy Agency in particular has very detailed statistics on power consumption by the household sector.

Statistics Denmark primarily publishes tables containing an overview of the Danish energy consumption, such as the one provided in the Environmental-Economic national accounts. One example is the overall distribution of electricity consumption by industry type, energy type and year, named ENE3H (www.statistikbanken.dk/ENE3H). Today the accounts are based on surveys, and some smaller industries are merely estimated. Comparable time series run back to 1966.

Statistics Denmark also publishes data on energy consumption by households, as part of the Household Budget Survey. The main table can be found at www.statistikbanken.dk/FU1. Finally, Statistics Denmark publishes numbers on energy price changes, as part of the general consumer price index. The main table can be found at www.statistikbanken.dk/PRIS11. There are also other numbers on electricity consumption, such as the allocation for electricity expenditure in the budgets of municipalities.

Expected improvements from smart meter data

The Danish expectations from the smart meter data can be separated into three categories.

  1. Improvements and cost reductions to existing statistics
  2. Expansion of existing statistics - in terms of timeliness and aggregation levels
  3. Secondary usage of data - i.e. combining data with all the other register data available at Statistics Denmark for projects that are not yet conceived.

One major goal of the Danish experiment with smart meter data is to link smart meter data with business registers, so that it will be possible to save money on surveys. There are fewer efficiency gains when it comes to the Household Budget Survey, as the survey must go on, and electricity consumption is only a small part of it. However if electricity data can improve quality, this would be an interesting use of the data.

It will also be assessed whether smart meter data would allow Statistics Denmark to shorten production times for energy statistics. Depending on the quality of the record linkage, it should be feasible to reduce production times drastically for many of the relevant tables. It may also be possible to publish more detailed tables and new time series containing new information, especially if these do not overlap with series published by other actors.

It is possible to imagine several uses for smart meter data as an extra source of background data for error detection in other statistics. It is also possible to imagine it being used in combination with other data sources, such as mobile phone data. Day/night population estimate and statistics on residency are two possibilities. The plan is to evaluate each opportunity as Statistics Denmark gains familiarity with the data.


Current production of energy statistics

Current energy statistics produced by Statistics Austria cover a broad spectrum of energy-related information covering the energy flow in the Austrian economy from the primary energy supply through the transformation processes to final energy consumption and the useful energy obtained from it, subdivided into useful energy categories. Additionally, information about energy prices and energy-source specific taxes is collected. Two surveys are used for estimating the energy consumption of businesses: the first one is a subsample of the structural business survey and is used for the sectors of trade and services and the second is a subsample of the short term statistics for the sectors of production.

The energy consumption of households is also estimated from a survey, and detailed information about the electricity and gas consumption of Austrian households is collected with an electricity and gas journal.

Austria's regulatory body, E-Control, is in charge of Austrian electricity statistics, with information available on the market (e.g. supplier switching per consumer), the grid, and the electricity balances.

Expected improvements from smart meter data

Depending on the kind of data available and the possibility of linking the smart meter data to households and/or businesses, it is expected that data collection for surveys can be simplified, thereby reducing the cost of the surveys and the response burden. It is also expected that, if linkage to households is possible, the dwelling register can be improved with regard to vacant houses, which would also improve the results of the register-based census.


Current production of energy statistics

Swedish official energy statistics are partly produced by Statistics Sweden on commission for the Swedish Energy Agency, the agency responsible for all official energy statistics. Statistics Sweden does not produce statistics on household energy consumption. All official energy statistics are available at the web page of the Swedish Energy Agency,  http://www.energimyndigheten.se/statistik/. 

The official statistics on energy production and consumption produced by Statistics Sweden are based on a number of surveys of producers, suppliers, and industries. Secondary use of data and modelling are employed. For example, statistics on regional and municipal energy production and use are based on secondary use of data from three surveys (Yearly electricity, gas, and heating, Energy use in industry, and Oil use) and modelling based on information from three additional surveys (Energy use in small industry, Energy use for detached houses, and Energy use in farming). Statistics are broken down for example by region, municipality, type of production, and type of fuel, and published by year, quarter, or month. The purpose is to get a picture of the balance of production and consumption, and the statistics are used by the municipalities for planning and for following up environmental goals. Statistics on prices, change of suppliers, and use of energy within different industrial sectors are other examples of official energy statistics. Data are collected from the energy suppliers and from industries by questionnaires, and reporting is mandatory. A summary report in English is available.

Expected improvements from smart meter data

With data from smart meters available through a planned data hub, it is expected that data collection can be simplified and made cheaper, production time can be shortened, and the quality of the energy statistics can be improved. A goal of particular interest is the possibility to reduce the response burden. Linking with other administrative sources such as the Business register will be an important issue. Smart meter data could potentially be of interest for other types of statistics, for example as input to price indices and for improving the Household Budget Survey (household cost for electricity). Of special interest is to investigate the possibility of improving the dwelling statistics. The registered place of living does not necessarily agree with the actual place of living, and electricity consumption could be an important factor in estimating the number of vacant or temporary dwellings (i.e. summer houses).

Data access

Smart meter data access in EU countries

A survey on access to smart meter data was sent to the NSIs of all EU member countries in the spring of 2016, and so far there have been 18 responses. Only two countries, Denmark and Estonia, currently have access to data.

Several countries were aware of substantial legal barriers; it was unclear whether market participants could even share data with each other. Some countries, such as Poland, are in the process of drawing up legislation that will enable smart meter data use.

In terms of data hubs, only one country mentioned that one was under construction (Norway). Denmark and Estonia already receive data through central data hubs, and a hub is being planned in Sweden.

Table 2. Smart meter data in EU countries 

| NSI | Plans to explore smart meters? | Legal obstacles | Data hub available |
|---|---|---|---|
| Sweden | Yes | Yes | No |
| Norway | Yes | No | No |
| Hungary | Yes | No | No |
| France | Yes | Yes | No |
| Lithuania | No | No | No |
| Cyprus | No | No | No |
| Bosnia and Herzegovina | No | No | No |
| Poland | Yes | Yes | No |
| Belgium | No | No | No |
| Germany | Yes | Yes | No |
| Portugal | No | No | No |
| Luxembourg | No | NA | No |
| The former Yugoslav Republic of Macedonia | No | NA | No |
| Denmark | Data received | No | Yes |
| Estonia | Data received | No | Yes |
| Austria | No | Yes | No |
| Greece | Yes | Yes | No |
| Spain | No | No | No |

Data access in the partner countries 


The roll-out plans for smart meters expect full coverage in Estonia by 2017. Transmission System Operator Elering AS manages the Estonian electricity system in real time. Elering AS is responsible for its operation and ensures the supply of high-quality electricity to consumers at all times. On 25 August 2012, Elering AS launched the Estonian Data Hub (Andmeladu), a software/hardware solution that manages the exchange of electricity metering data between market participants, supports the process of changing electricity suppliers in the market, and archives the metering data of electricity consumption. The Estonian data hub is a system that holds all agreements related to electricity transfer and consumption and all measurement data.

Electricity consumers can: 

  • look at their electricity consumption points and their agreements;
  • view historical electricity consumption data;
  • authorize one or more electricity sellers to access their data, so that the sellers can make personalized offers.

A user of the Estonian Data Hub (network operator or open supplier) can use it to exchange metering data; to submit, save, change and augment notices regarding electricity and network service contracts; and to perform other activities stemming from the opening of the electricity market. Electricity consumption is one of the most important parts of energy data for compiling energy balance sheets and supplying users with statistical data. Access to this data provides an opportunity to reduce the reporting burden on businesses.

To obtain data from the Data Hub, Statistics Estonia first contacted Elering AS in 2013. Thereafter, representatives of Statistics Estonia and Elering AS discussed the legal issues, the possible data set, and the details of the data exchange. In 2015 the Director General of Statistics Estonia and the Chairman of the Board of Elering AS met to discuss the possibility of receiving data from the Data Hub. After the meeting Statistics Estonia sent an official data request, which Elering AS forwarded to the Estonian Data Protection Inspectorate to get its opinion on the legal aspects of exchanging data. Statistics Estonia has received the agreement of Elering AS and the Estonian Data Protection Inspectorate to use electricity consumption data from the Estonian Data Hub to produce statistics.

There are no legal barriers for getting access to data. Collecting data is legislated through the National Statistics Act. To make the request for data legally viable Statistics Estonia referred to the Official Statistics Act – “§ 28 Obligations of respondents and access to information” and “§ 29 Use of administrative records and databases” which state that if there is administrative data available the owner of the data has to provide it to Statistics Estonia. Upon the production of official statistics, a producer of official statistics shall primarily use data collected from administrative records and databases, data generated in the course of the activities of state and local government authorities and legal persons or collected by them, if such data allow the production of official statistics complying with the quality criteria of official statistics.

Elering AS decided that they would not make any changes to their data platform, but would give Statistics Estonia a copy of the Data Hub. Statistics Estonia received the data for the period 2013-2014 from Elering AS. The data were transferred manually on an external hard disk as a dump of the MySQL database. In May 2016 it was agreed that Elering AS would provide data for the year 2015. Statistics Estonia has submitted a request for Elering AS to transmit data every year.


In Denmark, all data are collected centrally by EnergiNet, which administers DataHub, a central system that collects all data from the Danish energy markets. Statistics Denmark established a connection with EnergiNet in 2015; the primary purpose was to make the energy data available to scientists and others through the secure data center of Statistics Denmark. In May 2016, the first dataset was received. Currently, there is only access to data from 2013, when the percentage of smart meters was low (a precise estimate cannot be produced), but Denmark expects to have a 100 percent roll-out of smart meters by 2020. It is likely that Statistics Denmark will soon get more recent data, with a smart meter proportion closer to 70 percent. The technicalities of continuous delivery are currently under investigation.

In Denmark, there are no legal issues with accessing the data, and the role of the data provider is enshrined in central regulation of the energy market. Combined with EU goals on energy market integration, it is assessed that the source of the data is sustainable. The data is collected under an agreement between Statistics Denmark and EnergiNet. This is driven by the desire of EnergiNet to make their data available to scientists in a secure and controlled way. Potentially the data could be obtained using the general data access provision in the legislation of Statistics Denmark.


There are many grid operators in Austria (each state has at least one), and there is currently no strategy for a centralised data hub for smart meter data. The goal is a smart meter coverage of at least 95% by 2020. The smart meters will have a 15-minute reporting interval. Each customer can opt out of getting a smart meter; however, it is expected that no more than 5% of the population will opt out. E-control (the regulatory body in Austria) has permission to access aggregated data (at a higher level of aggregation than daily). The ministry can make special legal arrangements so that the NSI could get access to electricity data, but there is no infrastructure. The best-case scenario is for the NSI to get access to daily aggregated data; however, the ministry is not keen to make special provision for the NSI at the moment.


In Sweden, a 100 percent roll-out of smart meters is expected by 2020. Svenska Kraftnät is currently planning and developing a data hub information model together with the Swedish Energy Markets Inspectorate, including the definition of roles and responsibilities. Statistics Sweden is in the reference group as one of the stakeholders.

The hub will be fully developed and in place by the fourth quarter of 2020. It is centred on the electricity provider and will facilitate better communication between customer and provider. Changes in regulations might be necessary in order to complete and fully implement the ideas behind the hub. Through the hub, metering data, measuring data, customer data, and contract data will be managed in one common system. It is in the interest of both Statistics Sweden and the net owners that the hub is able to deliver high-quality information for statistical purposes. Today, the survey Yearly electricity use in particular is very cumbersome for the respondents, and there are doubts about the quality of the data.

Statistics Sweden has recently received a test data set that will be analyzed and used throughout the ESSnet Big Data project. There are still issues to solve in order to secure future access to data in the hub. The Swedish Statistics Act ensures that data can be collected and used for statistical purposes, but there are restrictions on personal data. Since smart meter data are available at household level, they are considered personal data, and legal barriers exist. It is the responsibility of the Swedish Energy Markets Inspectorate to resolve legal issues concerning the hub.


Because of the ambitious roll-out plans for smart meters in Europe, the use of smart meter data in statistics has gained remarkable attention. Despite the interest, only two countries involved in the ESSnet Big Data project (Denmark and Estonia) have real access to the data. The other countries are facing technical problems or legal restrictions which make the use of smart meter data difficult. To get access, there must be a legislative basis which allows the NSI to collect personalized administrative data, and the data must be accumulated in a centralized data hub.

Data handling

Estonian data

Description of data

In 2015 it was agreed that Elering AS would provide smart meter recordings for 2013-2014 to Statistics Estonia. The data contain hourly recordings from 709 000 metering points and amount to 1.5 TB. The number of recordings per year is about 365 * 24 * 709 000 = 6 210 840 000.

The most important tables in the data are metering data, metering points, agreements and customers (Figure 5) which contain information about by whom and where electricity was consumed (Figure 6):

  • metering_data - hourly information on the amount of produced and consumed electricity,
  • metering_points - information about the location and the type of the metering point (possible types are: remotely readable, single and dual tariff manually readable),
  • agreements -  information on when electricity contract was signed/ended and what type of contract it is,
  • customers - information about private and legal persons who signed the contract.

The detailed description and the code for creating the tables of smart meter data can be found in Annex A1. The total numbers of rows in the tables are: metering_data - 12 442 468 837, metering_points - 722 161, agreements - 1 689 924, and customers - 696 831.
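As a rough plausibility check, the expected number of hourly recordings can be compared with the row count reported for metering_data. The sketch below only repeats the arithmetic with the figures quoted in the text; the variable names are illustrative, and the comparison ignores leap years and changes in the number of metering points.

```python
# Figures quoted in the text; variable names are illustrative only.
HOURS_PER_YEAR = 365 * 24            # non-leap year
metering_points = 709_000            # approximate number of metering points
rows_metering_data = 12_442_468_837  # reported rows in metering_data, 2013-2014

expected_per_year = HOURS_PER_YEAR * metering_points
expected_two_years = 2 * expected_per_year

# Ratio of reported rows to the rough two-year expectation.
ratio = rows_metering_data / expected_two_years

print(f"expected per year:   {expected_per_year:,}")
print(f"expected two years:  {expected_two_years:,}")
print(f"reported / expected: {ratio:.3f}")
```

A ratio close to 1 suggests that the reported row count is of the same order as two years of hourly recordings.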

Figure 5. Structure of the database - main tables.

Figure 6. Sample data (synthetic content).

From the database, sample time series for two subjects were selected, and first visualizations indicated that there is a clear difference in consumption pattern depending on consumer type (Figure 7). For visualization a Python script was used, and the consumption was aggregated by the command

  mOut[iI] = mM['out'][(dates.weekday * 24) + dates.hour  == iI].sum() 

and visualized with matplotlib, a Python 2D plotting library.
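The aggregation into 7 x 24 hour-of-week bins can be illustrated with a minimal sketch. It is not the script used by Statistics Estonia: it uses only the Python standard library, the function name weekly_profile is hypothetical, and the readings are made up.

```python
from datetime import datetime, timedelta


def weekly_profile(readings):
    """Sum consumption into 7 * 24 = 168 hour-of-week bins.

    `readings` is an iterable of (datetime, consumption) pairs.
    Bin 0 is Monday 00:00-01:00, bin 167 is Sunday 23:00-24:00.
    """
    profile = [0.0] * (7 * 24)
    for ts, value in readings:
        profile[ts.weekday() * 24 + ts.hour] += value
    return profile


# Made-up example: two weeks of hourly readings of 1.0 each.
start = datetime(2014, 1, 6)  # a Monday
readings = [(start + timedelta(hours=h), 1.0) for h in range(14 * 24)]
profile = weekly_profile(readings)
```

The resulting list of 168 values could then be passed to a plotting call such as matplotlib's plt.plot(profile) to obtain a curve of the kind shown in Figure 7.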


Linking metering point data with geo information made it possible to visualize population density.

Figure 7. Weekly real data (7x24) from household (left) and business (right) aggregated and scaled, 2013-2014.

Analysis of the 2014 data shows that, of all metering points, 89% belonged to households and 11% to businesses, and that 49% were read remotely (smart meters). The distribution of consumers was found with an SQL statement in Hive (see Annex A1). By the end of 2014 the number of valid agreements in the database was 875 591; these use 711 030 distinct metering points and are related to 545 290 distinct customers. By the end of 2016, all metering points in Estonia are expected to be remotely readable.

Table 3. Distribution of metering points by type by the end of 2014.

Type            Businesses  Households  Total
Smart reading   54%         48%         49%
Manual reading  46%         52%         51%
Total           80 853      630 177     711 030
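A distribution like the one in Table 3 can be derived from one record per metering point. The sketch below is a Python illustration and not the Hive statement from the annex; the function name reading_shares and the toy records are assumptions made for the example.

```python
from collections import Counter


def reading_shares(points):
    """Share of each reading type within each consumer type.

    `points` is an iterable of (consumer_type, reading_type) pairs,
    one pair per metering point.
    """
    counts = Counter(points)
    totals = Counter()
    for (consumer, _reading), n in counts.items():
        totals[consumer] += n
    shares = {}
    for (consumer, reading), n in counts.items():
        shares.setdefault(consumer, {})[reading] = n / totals[consumer]
    return shares


# Made-up toy records, not the Estonian counts.
points = (
    [("business", "smart")] * 54 + [("business", "manual")] * 46
    + [("household", "smart")] * 48 + [("household", "manual")] * 52
)
shares = reading_shares(points)
```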

Processing data

Hardware and software

Today there are two common solutions for transmitting data to Statistics Estonia (SE):

  • SE reads data over x-road (an XML-based secure network);
  • institutions (registers, companies) send data to SE through xGate (an XML-based gateway reached over the x-road or with an external login).

For smart meter data an external HDD was used for data transmission due to the amount of data, which makes use of alternative channels too costly and time consuming. In the future a channel receiving compressed data regularly might be developed.

The data of the data hub were received as a dump of the MySQL database. As there was no suitable infrastructure for processing the data in the MySQL database, the data were exported into CSV files and transferred to alternative processing environments. Exporting one month of data took 2350 seconds, and exporting all 24 months took 15 hours and 50 minutes. Copying the files from a PC to a server took 58 hours, as the capacity of the link between the two networks was limited. The files for one year amount to 160 GB. The alternative environments selected for testing suitability for big data analysis are the HP Vertica database, Hortonworks and Greenplum. Vertica is a column-based database and runs on one server (not the most powerful solution, but sufficient at the moment). Hortonworks is a distribution of the Hadoop ecosystem, which is based on the Hadoop Distributed File System and processes big data in a distributed environment across clusters of computers. Greenplum Database is an open source data warehouse. The data are loaded into the alternative systems, and tests are performed to assess the performance and capabilities of these data processing platforms.

Cleaning data

The data hub owner does not process the data; it only collects and stores them and prepares aggregates of total consumption and production.

To use smart meter data for statistics, three additional steps are necessary:

  • Geocoding of metering point addresses,
  • Transformation of the timestamp of metering fact into readable timestamp format,
  • Anonymization of private personal data in the customers table

The address has to be coded because it is stored as free text. The geo-id was created to enable further linking with other administrative data sources. To create the geo-id and link it to the metering point addresses, the Estonian Land Board’s web-based Massgeocoding service was used. The service enables users to geocode large quantities of address data; for processing, the address data were divided into files of 100 000 lines each. The service normalizes the address and finds the coordinates of the address point from the normalized address. Geocoding one file took 3-5 hours. Geocoding fails if the address contains text which is not part of the address or if some part of the address is missing.

There are several reasons for failure: 

  • the address is inaccurate 
  • street name or number is missing or misspelled
  • the address does not exist in the Address Data System
  • the address is no longer valid 
  • the address text contains additional location description that is not a part of the address (for example company name or building type)
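Dividing the address data into files of 100 000 lines before submitting them to the geocoding service could be done along the following lines. This is a minimal sketch with a hypothetical helper, chunked; the addresses are made up and a small chunk size is used for illustration.

```python
from itertools import islice


def chunked(lines, size=100_000):
    """Yield successive lists of at most `size` lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Made-up example with a small chunk size.
addresses = [f"Example street {i}" for i in range(250)]
chunks = list(chunked(addresses, size=100))
```

Each chunk would then be written to its own file and geocoded separately.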

Addresses that contained similar additional location descriptions were processed automatically by removing the excess information and were then geocoded again. About 90% of the metering point addresses were geocoded automatically. The other pre-processing step was anonymization of the data. For anonymization, the customers table was linked with an anonymization table by person ID, and the original code was then replaced by a unique anonymized code. All personal data were deleted from the customers table and replaced by a date-of-birth field.
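The anonymization step can be sketched as a lookup table that maps each original person code to a unique random code. The example below is an illustration under assumptions, not the procedure used by Statistics Estonia: the function names, the random-code scheme and the sample records are hypothetical, and the date-of-birth field mentioned above is not reproduced.

```python
import secrets


def build_anonymization_table(person_codes):
    """Map each distinct person code to a unique random code."""
    table = {}
    used = set()
    for code in person_codes:
        if code in table:
            continue
        anon = secrets.token_hex(8)
        while anon in used:  # guard against an (unlikely) collision
            anon = secrets.token_hex(8)
        used.add(anon)
        table[code] = anon
    return table


def anonymize_customers(customers, table):
    """Replace the person code and drop names and other personal fields."""
    return [
        {"customer_id": c["customer_id"], "code": table[c["code"]]}
        for c in customers
    ]


# Made-up records; the codes are not real identification codes.
customers = [
    {"customer_id": 1, "code": "P-0001", "given_name": "Given", "sur_name": "Sur"},
    {"customer_id": 2, "code": "P-0001", "given_name": "Given", "sur_name": "Sur"},
    {"customer_id": 3, "code": "C-0002", "given_name": "Firm", "sur_name": ""},
]
table = build_anonymization_table(c["code"] for c in customers)
anonymized = anonymize_customers(customers, table)
```

Because the same original code always maps to the same anonymized code, records of one customer can still be linked to each other after anonymization.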

The timestamp of the recording time, expressed as a number in Unix time format, was converted by the Hive function from_unixtime() to a string representing the timestamp.
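For readers who prefer Python, an equivalent conversion can be sketched with the standard library. The time zone applied to the Estonian data is not stated in the text, so UTC is an assumption here, and the helper name simply mirrors the Hive function.

```python
from datetime import datetime, timezone


def from_unixtime(seconds, tz=timezone.utc):
    """Format a Unix timestamp as 'YYYY-MM-DD HH:MM:SS' in the given time zone."""
    return datetime.fromtimestamp(seconds, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


print(from_unixtime(1388534400))  # 2014-01-01 00:00:00 (UTC)
```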

As the geocoded data include detailed information from different address levels (a maximum of 8 levels), it was possible to aggregate the consumption data with a query containing a GROUP BY <level code> clause. For visualization of the spatial electricity consumption data, the ESRI mapping software was used.
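The grouping by address level can be illustrated in Python as follows. The field names level1, level2 and consumption are hypothetical stand-ins for the address-level codes and the consumption value, and the records are made up.

```python
from collections import defaultdict


def aggregate_by_level(records, level):
    """Sum consumption per address-level code, like GROUP BY <level code>."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[level]] += rec["consumption"]
    return dict(totals)


# Made-up records with hypothetical field names.
records = [
    {"level1": "A", "level2": "A1", "consumption": 10.0},
    {"level1": "A", "level2": "A2", "consumption": 5.0},
    {"level1": "B", "level2": "B1", "consumption": 2.5},
]
by_level1 = aggregate_by_level(records, "level1")
by_level2 = aggregate_by_level(records, "level2")
```

Choosing a different level code changes only the grouping key, which is what makes aggregation at any of the address levels straightforward.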

Visualising first results

Figures 8-11 show the first visualisations of electricity consumption by Estonian private persons and legal persons in 2014.


Figure 8. Average monthly electricity consumption by private persons, January 2014.


Figure 9. Average monthly electricity consumption by private persons, July 2014.


Figure 10. Average monthly electricity consumption by legal persons, January 2014.


Figure 11. Average monthly electricity consumption by legal persons, July 2014.

Danish Data

Data Delivery

Statistics Denmark has received consumption data for all Danish electricity meters, but the data are from 2013 and do not yet include hourly readings or keys based on personal IDs. A more recent dataset is being delivered that will make better linking and more detailed analysis possible. The first data set was delivered manually, but in the future data will probably be delivered using SFTP.

Description of data

The Danish data contain the same information as the Estonian data, and can be broadly categorized into the following categories.

  • Physical location of metering point (Address, coordinates and the like)
  • Contract details (who pays)
  • Longitudinal contract information (Makes it possible to figure out who paid for what consumption at any point in time)
  • Usage of electric heating
  • Consumer types (company, private person)

A readings table, analogous to the Estonian table, is also included.

Processing Data

Hardware and Software

The data were originally delivered as CSV files and have been read into an Oracle database. Because the data do not yet contain hourly readings, their size is easily manageable with a traditional data analysis stack such as Oracle and R.

Record Linkage

The first data version only includes addresses, which means that only a subset of the readings can be linked to individuals and companies. In the future the data will contain unique IDs of the individuals and companies paying for the electricity. Once there is full linkage to the datasets on population and companies, there is also linkage to the many registers that contain links to these two base registers.

Swedish Data

Data Delivery

Statistics Sweden has received a test data set representing full coverage for two municipalities in Sweden: Täby and Högsby. Täby is a medium-sized municipality in the middle part of Sweden, within commuting distance of the largest city, Stockholm. Högsby is a smaller municipality in the southern part of Sweden. The data have been delivered manually. The Täby data were received first, and as a consequence, most of the description below refers to the Täby data.

Description of data

The test data contain monthly aggregates of consumption. They include both businesses and households.

The following variables are available in the test data sets:

  • Net owner's identification of unit
  • Street name and number
  • Postal code and area
  • Apartment numbers for apartment buildings
  • Organization number for businesses
  • Electricity consumption per month
  • Year and month
  • Production of electricity

It is possible to divide the units into households in single houses, households in apartment buildings, and businesses by some simple assumptions: a metering point is

  • a business if it has an organization number,
  • a household in an apartment if it has an apartment number and no organization number,
  • a household in a detached family house if none of the above.
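These rules translate directly into a small function. The sketch below is illustrative; the function name and the category labels are assumptions made for the example and are not taken from the test data.

```python
def classify_metering_point(organization_number, apartment_number):
    """Classify a metering point using the three rules listed above.

    Empty strings and None are both treated as "no value".
    """
    if organization_number:
        return "business"
    if apartment_number:
        return "household_apartment"
    return "household_single_house"


# Made-up identifiers for illustration.
print(classify_metering_point("ORG-1", None))   # business
print(classify_metering_point(None, "1101"))    # household_apartment
print(classify_metering_point(None, None))      # household_single_house
```

Note that a metering point with both an organization number and an apartment number is classified as a business, since the first rule takes precedence.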

The Täby data contain around 360 000 observations for 2015 and 2016 respectively, which translate to about 3700 metering points for around 1800 businesses, and about 10200 metering points for households.

The Högsby data contain about 38 000 observations for 2016: 380 metering points for around 150 businesses, and about 2000 metering points for households.

Units can produce their own electricity (e.g. with solar cells or windmills). The electricity is used by the unit, and the surplus is transferred into the net and measured at the metering point. Thus the production variable is only a measure of surplus production: zero production means that there is no surplus, and no value means that the unit does not have its own production.
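The three possible states of the production variable can be made explicit in code. In this sketch a missing value is represented by None, which is an assumption about how the data would be loaded; the function name and labels are illustrative.

```python
def production_status(production):
    """Interpret the production variable of a metering point.

    None  -> the unit has no production of its own
    0     -> the unit produces electricity but delivers no surplus
    > 0   -> the unit delivers a surplus to the net
    """
    if production is None:
        return "no own production"
    if production == 0:
        return "own production, no surplus"
    return "own production, surplus delivered"
```

Keeping missing values distinct from zeros matters here, since replacing missing values by zero would wrongly mark units without production as producers.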

Processing Data

Hardware and Software

The test data is small and easily handled in SQL and SAS. How to handle the full data set from the hub has not yet been considered.

Cleaning data

The addresses may contain errors since they are manually entered as free text and there are usually few restrictions on how to enter them. A program for checking the addresses, based on text analysis, was developed and tested on the Täby data. The program checks for similar addresses for a given area code.
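The kind of check described above can be sketched with the difflib module of the Python standard library. This is not the program developed by Statistics Sweden: the similarity measure, the threshold of 0.85 and the sample rows are assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similar_addresses(rows, threshold=0.85):
    """Return pairs of distinct but similar addresses within the same area code.

    `rows` is an iterable of (area_code, address) pairs.
    """
    by_code = defaultdict(set)
    for area_code, address in rows:
        by_code[area_code].add(address.strip().lower())
    pairs = []
    for area_code, addresses in by_code.items():
        ordered = sorted(addresses)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                if SequenceMatcher(None, a, b).ratio() >= threshold:
                    pairs.append((area_code, a, b))
    return pairs


# Made-up rows; the second address contains a typing error.
rows = [
    ("00001", "Example Street 12"),
    ("00001", "Exampel Street 12"),
    ("00001", "Other Road 3"),
    ("00002", "Example Street 12"),
]
pairs = similar_addresses(rows)
```

Pairs flagged in this way are candidates for manual review or automatic correction; comparing every pair within an area code is quadratic, so a real data set might need a more efficient approach.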

Addresses may be completely false, and it appears in the Täby data that this problem is most serious for the metering points that measure electricity consumed by the municipality as a unit. The municipality administration can be identified by its organization number and may have a large number of metering points. Since these are only separated by their addresses, it is not possible to say what kind of activity within the administration they refer to, but this might be solved by linking to the business register (see below). However, some entries are not proper addresses but descriptions such as “street light at crossing xxx”.

Record Linkage

The organization numbers are unique identification of businesses and can be used for linking to the Business register kept by Statistics Sweden. It might also be possible to identify work places by the address of the metering point.

The addresses of households in apartments, with apartment numbers, can be linked to the Building register. From this register, unique dwelling identification numbers can be retrieved and used for linking to the population register. In this way, it might be possible to find households, as well as household and building information relevant for analyzing electricity consumption: size of household, income of household, size of apartment, year of construction, etc. Another issue would be to link the metering points to geo codes.

The linking approaches mentioned above will be tested in SGA 2.

Data quality indicators

For the goals of the ESSnet, data quality indicators could serve two purposes:

  1. Compare quality of input data between countries
  2. Establish a general quality guideline for smart meter data that will serve as an introduction for new users to the quality level of the data.

The following list of data quality indicators is based on the quality performance indicators developed in the ESSnet exploring quality issues in administrative data. [18]

  • Undercoverage
    • Proportion of households and companies that do not have smart meters
    • Proportion of consumption that is not covered - some overlap with the discontinuity problem.
  • Percent of units that fail checks
    • Percent of data failing basic checks, such as checks for extreme readings. 
  • Percent of units that are adjusted
    • Percentage of units that are adjusted through some error correction or follow up contact.
  • Percent imputed
    • Percentage of households or businesses imputed
  • Periodicity
    • Frequency of data delivery
  • Delay
    • Possible delays in delivery of data
  • Common units
    • In case smart meter data are delivered from multiple sources - possible overlap and duplicates. This includes potential duplicates from a single source.
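Two of these indicators, the percentage of readings failing a basic check and the presence of duplicates, can be computed as in the following sketch. The limits used for the extreme-value check and the sample readings are hypothetical; suitable limits would have to be chosen for real data.

```python
from collections import Counter


def percent_failing_checks(readings, lower=0.0, upper=100.0):
    """Percentage of readings outside [lower, upper] (hypothetical limits).

    `readings` is a list of (metering_id, timestamp, value) tuples.
    """
    if not readings:
        return 0.0
    failed = sum(1 for _, _, value in readings if not lower <= value <= upper)
    return 100.0 * failed / len(readings)


def duplicate_keys(readings):
    """Return (metering_id, timestamp) keys that occur more than once."""
    counts = Counter((mid, ts) for mid, ts, _ in readings)
    return [key for key, n in counts.items() if n > 1]


# Made-up readings: one duplicate key and one extreme value.
readings = [
    (1, 0, 0.4),
    (1, 1, 0.5),
    (1, 1, 0.5),
    (2, 0, 250.0),
]
failing = percent_failing_checks(readings)
duplicates = duplicate_keys(readings)
```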

Table 4 shows the result for the Estonian data and Table 5 the results for the Danish data.

Table 4. Assessment of Estonian data based on the quality indicators

Indicator: Assessment
Undercoverage: Around 50% of households and companies did not have smart meters by the end of 2014. The smart meters do not measure electricity produced for own consumption; they only measure purchased electricity. The total amount of unmeasured consumption is negligible compared to total consumption. No duplicate records or other overcoverage problems were discovered.
Percent of units that fail checks: It is expected that the readings are in accordance with real consumption, as it is invoiced to customers. No significant errors have been identified in the data.
Percent of units that are adjusted: The metering data are not changed by Statistics Estonia, and no adjustments are made to the readings.
Percent imputed: No missing values were detected in the records. At this stage, it was not possible for Statistics Estonia to check whether any records are missing from the dataset.
Periodicity: Data are provided yearly at the moment; a higher frequency is possible in the future.
Delay: As the network operators have three months for correcting data, the data are provided after that period.
Common units: In Estonia there is no other source for the data.

Table 5. Assessment of Danish data based on the quality indicators

Indicator: Assessment
Undercoverage: The 2013 data contain readings for almost all Danish meters. They contain hourly readings only for a small subset of companies using more than 100,000 kWh a year. The undercoverage of smart meters cannot be estimated.
Overcoverage: There are no known examples of overcoverage in the core data, but there can be overcoverage in some subpopulations due to poor linking with other registers.
Missing values: The core meter data do not have unexplained missing values, but missing data occur in linked data.
Percent of units that fail some or all checks: The necessary quality checks are still in development.
Percent of units that are adjusted: The metering data are not adjusted.
Percent imputed: No imputation is applied, and the data are handled in read-only mode.
Periodicity: The possibility of monthly delivery is being examined.
Delay: Still unknown.
Common units: The data hub aggregates all the data from the different providers and handles conflicting data. Metadata on this process are not available to Statistics Denmark, but there are no indications that it concerns more than a minuscule proportion of records.



Denmark and Estonia have handled the data in much the same way. The data is also received in a comparable format, as structured data from relational databases. Some of the common challenges are:

  • Address matching
  • Anonymizing
  • Lack of smart meter readings for all households.

The quality indicators show similar situations for both countries. Problems with faulty data are close to non-existent, but problems remain with record linkage and with the percentage of households using smart meters.

For both countries, the proportion of smart meter readings is expected to improve rapidly over the next few years. For Denmark, unique IDs for most businesses and households should also be included in the future.

Objectives for generating synthetic data

The process of generating a synthetic dataset based on a real dataset is not clearly defined; the objectives or use cases for the dataset highly influence the necessary steps. The minimal requirement is to reproduce the original structure, whereas the best solution for a synthetic dataset is to reproduce close-to-reality results for different aggregation levels.

The approach in this work package is to have a two-step approach with two different objectives in mind:

  1. Generating demo output with “realistic” results.
  2. Test, scale and develop (new) statistics and algorithms where linkage to enterprise or household characteristics is necessary.

1. Demo output

New or specialized dissemination tools and methods should be developed for smart meter data, e.g., visualisations on maps. Prototypes for possible output can be generated and discussed before real data sets are available. For most of these methods there will be a need for linkage to some characteristics such as region, household characteristics, or business characteristics. The advantage of generating demo output from a synthetic micro dataset in comparison to synthetic tabular data is the possibility of describing and testing the entire process from the underlying micro data to the final output.

2. “Realistic setting”

Important results should resemble the real dataset as closely as possible and structural variables, e.g., household and enterprise composition, should (approximately) match with known population characteristics. The donor data set of one country can be used for generating a synthetic dataset for another country or for an artificial population. Of course, the “power usage” which might be quite specific for a country (or geographic location) will be “copied” and therefore only the structural variables will resemble the target population. In this setting more sophisticated algorithms can be tested, e.g., classification algorithms for household characteristics, but final evaluation of them should still be carried out on a real dataset.
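One possible reading of this donor approach is sketched below: each unit of the target population receives a copy of the consumption profile of a randomly chosen donor with the same structural type. This is an illustration under assumptions and not the method adopted in the work package; the function name, the field names and the sample data are hypothetical.

```python
import random


def synthesize(target_units, donors, seed=0):
    """Assign each target unit the profile of a random donor of the same type."""
    rng = random.Random(seed)  # fixed seed for reproducible demo output
    by_type = {}
    for donor in donors:
        by_type.setdefault(donor["type"], []).append(donor["profile"])
    return [
        {
            "unit_id": unit["unit_id"],
            "type": unit["type"],
            "profile": list(rng.choice(by_type[unit["type"]])),
        }
        for unit in target_units
    ]


# Made-up donor profiles (e.g. consumption per period) and target units.
donors = [
    {"type": "household", "profile": [1.0, 0.5, 0.8]},
    {"type": "household", "profile": [0.2, 0.1, 0.3]},
    {"type": "business", "profile": [5.0, 6.0, 5.5]},
]
target_units = [
    {"unit_id": 1, "type": "household"},
    {"unit_id": 2, "type": "business"},
    {"unit_id": 3, "type": "household"},
]
synthetic = synthesize(target_units, donors)
```

In such a scheme only the structural variables follow the target population, while the consumption patterns remain those of the donor country, as noted above.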

The demo output should include:

  • Visualization of the change in energy consumption over time (daily, seasonal, ...)
  • Testing algorithms to identify vacant households
  • Displaying vacant households on a map

The main advantage of using a synthetic data set for the development of new outputs and algorithms is that it is ready to use and accessible to everybody, and that the results developed are clearly labelled as demo and development output.


The situation regarding smart meter deployment and access to smart meter data differs between the four countries participating in work package 3. While Statistics Estonia and Statistics Denmark have access to full data, Statistics Sweden is beginning to explore a small test data set, and Statistics Austria does not foresee any data access in the near future. Data hubs for electricity data are in place in Estonia and Denmark, and one is planned for 2020 in Sweden. Statistics Estonia currently has data from 2013-2014, and Statistics Denmark from 2013. The two datasets contain similar information. It is anticipated that more recent data will be available during WP 3.

In order to assess the data quality, a set of straightforward quality indicators were applied to Estonian and Danish data. The indicators are meant to serve as general quality guidelines and as a tool for comparisons between countries, but no conclusions on their usefulness can be drawn yet.

The Estonian experience from processing and cleaning data involved geocoding the data and some first visualizations of results. In the absence of other real data, WP 3 plans to generate synthetic data. The main purpose is to get a data set on which methods and tools can be tested.

Future plans

Subsequent work will focus on further development of methodology, i.e. testing of the quality indicators as well as methodology for editing, outlier detection, imputation, and linking. This work includes finding criteria for the selection of the best methods. Further analysis of data (hopefully including new data sets from Estonia and Denmark and a first data set from Sweden) and completing and testing the synthetic data will also be carried out during the project. The final goal is to suggest frameworks for quality, methodology, and technology, and ultimately to produce new statistics based on smart meter data.

Ultimately, the statistics will be improved in terms of periodicity and granularity and through better linkage to other records, providing policy makers, academics and industry with a better picture at a time when energy statistics are more important than ever. It is plausible that further work will show that smart meter data can replace survey information in many cases, increasing production efficiency in the long term.


  1. Energinet (2016): The Danish electricity retail market. Introduction to DataHub and the Danish supplier-centric model. dok nr. 15/08243-6. Energinet.dk October 2016. (accessed 22-12-2016)
  2. Energinet.dk (2016): Introduktion til elmarkedet. dok nr. 13/96911-15. Energinet.dk April 2016. (accessed 22-12-2016)
  3. Nord Pool (2016): About Us. (accessed 22-12-2016)
  4. European Parliament, Council of the European Union (2009): Directive 2009/72/EC of the European Parliament and of the Council of 13 July 2009 concerning common rules for the internal market in electricity and repealing Directive 2003/54/EC. (Pdf, 1.18 MB)
  5. International Energy Agency (2011): Technology Roadmap: Smart Grids. (3.02 MB)
  6. Giordano, V., Onyeji, I., Fulli, G., Sanchez Jimenez, M. & Filiou, C. (2012): Guidelines for cost-benefit analysis of smart metering deployment. Publications Office of the European Union. (Pdf, 3.11 MB)
  7. European Commission (2014): Commission staff working document SWD/2014/0189 Cost-benefit analyses & state of play of smart metering deployment in the EU-27 Accompanying the document Report from the Commission Benchmarking smart metering deployment in the EU-27 with a focus on electricity. (Pdf, 1.59 MB)
  8. Smart Electricity grids and meters in the EU member states, http://www.europarl.europa.eu/RegData/etudes/BRIE/2015/568318/EPRS_BRI(2015)568318_EN.pdf
  9. European Commission (2014): Commission staff working document SWD/2014/0188 Country fiches for electricity smart metering Accompanying the document Report from the Commission Benchmarking smart metering deployment in the EU-27 with a focus on electricity. (Pdf, 1.28 MB)
  10. McKenna, E., Richardson, I., & Thomson, M. (2012): Smart meter data: Balancing consumer privacy concerns with legitimate applications. Energy Policy, 41, 807–814.
  11. European Parliament, Council of the European Union (1995): Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data. (Pdf, 2.24 MB)
  12. Williams, S. & Gask, K. (2015): ONS Method paper no. 40 Modelling sample data from smart-type electricity meters to assess potential within official statistics.
  13. Gask, K. & Williams, S. (2015): ONS Methodology Working Paper Series No 6 Analysing low electricity consumption using DECC data.
  14. Carroll, P., et al. (2013): Exploration of electricity usage data from smart meters to investigate household composition.
  15. UNECE (2015): Experiment report: Canadian Smart Meter Data.
  16. Virgillito, A. (2015): Computing Energy Consumption from Smart Meters Data.
  17. European Parliament, Council of the European Union (2008): Regulation (EC) No 1099/2008 of the European Parliament and of the Council of 22 October 2008 on energy statistics. (Pdf, 224 KB)
  18. ESSnet on Quality of multisource statistics, Preliminary report (CROS, 2016), ESSnet page.

Annex A

A1: Creating tables and making queries in Hadoop Hive

Description of Data Tables

Description of the fields comes from the document Guide for using and joining the Estonian Data Hub.

Table A1-1. Description of the table agreements

Column name Type Comment
agreement_id BIGINT  'Unique agreement identifier'
metering_id BIGINT  'Unique metering point identifier'
agreement_type VARCHAR(16) 'Agreement type – one of the following: GRID, NAMED_SUPPLIER, PORTFOLIO, SUPPLY'
provider_id BIGINT  'Unique provider identifier'
portfolio_id BIGINT  'Unique portfolio identifier'
customer_id BIGINT  'Unique customer identifier'
first_date DATE  'Beginning date of the network agreement as YYYY-MM-DD (if not available, then 2011-01-01)'
last_date DATE  'End date (last day of validity) of the network agreement as YYYY-MM-DD, leave empty if not defined'

Table A1-2. Description of the table customers

Column name        Type          Comment
customer_id        BIGINT        'Unique customer identifier'
eic                VARCHAR(16)   'Client EIC code'
code               VARCHAR(100)  'Code of the network agreement counterparty: 11-digit Estonian personal identification code or 8-digit Estonian commercial registry code'
registry           VARCHAR(100)  'Registry ID type used: 1. "isikukood" – personal identification code, for private clients; 2. "äriregister" – commercial registry code, for business clients; 3. "dok. number" – document number, for clients who are not Estonian citizens.'
registry_country   VARCHAR(2)    'Country of the registry defined in the previous field, 2 characters: "EE" – Estonian registries, "/Country Code/" – if the person is a foreign national'
given_name         VARCHAR(100)  'For private persons, given name of the network agreement counterparty; for legal persons, business name of the client'
sur_name           VARCHAR(100)  'For private persons, surname of the network agreement counterparty; for legal persons, left empty'

Table A1-3. Description of the table metering_data

Column name    Type    Comment
metering_id    BIGINT  'Unique metering point identifier'
timestamp      BIGINT  'Time of creation of the message YYYY-MM-DDTHH:MM:SS'
in_quantity    BIGINT  'Quantity of electricity entering the network'
out_quantity   BIGINT  'Quantity of electricity exiting the network'

Table A1-4. Description of the table metering_points

Column name      Type          Comment
metering_id      BIGINT        'Unique metering point identifier'
provider_id      BIGINT        'Unique service provider identifier'
eic              VARCHAR(16)   'EIC code of the metering point'
metering_type    VARCHAR(32)   'Metering type – one of the following: REMOTE_READING, VIRTUAL, SINGLE_TARIFF_MANUAL, DUAL_TARIFF_MANUAL'
small_consumer   CHAR(1)       'Client type at metering point: small consumer - 1 or not - 0'
disconnected     CHAR(1)       'Connection status at metering point: CONNECTED – connected, DISCONNECTED – disconnected'
street_address   VARCHAR(200)  'Address – street address (small place, name of land unit, street, address number, number of apartment or other part of building); Postcode – postal code. If the address has not been assigned a postal code, enter postal code as 00000'
locality         VARCHAR(200)  'Address – county, municipality (city, rural municipality), settlement unit (village, small town, town, city without municipal status) or city district'
borderpoint      CHAR(1)       'Is a metering point between two network operators? yes – border metering point, no – standard metering point'

Making queries

A command for creating an external table in Hadoop Hive:

  CREATE EXTERNAL TABLE IF NOT EXISTS smart_meter.agreements (
    agreement_id BIGINT COMMENT 'Unique agreement identifier',
    metering_id BIGINT COMMENT 'Unique metering point identifier',
    agreement_type VARCHAR(16) COMMENT 'Agreement type – one of the following: GRID, NAMED_SUPPLIER, PORTFOLIO, SUPPLY',
    provider_id BIGINT COMMENT 'Unique provider identifier',
    portfolio_id BIGINT COMMENT 'Unique portfolio identifier',
    customer_id BIGINT COMMENT 'Unique customer identifier',
    first_date DATE COMMENT 'Beginning date of the network agreement as YYYY-MM-DD (if not available, then 2011-01-01)',
    last_date DATE COMMENT 'End date (last day of validity) of the network agreement as YYYY-MM-DD, leave empty if not defined'
  )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
  WITH SERDEPROPERTIES ("escapeChar" = "#")
  STORED AS TEXTFILE
  LOCATION '/datasets/smart_meter/agreements';

Because the command uses OpenCSVSerde to handle the Unicode data, all fields of the table are treated as type STRING, whatever types are declared. The types therefore have to be set once more, by casting, when the table is created in the Optimized Row Columnar (ORC) file format:

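  -- Column aliases keep the original names; without them Hive
  -- names the cast columns _c0, _c1, ...
  CREATE TABLE IF NOT EXISTS smart_meter.agreements_orc
  STORED AS ORC TBLPROPERTIES ("orc.compress"="NONE") AS
  SELECT
    CAST(agreement_id AS BIGINT) AS agreement_id,
    CAST(metering_id AS BIGINT) AS metering_id,
    CAST(agreement_type AS VARCHAR(16)) AS agreement_type,
    CAST(provider_id AS BIGINT) AS provider_id,
    CAST(portfolio_id AS BIGINT) AS portfolio_id,
    CAST(customer_id AS BIGINT) AS customer_id,
    CAST(first_date AS DATE) AS first_date,
    CAST(last_date AS DATE) AS last_date
  FROM smart_meter.agreements;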

The distribution of consumers was found with the following SQL statement, which joins the three tables:

  SELECT customers.registry, metering_points.metering_type, count(*)
  FROM agreements
    JOIN customers ON customers.customer_id = agreements.customer_id
    JOIN metering_points ON metering_points.metering_id = agreements.metering_id
  WHERE agreements.last_date IS NULL
    AND agreements.agreement_type = 'GRID'
  GROUP BY customers.registry, metering_points.metering_type
  ORDER BY customers.registry;
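To make the logic of the query concrete, the following sketch reproduces the same filter, joins and grouping in R with data.table on a handful of made-up rows. All values and the object names open_grid, joined and distribution are assumptions for illustration only; nothing here is taken from the Estonian Data Hub.

```r
library(data.table)

# Toy stand-ins for the three Hive tables (all values are invented)
agreements <- data.table(
  agreement_id   = 1:4,
  metering_id    = c(10, 11, 12, 10),
  agreement_type = c("GRID", "GRID", "GRID", "SUPPLY"),
  customer_id    = c(100, 101, 102, 100),
  last_date      = as.Date(c(NA, NA, "2015-12-31", NA))
)
customers <- data.table(
  customer_id = c(100, 101, 102),
  registry    = c("isikukood", "äriregister", "isikukood")
)
metering_points <- data.table(
  metering_id   = c(10, 11, 12),
  metering_type = c("REMOTE_READING", "REMOTE_READING", "VIRTUAL")
)

# WHERE clause: open-ended agreements of type GRID
open_grid <- agreements[is.na(last_date) & agreement_type == "GRID"]

# Inner joins on customer_id and metering_id
joined <- merge(open_grid, customers, by = "customer_id")
joined <- merge(joined, metering_points, by = "metering_id")

# GROUP BY registry and metering type, then ORDER BY registry
distribution <- joined[, .(n = .N), by = .(registry, metering_type)][order(registry)]
print(distribution)
```

In this toy data the terminated agreement and the SUPPLY agreement are filtered out before the join, so each remaining metering point is counted once.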

A2: Creating a visualisation of daily energy consumption with R with smart meter data as input

Figure A2.1 shows the average consumption over several households for each weekday. The R code provided below creates a synthetic test data set, aggregates it with the R package data.table, and uses the R package ggplot2 to generate a line chart for each weekday.

Figure A2.1

# Author: Alexander Kowarik
# Generation of a daily consumption plot from smart meter data
# (and generating very basic synthetic data to test the function)

# Packages needed by the code below
library(data.table)  # aggregation
library(lubridate)   # ymd(), as_date(), as_datetime(), hour(), minute()
library(ggplot2)     # plotting

# Generate a plausible consumption value for each hour
m <- rep(NA,24)
m[1:6] <- seq(.55,.78,length.out=6)
m[7:8] <- c(.76,.75)
m[9:20] <- seq(.74,.95,length.out=12)
m[20:24] <- seq(.95,.58,length.out=5)

# Generate daily curves for the 7 weekdays
dayCurves1 <- list(Monday=m,Tuesday=m*1.02,Wednesday=m*1.01,Thursday=m,Friday=m*.99,Saturday=m*.85,Sunday=m*.9)

# Convert numbers to character, padding single digits with a leading zero
c0 <- function(x){
  out <- as.character(x)
  TF <- x<10
  out[TF] <- paste0("0",x[TF])
  out
}


# Simulate one household; returns a list with the consumption values and the timestamps.
# Note: weekdays() returns names in the session locale, and the names of dayCurves
# are English, so an English locale is assumed.
simulateHH <- function(dayCurves=dayCurves1,intervalMin=30,startDay="2016-06-01",endDay="2016-06-30"){
  dayCurves <- lapply(dayCurves,function(x)log(x))
  #Number of measures per hour
  intervals <- 60/intervalMin
  #Weekdays of the days
  wd <- weekdays(as_date(ymd(startDay):ymd(endDay)))
  twd <- table(wd)
  #Initialized consumption vector
  cons <- rep(NA,24*intervals*length(wd))
  #Init timestamp
  ts <- as_datetime(paste(rep(as_date(ymd(startDay):ymd(endDay)),each=24*intervals),paste0(rep(c0(0:23),each=intervals),":",rep(c0((0:(intervals-1))*intervalMin),24),":00")),tz="Europe/Vienna")
  for(d in names(twd)){
    mInd <- matrix(1:(twd[d]*intervals),nrow = intervals)
    # Simulate consumption data for all values needed for a specific hour of a weekday
    simCon <- sapply(dayCurves[[d]],function(x)rlnorm(twd[d]*intervals,meanlog = x, sdlog=.05))
    simCon <- as.vector(apply(mInd,2,function(x)as.vector(simCon[x,])))
    cons[weekdays(ts)==d] <- simCon
  }
  # Return both vectors so that they can be assigned to the columns cons and ts
  list(cons,ts)
}


#Function to generate multiple HH
simulateDat <- function(nHH=100,dayCurves=dayCurves1,intervalMin=30,startDay="2016-06-01",endDay="2016-06-30"){
  timePoints <- 60/intervalMin*24*length(ymd(startDay):ymd(endDay))
  # ts is initialised as POSIXct so that its type matches the timestamps returned by simulateHH
  dat <- data.table(hhid=rep(1:nHH,each=timePoints),cons=as.numeric(rep(NA,timePoints*nHH)),ts=as.POSIXct(rep(NA,timePoints*nHH),tz="Europe/Vienna"))
  dat[,c("cons","ts"):=simulateHH(dayCurves = dayCurves,intervalMin=intervalMin,startDay=startDay,endDay=endDay),by=hhid]
  dat
}


hh <- simulateDat(100)

# Compute the average of the consumption for each time interval per weekday
hh2 <- hh[,.(avgCons=mean(cons)),by=.(hour=hour(ts)+minute(ts)/60,weekday=weekdays(ts))]

# Construct a factor with the weekdays to fix the ordering
# (assumed form; this line is missing in the source)
hh2[,weekday:=factor(weekday,levels=names(dayCurves1))]

# With ggplot2 the plot is quite straightforward
# y-Value is the averaged consumption
# x-Value the time interval
# facet_grid splits the plot by weekday
# the color of the line is set to blue
# the theme is adjusted, so there is no gray background
ggplot(data=hh2,aes(y=avgCons,x=hour))+ geom_line(color="blue") + facet_grid(.~weekday) + theme_bw()

SQL / R example setup for Denmark

Below are the actual data definitions and some R code that processes the data. All columns of the main table are large character variables. This was necessary for a quick ingestion into Oracle. The data is later processed and converted to appropriate types in R.
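As a minimal sketch of the type conversion mentioned above, the snippet below turns character columns into numeric and date values in base R. The sample values and the data frame name raw are invented for illustration. The number and timestamp patterns follow those used in the cleansing script at the end of this annex and are assumed to apply to the raw table.

```r
# Hypothetical raw rows, as they might look in a table of character columns
raw <- data.frame(
  AMOUNT    = c("1.234,50", "87,25", "0,00"),
  READ_TIME = c("2015.01.31 00:00:00", "2015.02.28 00:00:00", "2015.03.31 00:00:00"),
  stringsAsFactors = FALSE
)

# Remove the thousands separator ("."), then turn the decimal comma into a point
amount_txt <- gsub(".", "", raw$AMOUNT, fixed = TRUE)
amount_txt <- gsub(",", ".", amount_txt, fixed = TRUE)
raw$AMOUNT <- as.numeric(amount_txt)

# Parse the timestamp and keep only the calendar date
raw$DATE <- as.Date(strptime(raw$READ_TIME, format = "%Y.%m.%d %H:%M:%S"))

str(raw)
```

The order of the two substitutions matters: replacing the comma first would leave two points in values such as "1.234,50", which as.numeric() cannot parse.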

Reading table DDL

Agreement table DDL

CREATE TABLE "D900002"."METER_MP2"

Below is the index

R code for quick cleanse of raw data

Below is some example code in R that shows how the raw data could be processed.

######
# SmartMeter Quick Read
######

source("F:/workspace/OGD/KT22SVN/Fælles programmer/R_scripts/OraGenvej.R")
library(stringr)
library(reshape2)
library(ggplot2)
library(data.table)
library(dplyr)
library(tidyr)
library(xtable)
# library(hash)

## Variable names are Danish: antal = count, andel = share, typer = types

conn <- OraGenvej("DB_PSD") ## Connect to Oracle

readings <- data.table(dbReadTable(conn = conn, schema="D900002", name="METER_READINGS_RAW"))

## OGD_METER is just a view pointing at METER_RAW, with considerably fewer columns
meters <- data.table(dbReadTable(conn = conn, schema="OGD", name="OGD_METER"))

## Fixing dates
meters$VALID_FROM_DATE <- as.Date(strptime(meters$VALID_FROM_DATE, format="%Y.%m.%d %H:%M:%S"))
meters$VALID_TO_DATE <- as.Date(strptime(meters$VALID_TO_DATE, format="%Y.%m.%d %H:%M:%S"))

## Find the newest version of each record
## Open-ended records (missing VALID_TO_DATE) get today's date so that they sort first
meters$VALID_TO_DATE <- as.Date(ifelse(is.na(meters$VALID_TO_DATE), Sys.Date(), meters$VALID_TO_DATE), origin="1970-01-01")
## Sort by metering point and descending end date. The original script re-sorted
## in ascending date order here, which would keep the oldest record instead.
setorder(meters, MPO_METERING_POINT_ID, -VALID_TO_DATE, na.last=FALSE)
meters <- meters[!is.na(MPO_METERING_POINT_ID)]
meters <- meters[!duplicated(MPO_METERING_POINT_ID)]

## Convert amounts to numeric and reading times to dates
readings$AMOUNT <- gsub("\\.","",readings$AMOUNT)
readings$AMOUNT <- gsub(",",".",readings$AMOUNT)
readings$AMOUNT <- as.numeric(readings$AMOUNT)
readings$DATE <- as.Date(strptime(readings$READ_TIME, format="%Y.%m.%d %H:%M:%S"))

## Size estimates
print(object.size(readings), units="GB") ## Approx. 0.6 GB
print(object.size(meters), units="GB")   ## Approx. 4.7 GB - with everything

## Check meter reading dates
ReadDatesFreq <- readings[,.(antal = .N), by = "DATE"]
setorder(ReadDatesFreq, -antal)

## Check meter frequency - are some meters read frequently?
MeterFreq <- readings[,.(antal = .N), by = METERING_POINT]
MeterFreqFreq <- MeterFreq[,.(antal = .N), by = antal]
typer <- c('Y','HY',NA,'Q',NA,NA,NA,NA,NA,NA,NA,'M')
MeterFreq$type <- typer[MeterFreq$antal]
hist(MeterFreq$antal)

## Classify meters according to frequency
MeterKlassifikation <- MeterFreq[,.(antal = .N), by = type]
## As written, this is the share of meters whose type is NA
andelKlassificerede <- MeterKlassifikation[is.na(type)]$antal / sum(MeterKlassifikation$antal)

## The share of missing company (CVR) or personal (CPR) ID numbers
MissingCVRandCPR <- meters[,.(missingCPR = sum(is.na(CONSUMER_CPR)), missingCVR = sum(is.na(CONSUMER_CVR)))]
MissingCVRandCPR <- MissingCVRandCPR / nrow(meters)

## Missing address information
MissingPostCode <- meters[,.(missingStreet = sum(is.na(STREET_CODE)), missingPostcode = sum(is.na(POSTCODE)), missingBuilding = sum(is.na(BUILDING_NUMBER)))]
MissingPostCode <- MissingPostCode / nrow(meters)

## Overlap of IDs between datasets
antalMPO <- length(unique(meters$MPO_METERING_POINT_ID))
y <- sum(unique(meters$MPO_METERING_POINT_ID) %in% unique(readings$METERING_POINT))
andel <- y/antalMPO

## Quick summary of consumption by postcode
wattPrPost <- merge(readings, meters[,c("MPO_METERING_POINT_ID","POSTCODE"),with=F], by.y="MPO_METERING_POINT_ID", by.x="METERING_POINT")[,.(total = sum(AMOUNT)), by = POSTCODE]
setorder(wattPrPost, -total)
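
The classification step in the script indexes a lookup vector by the number of readings per meter: position 1 holds 'Y', position 2 'HY', position 4 'Q' and position 12 'M', and every other count yields NA. The sketch below shows this on invented counts. Reading the codes as yearly, half-yearly, quarterly and monthly is an assumption based on their positions, not something stated in the source, and share_unclassified is a hypothetical name.

```r
library(data.table)

# Lookup vector from the script: the index is the number of readings per meter
typer <- c('Y','HY',NA,'Q',NA,NA,NA,NA,NA,NA,NA,'M')

# Invented reading counts for six meters
MeterFreq <- data.table(
  METERING_POINT = c("A", "B", "C", "D", "E", "F"),
  antal          = c(1L, 2L, 4L, 12L, 7L, 365L)
)

# Counts without an entry in typer (7) or beyond its length (365) become NA
MeterFreq$type <- typer[MeterFreq$antal]

# Number of meters per type, and the share that could not be classified
MeterKlassifikation <- MeterFreq[, .(antal = .N), by = type]
share_unclassified <- MeterKlassifikation[is.na(type)]$antal / sum(MeterKlassifikation$antal)

print(MeterFreq)
print(share_unclassified)
```

A meter with many more readings than twelve, such as a remotely read smart meter, falls into the NA group under this scheme, so the NA share mixes irregular manual readings with high-frequency meters.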