Second virtual sprint: Germany
Content
2. Scraped data from the job board stepstone.de
3. Aggregated results from the German Job Vacancy Survey by 11 NACE groups
4. Comparison of stepstone.de versus job vacancy statistics
Figures
Figure 1: Overview of 20 industry sectors, available by using stepstone search filter
Figure 2: Aggregated results from the German Vacancy Survey
Figure 3: Comparison Stepstone-Classification with NACE
1. Introduction
The objective of this second virtual sprint was to compare the structure of job advertisements found on a major job board with the aggregated results of the German Job Vacancy Survey (JVS). Web-scraped data from stepstone.de were chosen because it is the biggest job board in Germany. A further feature of this board is that it includes information on the economic activity of the enterprise which, at least at first sight, is similar to the definition used in official statistics. For this reason it is a suitable basis for an exploratory analysis of the structural differences between job advertisements on a major job board and the JVS results.
The participants were: Martina Rengers, Thomas Körner (30 September only)
2. Scraped data from the job board stepstone.de
Data from stepstone were scraped at two different times, once at the beginning of September and again one month later, to see how stable the results are. As a technical basis, the free software Selenium, the Firefox browser and a custom Java program were used. The custom Java program controls Selenium, which in turn controls Firefox. Based on the specified criteria, the program searches for jobs and collects the relevant data found. After the data retrieval, it writes the results to a file in Microsoft Excel (xlsx) format.
The scraping was done using the search filter “sectors”, which is available on stepstone.de. Figure 1 shows the 20 sectors stepstone offers to the user as well as the number of jobs found in each sector by web scraping. In addition, the scraping was also done without any search filter. Surprisingly, at both scraping times the sum of the job advertisements found across all sectors differed from the number of job advertisements found without any specification: roughly 8% to 10% of the job advertisements were lost when the filters were applied (4,840 of 60,320 and 5,984 of 61,565).
Figure 1: Overview of 20 industry sectors, available by using stepstone search filter
| No. | Stepstone sectors | Scraping 07.09.2016 | Scraping 04.10.2016 |
|---|---|---|---|
| 1 | IT & Internet | 7,211 | 7,127 |
| 2 | Other Business Activities & Services | 5,736 | 5,774 |
| 3 | Other Sectors and Industries | 4,810 | 5,027 |
| 4 | Manufacture of Transport Equipment | 4,860 | 4,736 |
| 5 | Wholesale, Retail Trade | 4,045 | 4,102 |
| 6 | Legal, Consultancy & Auditing | 3,923 | 4,000 |
| 7 | Manufacture of electrical and optical equipment | 3,793 | 3,883 |
| 8 | Manufacture of machinery and equipment | 3,327 | 3,195 |
| 9 | HR Services, Recruitment & Selection | 2,538 | 2,382 |
| 10 | Medical, Health & Social Care | 2,128 | 2,044 |
| 11 | Distribution, Transport & Logistics | 1,973 | 2,009 |
| 12 | Building & Construction | 1,689 | 1,721 |
| 13 | Banking | 1,498 | 1,400 |
| 14 | Medical Technology | 1,182 | 1,192 |
| 15 | Other Industry | 1,131 | 1,178 |
| 16 | Financial Services | 1,183 | 1,176 |
| 17 | Energy and water supply and waste management | 1,164 | 1,157 |
| 18 | Insurances | 1,128 | 1,159 |
| 19 | Fast Moving Consumer Goods/ Durables | 1,140 | 1,107 |
| 20 | Publishing, Printing & Reproduction | 1,122 | 1,111 |
| | Sum of sectors | 55,581 | 55,480 |
| | Search result without specification | 61,565 | 60,320 |
| | Difference | 5,984 | 4,840 |
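The consistency check behind the last three rows of Figure 1 can be reproduced in a few lines of Python. This is only an illustrative sketch and not the Java program used in the sprint: the variable and function names are chosen here for illustration, and the counts are copied from Figure 1.

```python
# Job counts per stepstone sector, copied from Figure 1 (rows 1 to 20).
scrape_sep = [7211, 5736, 4810, 4860, 4045, 3923, 3793, 3327, 2538, 2128,
              1973, 1689, 1498, 1182, 1131, 1183, 1164, 1128, 1140, 1122]
scrape_oct = [7127, 5774, 5027, 4736, 4102, 4000, 3883, 3195, 2382, 2044,
              2009, 1721, 1400, 1192, 1178, 1176, 1157, 1159, 1107, 1111]

# Search results without any filter, also from Figure 1.
unfiltered = {"2016-09-07": 61565, "2016-10-04": 60320}
by_sector = {"2016-09-07": scrape_sep, "2016-10-04": scrape_oct}


def filter_loss(sector_counts, unfiltered_total):
    """Return (sum of sectors, absolute difference, share of the unfiltered total)."""
    sector_sum = sum(sector_counts)
    difference = unfiltered_total - sector_sum
    return sector_sum, difference, difference / unfiltered_total


results = {date: filter_loss(by_sector[date], unfiltered[date]) for date in unfiltered}

for date, (sector_sum, difference, share) in results.items():
    print(f"{date}: sum of sectors {sector_sum}, difference {difference} ({share:.1%})")
```

With the Figure 1 counts, the differences are 5,984 and 4,840 job advertisements, which is about 9.7% and 8.0% of the respective unfiltered search results.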
3. Aggregated results from the German Job Vacancy Survey by 11 NACE groups
The German Job Vacancy Survey is carried out by the Institute for Employment Research (IAB – Institut für Arbeitsmarkt- und Berufsforschung). The IAB, which is based in Nuremberg, was set up in 1967 as a research unit of the former Federal Employment Service (Bundesanstalt für Arbeit) and has been a special office of the Federal Employment Agency (Bundesagentur für Arbeit/BA) since 2004.
The IAB Job Vacancy Survey is a representative survey covering all economic sectors and establishment sizes in Western and Eastern Germany. The regular surveys of a representative selection of establishments and public institutions are addressed to personnel representatives and/or business managers with personnel responsibility. The survey started with a written questionnaire in Western Germany in 1989 and has since been repeated every year, always in the fourth quarter, as a cross-sectional survey; since 1992 it has also covered Eastern Germany. Since 2006, in connection with EU regulation (EC) No. 453/2008 (which made the quarterly collection of job vacancy data mandatory from 2010 onwards), the written questionnaires of the IAB Job Vacancy Survey in the fourth quarter have been supplemented by short telephone interviews in the following first, second and third quarters that capture the intra-annual changes. The survey is carried out by Economix Research & Consulting, located in Munich.
Under the web addresses
- http://fdz.iab.de/de/FDZ_Establishment_Data/IAB_Job_Vacancy_Survey/IAB_Job_Vacancy_Survey_Outline.aspx (in German)
- http://fdz.iab.de/en/FDZ_Establishment_Data/IAB_Job_Vacancy_Survey/IAB_Job_Vacancy_Survey_Outline.aspx (in English)
and
- http://www.iab.de/de/befragungen/stellenangebot.aspx (in German)
- http://www.iab.de/en/befragungen/stellenangebot.aspx (in English)
the IAB provides an overview of the survey and information on data access via on-site use at the Research Data Centre (FDZ) of the IAB and subsequent remote data access. Under the second pair of URLs it also provides a list of publications and aggregated results (the latter are unfortunately only available on the German-language page).
Figure 2 shows the latest results available on the above-mentioned website. For the first quarter of 2016 the survey reported 989,000 job vacancies, broken down into eleven groups of NACE sections.
Figure 2: Aggregated results from the German Vacancy Survey
Source: Download from http://www.iab.de/de/befragungen/stellenangebot/aktuelle-ergebnisse.aspx
4. Comparison of stepstone.de versus job vacancy statistics
For a comparison of both data sources it is necessary to match the twenty sectors stepstone displays on its website with the eleven groups of NACE sections known from the aggregated results of the German Job Vacancy Survey. Figure 3 shows the results of this matching process. It becomes clear that a one-to-one matching of stepstone sectors with the NACE classification used by the Job Vacancy Survey is impossible in some cases and possible in others only under some assumptions. The last column of Figure 3 therefore contains some remarks by the classification experts at the Federal Statistical Office, who were asked for support. The stepstone sector “Other Sectors and Industries”, for example, cannot be matched at all. The assumptions made for the stepstone sectors “IT & Internet”, “Medical, Health & Social Care” and “Insurances” can lead to an over-representation of the NACE groups J, “I, P, Q, R, S” and K, respectively, by stepstone.
Figure 3: Comparison of the stepstone classification with NACE
| No. | Stepstone sectors | NACE sections of JVS | Remarks of classification experts |
|---|---|---|---|
| 1 | IT & Internet | J | NACE section J doesn’t include manufacturing of IT hardware, but maybe stepstone does. As a result section J could be overrepresented by stepstone. |
| 2 | Other Business Activities & Services | I, P, Q, R, S | This is only a rough matching. |
| 3 | Other Sectors and Industries | ??? | A matching is not possible. |
| 4 | Manufacture of Transport Equipment | C | |
| 5 | Wholesale, Retail Trade | G | |
| 6 | Legal, Consultancy & Auditing | L, M, N | |
| 7 | Manufacture of electrical and optical equipment | C | |
| 8 | Manufacture of machinery and equipment | C | |
| 9 | HR Services, Recruitment & Selection | L, M, N | |
| 10 | Medical, Health & Social Care | I, P, Q, R, S | NACE sections I, P, Q, R, S don’t include social security, but maybe stepstone includes it here. As a result sections I, P, Q, R, S could be overrepresented by stepstone. |
| 11 | Distribution, Transport & Logistics | H | |
| 12 | Building & Construction | F | |
| 13 | Banking | K | |
| 14 | Medical Technology | C | This matching is done under the assumption that production is meant here. |
| 15 | Other Industry | C | This matching is not really clear. It’s also possible to match it with NACE sections B, D, E. |
| 16 | Financial Services | K | |
| 17 | Energy and water supply and waste management | B, D, E | |
| 18 | Insurances | K | NACE section K doesn’t include social security, but maybe stepstone includes it here. As a result section K could be overrepresented by stepstone. |
| 19 | Fast Moving Consumer Goods/ Durables | C | This matching is done under the assumption that production is meant here. |
| 20 | Publishing, Printing & Reproduction | J | |
Once the sectors are matched as described in Figure 3, the structure of job advertisements by economic activity can be compared. The results of this analysis are shown in Figure 4.
The two biggest groups of NACE sections within the 989,000 job vacancies found in the JVS are “other services” (NACE sections I, P, Q, R, S) and “business services” (NACE sections L, M, N) with 27.5% and 26.4% respectively. They are followed, almost level with each other, by “manufacturing” (NACE section C) with 10.9% and “wholesale and retail trade; repair of motor vehicles and motorcycles” (NACE section G) with 10.8%. In contrast, the largest proportion of the about 55,500 job advertisements found at stepstone (some 28%) is allocated to “manufacturing” (NACE section C), followed by “information and communication” (NACE section J) with some 15%. Compared to the results from the Job Vacancy Survey, these two groups are overrepresented by stepstone. To a certain degree this could be caused by the vague matching procedure, but it supports the assumption that at least “information and communication” is overrepresented in online job boards.
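The stepstone shares quoted above can be recomputed from Figure 1 and the matching in Figure 3. The following Python sketch is illustrative only. It assumes that the counts of the first scraping (07.09.2016) are used and that the denominator is the sum over all twenty sectors, including the unmatched sector; the report does not state which scraping or denominator underlies Figure 4. The function name and the label "unmatched" are chosen here for illustration.

```python
from collections import defaultdict

# (stepstone sector, NACE group as matched in Figure 3, jobs on 07.09.2016 from Figure 1).
# "Other Sectors and Industries" has no NACE match and is kept as None.
sectors = [
    ("IT & Internet", "J", 7211),
    ("Other Business Activities & Services", "I, P, Q, R, S", 5736),
    ("Other Sectors and Industries", None, 4810),
    ("Manufacture of Transport Equipment", "C", 4860),
    ("Wholesale, Retail Trade", "G", 4045),
    ("Legal, Consultancy & Auditing", "L, M, N", 3923),
    ("Manufacture of electrical and optical equipment", "C", 3793),
    ("Manufacture of machinery and equipment", "C", 3327),
    ("HR Services, Recruitment & Selection", "L, M, N", 2538),
    ("Medical, Health & Social Care", "I, P, Q, R, S", 2128),
    ("Distribution, Transport & Logistics", "H", 1973),
    ("Building & Construction", "F", 1689),
    ("Banking", "K", 1498),
    ("Medical Technology", "C", 1182),
    ("Other Industry", "C", 1131),
    ("Financial Services", "K", 1183),
    ("Energy and water supply and waste management", "B, D, E", 1164),
    ("Insurances", "K", 1128),
    ("Fast Moving Consumer Goods/ Durables", "C", 1140),
    ("Publishing, Printing & Reproduction", "J", 1122),
]


def shares_by_nace(rows):
    """Sum job counts per NACE group and return each group's share of the total."""
    totals = defaultdict(int)
    for _name, nace, jobs in rows:
        totals[nace or "unmatched"] += jobs
    grand_total = sum(totals.values())
    return {nace: jobs / grand_total for nace, jobs in totals.items()}


shares = shares_by_nace(sectors)
for nace, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{nace}: {share:.1%}")
```

Under these assumptions, NACE section C accounts for about 27.8% and NACE section J for about 15.0% of the stepstone advertisements, in line with the "some 28%" and "some 15%" quoted in the text.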
Figure 4: Structure of economic activity sectors represented by stepstone compared to the Job Vacancy Survey
* Search result without using search filters
5. Results
The main results of the second virtual sprint can be summarized as follows: there are more problems than expected.
- First of all, there is a problem with the quality of the web-scraped data from the job board stepstone. For unexplained reasons, the sum of the results for the twenty economic activity sectors available through the sector search filter is not the same as the number of search results obtained without any specification. This means that roughly 5,000 to 6,000 job ads are lost when the sector search filters are used.
- Although not mentioned before, a further remark on the quality of the scraped data is worthwhile: two different kinds of duplicates were found in the scraped data, namely duplicates within the same economic sector and duplicates between different sectors. The first kind may have technical reasons, but the second is more confusing, especially because after de-duplication the difference to the scraping results obtained without any search filter increases further.
- A second problem lies in the terms stepstone uses for sectors. The classification of economic activities used by stepstone cannot be matched one-to-one with the NACE classification used by the Job Vacancy Survey. In addition, the stepstone sector names were found to change over time. For this reason, using the stepstone classification for statistical purposes is not an option. It remains to be investigated whether information on the economic activity can be obtained by matching with the business register or by means of a textual analysis of the full job descriptions found at stepstone.
- Regardless of all these problems, the expectation that the sector “information and communication” in particular is probably overrepresented in online job boards could not be disproved. On the contrary, the scraped data from stepstone rather reinforced this thesis.
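The two kinds of duplicates mentioned in the second point can be told apart as sketched below. This is a minimal Python illustration on invented toy records. The record layout (a sector name plus an advertisement identifier) and the identifiers themselves are assumptions made here, since the report does not describe how duplicates were detected.

```python
from collections import Counter

# Hypothetical scraped records: (stepstone sector, advertisement id).
# The ids, and the idea of identifying an ad by an id, are assumptions for illustration.
records = [
    ("IT & Internet", "ad-1"),
    ("IT & Internet", "ad-1"),   # duplicate within the same sector
    ("IT & Internet", "ad-2"),
    ("Banking", "ad-2"),         # same ad listed under a second sector
    ("Banking", "ad-3"),
]


def duplicate_report(rows):
    """Count duplicates within a sector and between sectors, plus unique ads."""
    pair_counts = Counter(rows)
    # Repeated (sector, ad) pairs are duplicates within one sector.
    within = sum(n - 1 for n in pair_counts.values())
    # An ad that appears under several distinct sectors is a duplicate between sectors.
    sectors_per_ad = Counter(ad_id for _sector, ad_id in pair_counts)
    between = sum(n - 1 for n in sectors_per_ad.values())
    return {
        "raw": len(rows),
        "within": within,
        "between": between,
        "unique": len(sectors_per_ad),
    }


report = duplicate_report(records)
print(report)
```

Every removed duplicate lowers the sum over the sectors. If the unfiltered search result itself contains no duplicates, de-duplication can therefore only widen the gap described in the first point.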