Earnings and employment microdata in South Africa

WIDER Working Paper 2019/47
Earnings and employment microdata in South
Africa
Andrew Kerr and Martin Wittenberg*
May 2019

* Both authors: DataFirst, University of Cape Town (UCT), Cape Town, South Africa; corresponding author:
andrew.kerr@uct.ac.za.
This study has been prepared as part of the project ‘Southern Africa – Towards Inclusive Economic Development (SA-TIED)’.
Copyright © UNU-WIDER 2019
Information and requests: publications@wider.unu.edu
ISSN 1798-7237 ISBN 978-92-9256-681-4 https://doi.org/10.35188/UNU-WIDER/2019/681-4
Typescript prepared by Luke Finley.
The United Nations University World Institute for Development Economics Research provides economic analysis and policy
advice with the aim of promoting sustainable and equitable development. The Institute began operations in 1985 in Helsinki,
Finland, as the first research and training centre of the United Nations University. Today it is a unique blend of think tank, research
institute, and UN agency, providing a range of services from policy advice to governments as well as freely available original
research.
The Institute is funded through income from an endowment fund with additional contributions to its work programme from
Finland, Sweden, and the United Kingdom as well as earmarked contributions for specific projects from a variety of donors.
Katajanokanlaituri 6 B, 00160 Helsinki, Finland
The views expressed in this paper are those of the author(s), and do not necessarily reflect the views of the Institute or the United
Nations University, nor the programme/project donors.
Abstract: Traditionally, analysts of the South African labour market have used household survey
data to describe earnings and employment in the post-Apartheid period. More recently,
administrative data from the South African Revenue Service has been made available, which allows
for comparisons and an assessment of each source and its strengths and weaknesses. There are a
number of sources of data, including household surveys, firm surveys, and administrative data, and
it can be hard to keep up with all of them. In this paper we thus provide a summary of the main
sources of data on earnings and employment and their strengths and weaknesses, to aid researchers
and policymakers who wish to make use of these data in their own analysis.
Keywords: administrative data, data sources, earnings, employment, surveys, South Africa
JEL classification: J21, J31
Acknowledgements: This paper has been supported as part of the UNU-WIDER ‘Southern
Africa – Towards Inclusive Economic Development (SA-TIED)’ research programme. The
programme also supported a separate paper on the South African Revenue Service IRP5 tax admin
data, which Section 4.1 below relies on. We would also like to acknowledge the support of the
University of Cape Town Vice Chancellor’s strategic fund (2011–12), the International Labour
Organization (2013), and the Research Project on Employment, Income Distribution and
Inclusive Growth, a programme supported by the National Treasury (2013–17). We thank Bruce
McDougall for sharing his work on National Income Dynamics Study and Quarterly Labour Force
Survey earnings comparisons, which was the topic of his UCT master’s thesis, supervised by
Martin Wittenberg.

1 Introduction
Traditionally, analysts of the South African labour market have used household survey data to
describe earnings and employment in the post-Apartheid period. More recently, administrative
data from the South African Revenue Service (SARS) has been made available, which allows for
comparisons and an assessment of each source and its strengths and weaknesses. There are a
number of sources of data, including household surveys, firm surveys, and administrative data, and
it can be hard to keep up with all of them. In this paper we thus provide a summary of the main
sources of data on earnings and employment and their strengths and weaknesses, to aid researchers
and policymakers who wish to make use of these data in their own analysis.
The household survey data sources to be described and analysed include the Quarterly Labour
Force Survey (QLFS) and the older Labour Force Surveys (LFS) and October Household Surveys
(OHS), all conducted by Statistics South Africa (Stats SA), starting in 1994 and ending with the
most recent QLFS. They also include the Post-Apartheid Labour Market Series (PALMS; Kerr et
al. 2019), available through DataFirst, which is a harmonized version of all the Stats SA QLFSs,
LFSs, and OHSs, as well as the 1993 Project for Statistics on Living Standards and Development
(PSLSD), conducted by the Southern Africa Labour and Development Research Unit (SALDRU).
The other Stats SA household survey data source which we describe is the General Household
Survey. Although it lacks the detailed questions about each individual’s employment that are found
in the QLFS, it does have a roughly consistent question on earnings and employment going back
to 2002. Given some of the issues with the QLFS, discussed below, this alternative source of
employment and earnings data also needs to be assessed. Finally, the National Income Dynamics
Study (NIDS), undertaken by SALDRU, is a source of earnings and employment data not
produced by Stats SA, which is a useful check on the other surveys, and we discuss it below.
The data we have mentioned thus far are all publicly available. We also describe two sources of
data that are either not in the public domain or available only with restricted access. The first of
these is the SARS IRP5 data set, which contains the tax records of all employees of tax-registered
companies who earned more than R2,000 in each tax year between 2011 and 2016 (newer data
may be made available). This is currently available to researchers approved by the National
Treasury. The second data source is the Stats SA Quarterly Employment Statistics (QES) survey.
This has not been made available to researchers, although Kerr et al. (2014) used the data to explore
job creation and destruction, and in the process described the employment data (but did not
analyse earnings). We also briefly mention a third source of microdata which has not been used in
any research that we are aware of: the firm-level data from the Unemployment Insurance Fund
(UIF) submissions to the Department of Labour. Such data has often been used in other countries
in labour market analysis, and so we note that it may be a useful source.
2 Household Survey Data
2.1 OHS, LFS, and QLFS
The household surveys from Stats SA are the starting point for any analysis of earnings and
employment in South Africa. The October Household Surveys began with OHS 1993 and
continued until OHS 1999. The Labour Force Surveys replaced the OHS labour market data
collection and were run biannually in March and September until September 2007. The Quarterly
Labour Force Surveys were then introduced in February 2008, and they continue to be undertaken
every quarter. Any analysis of these three sets of household survey data we undertake below is
carried out on PALMS version 3.3 (Kerr et al. 2019). PALMS is a compilation of all the OHSs,
LFSs, and QLFSs, as well as SALDRU’s PSLSD conducted in 1993, and several versions have
been released by DataFirst since 2013. The most recent version (PALMS v3.3) contains QLFS Q2
2018, but earnings data only up until 2017.¹
We begin by briefly reviewing known issues relating to employment and earnings in these three
sets of surveys, then provide some descriptive analysis showing these issues before discussing other
household surveys that can be used to investigate earnings and employment in South Africa.
2.1.1 October Household Surveys
The OHSs were the first attempt by Stats SA (which changed its name in 1997 from the Central
Statistical Service) to collect nationally representative household survey data and to release them
publicly for analysis.
The first OHS was conducted in 1993 but was not nationally representative, since it did not cover
some homeland areas (Wittenberg 2008). This survey has thus been mostly overlooked by
researchers, and not much is known about the strengths and weaknesses of the data other than
that they exclude homelands, which were and are areas with low earnings and low employment.
This makes it of little use for any analysis of earnings and employment.
OHS 1994 was the first nationally representative household survey of the post-Apartheid period.
OHSs 1994, 1995, and 1996 have the weakness that the sample frame of enumeration areas (EAs)
for the first stage of the two-stage cluster sample came from the 1991 census. The 1991 census
did not cover homelands, but these areas were included in the sample frame (Central Statistical
Service 1998). Thus it is possible that despite covering all of South Africa, the sample frame was
not as reliable as later ones, which used censuses that covered the entire country.
In OHS 1994, analysts have noted other issues that also relate to how the sample frame was
constructed and whether that meant it was not truly nationally representative. These include too
many whites relative to their share in the population, too few domestic workers, and too much
employment (Branson and Wittenberg 2007).
OHS 1995 has been the basis of much work that has sought to describe changes over time in a
number of employment-related issues (Branson and Wittenberg 2007). Wittenberg (2014b) has
noted several issues in OHS 1995: employment was too high, unemployment too low, and the
earnings gap between men and women too low. Branson and Wittenberg (2007) argue for the use
of all possible sources of data, rather than just one at the start and one at the end of any period
under investigation.
Stats SA faced budget constraints in conducting OHS 1996, so the sample size was smaller. In
addition, in both 1996 and 1997 political violence meant that residents in hostels in KwaZulu-
Natal and Gauteng were not enumerated (Kerr and Wittenberg 2015). Since hostels have a large
share of individuals in mining employment, mining employment was estimated to be much lower
than it actually was in those two years.
¹ PALMS v3.3 is being released in June 2019.
In OHS 1999, a master sample of EAs was used based on the 1996 population census (Statistics
South Africa 2000). This was thus the first time that a sample of EAs was drawn using one
nationally representative source of EA data: the 1996 population census. The master sample was
so named because the same sample of EAs was to be used to sample households in a number of
surveys for several years. Thus OHS 1999, all the LFSs until 2004, the 2001 Income and
Expenditure Survey, and the General Household Survey (GHS) 2002–04 were conducted by
sampling households from the same set of EAs.
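The master sample idea can be illustrated with a minimal Python sketch. It is not Stats SA's procedure: the frame, the sample sizes, and the function names `draw_master_sample` and `draw_survey` are hypothetical, and simple random sampling is assumed at both stages for brevity. The point is only that the first-stage draw of EAs happens once, while each survey repeats the second-stage draw of households within those EAs.

```python
import random

def draw_master_sample(ea_ids, n_eas, seed=0):
    """First stage, done once: pick the EAs that all later surveys will reuse."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(ea_ids), n_eas))

def draw_survey(master_eas, households_by_ea, per_ea, seed):
    """Second stage, repeated per survey: sample households within each master-sample EA."""
    rng = random.Random(seed)
    return {ea: sorted(rng.sample(households_by_ea[ea], per_ea)) for ea in master_eas}

# Hypothetical frame: 50 EAs with 40 listed households each.
frame = {ea: [f"{ea}-{h}" for h in range(40)] for ea in range(50)}
master = draw_master_sample(frame.keys(), n_eas=10)

# Two different surveys drawn from the same master sample of EAs.
survey_a = draw_survey(master, frame, per_ea=5, seed=1)
survey_b = draw_survey(master, frame, per_ea=5, seed=2)
print(sorted(survey_a) == sorted(survey_b) == master)  # True
```

Because every survey in the sketch draws from the same EAs, any weakness in the first-stage draw is shared by all of them, which is why the census underlying the master sample matters.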
Casale et al. (2004) note that in OHSs 1997 and 1999 the questionnaires included as examples of
employment those involved in subsistence agriculture, but that earlier surveys and OHS 1998 did
not. They also document that despite the prompt, almost no individuals were recorded as
employed in subsistence agriculture in OHS 1997. We discuss employment in subsistence
agriculture in further detail below.
In OHS 1999, Stats SA also changed how it sampled at what are called multiple household
dwelling points (Kerr and Wittenberg 2015). These are places where the listing of the enumeration
area/cluster suggested that there was only one household, but there was actually more than one.
The common example is a backyard shack which was not noticed by the enumerator when the
listing was undertaken. In OHS 1998 and earlier, only one household was enumerated, reducing
the probability of selection of particularly small households. Kerr and Wittenberg (2015) show that
this meant that small households were under-represented in the weighted data and suggest that
one outcome of this was an undercount of employment in 1998 and earlier, since those in the
small households that were missed had better employment outcomes than those in larger
households.
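A small worked example in Python shows why enumerating only one household at such a dwelling point under-represents small households. The EA below is hypothetical. The sketch also assumes that the single enumerated household is picked at random among those at the point, that the hidden households are one-person backyard dwellings, and that the weight is the inverse of the dwelling point's selection probability. None of these details is taken from the survey documentation.

```python
from fractions import Fraction

# Hypothetical EA: each inner list holds the household sizes at one dwelling point.
# The last point is a main house (6 people) with two one-person backyard shacks.
dwelling_points = [[4], [5], [3], [6, 1, 1]]

def expected_weighted_households(points, enumerate_all):
    """Expected weighted household count by household size.

    One dwelling point is drawn with equal probability and weighted by the
    number of points. If enumerate_all is False, only one household at the
    drawn point is interviewed (assumed to be chosen at random).
    """
    weight = len(points)
    p_point = Fraction(1, len(points))
    totals = {}
    for households in points:
        p_household = p_point if enumerate_all else p_point / len(households)
        for size in households:
            totals[size] = totals.get(size, 0) + weight * p_household
    return totals

def one_person_share(totals):
    """Share of one-person households in the expected weighted total."""
    return totals[1] / sum(totals.values())

old_rule = expected_weighted_households(dwelling_points, enumerate_all=False)
new_rule = expected_weighted_households(dwelling_points, enumerate_all=True)
print(one_person_share(old_rule))  # 1/6
print(one_person_share(new_rule))  # 1/3
```

In this example one-person households make up a third of all households, but only a sixth of the expected weighted total when a single household per dwelling point is enumerated.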
2.1.2 Labour Force Surveys
The LFS collected detailed information on employment and earnings for all employed individuals.
It continued to use the master sample that was first used in OHS 1999 to sample households. But
the LFS also incorporated a rotating panel design, meaning that a panel of dwellings was created
(Statistics South Africa 2001). Every time a new LFS was undertaken, 80 per cent of the dwelling
units in the sample were reinterviewed, while 20 per cent were rotated out and replaced with a new
sample of dwelling units from the same Primary Sampling Unit (PSU). This rotation was supposed
to begin in the third round of the LFS (2001: March). However, the LFS 2000: September had the
same sample as the Income and Expenditure Survey of 2000, which led to larger non-response
rates, and thus the first wave of the panel was 2001: September. It ran until LFS 2004: March. LFS
2004: September was then the first survey to use a new master sample that was based on the 2001
census list of EAs. This master sample was used until 2007.
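The rotation rule described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Stats SA implementation: the function `next_wave`, the PSU and dwelling identifiers, and the choice to rotate out the dwellings that have been in the sample longest are all hypothetical.

```python
from collections import deque

def next_wave(panel, pools, rotate_share=0.2):
    """Advance a rotating panel by one wave.

    panel: dict mapping PSU id -> deque of sampled dwelling ids, oldest first.
    pools: dict mapping PSU id -> list of not-yet-sampled dwelling ids
           (consumed as replacements are drawn).
    """
    new_panel = {}
    for psu, dwellings in panel.items():
        n_out = round(len(dwellings) * rotate_share)
        kept = list(dwellings)[n_out:]                      # reinterviewed dwellings
        fresh = [pools[psu].pop(0) for _ in range(n_out)]   # replacements from the same PSU
        new_panel[psu] = deque(kept + fresh)
    return new_panel

# Hypothetical PSU with 10 sampled dwellings and 10 more in reserve.
panel = {"psu_1": deque(range(10))}
pools = {"psu_1": list(range(10, 20))}

wave2 = next_wave(panel, pools)
overlap = len(set(panel["psu_1"]) & set(wave2["psu_1"])) / len(panel["psu_1"])
print(overlap)  # 0.8
```

Under this rule a dwelling stays in the sample for at most five consecutive waves, so individuals can only be followed over a limited window.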
The first three LFSs (February 2000, September 2000, and March 2001) all had substantial
problems that meant total employment was overestimated in these three waves. This point is
important to take note of for any longer-run analysis of employment trends. The large increases
actually began in the last OHS in 1999. This was partly because OHS 1999 improved small-
household coverage and thus found more employment, since small households have higher
proportions of employed individuals (Kerr and Wittenberg 2015). But in the first three LFSs,
measured employment is simply too high. We draw this conclusion from Figure 1, which
shows that the first two LFSs measured substantial amounts of subsistence agriculture that was
not replicated in any subsequent survey.
The third LFS (March 2001) was linked to the first Survey of Employers and Self-Employed
(SESE). As described by Kerr and Wittenberg (2015) and in more detail by Kerr (2015a), the
enumerators for the March 2001 LFS were required to reinterview the owners of any non-VAT-
