
The construction and interpretation of combined cross-section and time-series inequality datasets

Abstract
The inequality dataset compiled in the 1990s by the World Bank and extended by the UN has been both widely used and strongly criticized. The criticisms raise questions about conclusions drawn from secondary inequality datasets in general. We develop techniques to deal with national and international comparability problems intrinsic to such datasets. The result is a new dataset of consistent inequality series, allowing us to explore problems of measurement error. In addition, the new data allow us to perform parametric non-linear estimation of Lorenz curves from grouped data. This in turn allows us to estimate the entire income distribution, computing alternative inequality indexes and poverty estimates. Finally, we have used our broadly comparable dataset to examine international patterns of inequality and poverty.


Stichting IIDE, Institute for International & Development Economics
Email: i4ide@intereconomics.com Website: www.i4ide.org
© IIDE, Joseph Francois and Hugo Rojas-Romagosa, 2007
DISCUSSION PAPER
The Construction and Interpretation of
Combined Cross-Section and Time-Series
Inequality Datasets
August 2007
IIDE discussion paper 200708-05
Joseph Francois
JOHANNES KEPLER UNIVERSITY (LINZ)
AND CEPR
Hugo Rojas-Romagosa
CPB THE HAGUE
Joseph.francois@jku.at
H.Rojas-Romagosa@cpb.nl
JEL codes: D31, D80, O15
Key words: Income distribution datasets, inequality trends, Lorenz curve
estimation, poverty estimation
AUGUST 2008

The Construction and Interpretation of Combined
Cross-Section and Time-Series Inequality Datasets
Joseph F. Francois
Johannes Kepler University (Linz) and CEPR
Hugo Rojas-Romagosa
CPB (The Hague)
August 2007
JEL codes: D31, C80, O15
We acknowledge support from the EU research and training network (RTN) Trade,
Industrialization, and Development, as well as research support from DFID and the
World Bank. All errors are of course our own.
Address correspondence to: J. Francois, Tinbergen Institute, Erasmus University
Rotterdam, Burg Oudlaan 50-H8-18, 3000DR Rotterdam, NETHERLANDS.
Email: francois@few.eur.nl.
Data are available at www.i4ide.org/francois/data.html.

1. Overview
There is a sizeable literature regarding the interaction between income inequality and other
economic variables, such as growth, poverty, trade and economic policy. Beginning with Kuznets
(1955), the theoretical work has steadily grown, and recently there has been a surge of interest in the topic,
reflected in a new wave of publications (Atkinson, 1997). Yet, the study of income inequality has
been seriously limited by data constraints. The introduction of a cross-country inequality dataset
by the World Bank (Deininger and Squire, 1996) has complemented the recent literature and has
itself launched a series of influential econometric studies.
While at the core of most recent work in the area, the structure of the inequality dataset
compiled by Deininger and Squire, henceforth DS, has recently been criticized by Atkinson and
Brandolini (2001), henceforth AB. These criticisms also extend implicitly to the recent extension
of the DS data in the World Income Inequality Database (WIID). AB forcefully argue for the need
to assess the mechanical use of such "secondary" datasets and to deal more systematically with
the measurement problems involved. In this paper we do this, focusing on the empirical and
theoretical difficulties related to income inequality measurement, analyzing the characteristics of
secondary datasets, and developing a methodological approach for reducing the measurement
error problems common to inequality information.
Substantial difficulties arise in the empirical measurement of inequality. The most basic is the
lack of an institution and agreed procedures that can assure data quality and consistency. In other
words, there is no equivalent to the United Nations System of National Accounts, which
provides macroeconomic statistics that are constructed by national agencies and are reasonably
consistent over time and across countries. In the absence of such an institution, some organizations
have constructed "secondary" datasets, of which the best known are DS, the World Income
Inequality Database (UNU/WIDER-UNDP, 2000) and the Luxembourg Income Study (LIS).
These datasets compile available national inequality statistics and perform quality assessments of
all the data observations. This has been an important first step towards the creation of
internationally comparable inequality time series. The Deininger and Squire dataset combines a
large number of inequality observations for the entire world, with each observation classified
following three quality criteria. More recently, the World Income Inequality Database (WIID)
has extended and updated the DS dataset, using similar quality criteria. (Throughout this paper,
we use the larger compilation of data provided by the WIID as our main inequality data source.)
Beyond quality criteria issues, there are additional problems that increase the measurement
error present in national series and in international inequality comparisons. In particular, national
inequality statistics generally include observations that differ on concepts measured (e.g.
expenditure, gross income, net income), reference units (e.g. household, person, family) and/or
sources. In what follows we refer to these three characteristics as the inequality data
definitions and we consider an inequality series to be consistent when these definitions are
comparable for all observations. Although some countries have relatively extensive and
consistent time series, the general rule is that inequality observations are sparse and differ on
definitions over time. Hence, to create relatively extensive inequality time series that can be used
in econometric studies, it is often necessary to assume the comparability of some of the
definitions to handle the problem of sparsity. Deininger and Squire have assumed that all
definitions are broadly comparable and used their "high quality" observations to construct the
most consistent inequality time series for each country. However, they caution about the
potential problems of this comparability assumption and as an alternative they advise the use of
dummy variables to adjust and account for different definitions. Using this approach, they
generated a single inequality series for a large number of countries, which has featured
prominently in subsequent empirical research. While convenient, these simplifying assumptions
(i.e. the complete comparability of definitions and sources) introduce false patterns and noise
into the data. Furthermore, as AB have stressed, the use of dummy variables is not an adequate
solution to this problem.
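The notion of a consistent series defined above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the record layout (`Observation`), the function name `consistent_series`, the `min_obs` threshold and the sample values are all hypothetical and are not taken from the WIID or from our dataset. The sketch applies the strictest reading of consistency, keeping together only observations that share all three definitions.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    """One inequality observation (hypothetical record layout)."""
    country: str
    year: int
    concept: str         # e.g. "expenditure", "gross income", "net income"
    reference_unit: str  # e.g. "household", "person", "family"
    source: str
    gini: float


def consistent_series(observations, min_obs=2):
    """Group observations that share country, concept, reference unit and source.

    Returns a dict mapping each definition key to a year-sorted list of
    (year, gini) pairs, dropping groups with fewer than `min_obs` points.
    """
    groups = defaultdict(list)
    for obs in observations:
        key = (obs.country, obs.concept, obs.reference_unit, obs.source)
        groups[key].append((obs.year, obs.gini))
    return {
        key: sorted(points)
        for key, points in groups.items()
        if len(points) >= min_obs
    }


# Made-up values, for illustration only.
sample = [
    Observation("A", 1980, "gross income", "household", "survey X", 35.0),
    Observation("A", 1985, "gross income", "household", "survey X", 36.5),
    Observation("A", 1985, "expenditure", "person", "survey Y", 31.0),
    Observation("A", 1990, "gross income", "household", "survey X", 38.0),
]
series = consistent_series(sample)
```

In this example the single expenditure observation cannot be attached to the gross-income series without a comparability assumption, so it is left out. This is the sparsity problem that motivates relaxing some of the definitions.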
In this paper, we build on previous efforts to overcome known limitations with secondary
inequality datasets. In particular, we assemble a combined inequality dataset based on a
consistent grouping methodology of heterogeneous observations from existing secondary
datasets. This yields a new cross-section and time-series dataset that we use to examine
comparability problems, and to then revisit recent estimates of the relationships between income
distribution and other macroeconomic variables. Our approach yields six main inequality series
that can readily be used in empirical tests and within these series the implicit measurement error
has been reduced. We also explore conceptual issues of measurement. There are important
theoretical considerations with regard to inequality measurement. While there are several
indicators that measure inequality, there is no consensus in favor of any particular index (a
comprehensive survey of the topic can be found in Cowell, 2000). We use
non-linear parametric estimation of Lorenz curves to approximate the entire income distribution,
and then use these estimates to calculate the Gini coefficient, four different Atkinson indexes,
and poverty rates.
We also use our broadly comparable dataset to examine international patterns of inequality
and poverty. A first conclusion is that between-country inequality variation is more significant
than within-country variation. This suggests that country-specific characteristics have a bigger role in
explaining inequality levels than time trends. However, we also find that within-country
inequality is still important and there are significant time trends in our series. Therefore, we reject
the "glacial change" hypothesis that inequality does not vary significantly over time. For the
specific case of OECD countries, we clearly detect a U-shape pattern that confirms the "U-turn"
hypothesis recently flagged by Atkinson (2003). For developing countries the cross-country
pattern is less clear, but it suggests a decrease in inequality for most of the analyzed period, with
a slight increase in the 1990s. Country-specific time trends are diverse and it is difficult to spot
precise trends. The choice of income concept, of basic or extended series, and the use of pooled data
may produce different results. Nevertheless, this variety of choice emphasizes the richness of our
inequality dataset, which is not limited to a single series and provides a wider base of information from
which to draw conclusions. With respect to poverty, we find a decline in the poverty ratios over
time in most of the countries covered by our sample. The only (though admittedly quite
significant) exception is the poverty experience in the African continent.
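The comparison of between-country and within-country variation can be illustrated with a simple sum-of-squares decomposition. This is a sketch under assumptions: the function name `between_within` and the panel values are hypothetical, and the decomposition shown is a standard one that is not necessarily the exact statistic used later in the paper.

```python
from statistics import fmean


def between_within(panel):
    """Split total variation in an inequality index into two shares.

    `panel` maps each country to a list of index values over time.
    Returns (between_share, within_share), which sum to one.
    """
    all_values = [g for values in panel.values() for g in values]
    grand_mean = fmean(all_values)
    # Between: spread of country means around the grand mean,
    # weighted by the number of observations per country.
    between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in panel.values()
    )
    # Within: spread of each observation around its own country mean.
    within = sum(
        (g - fmean(values)) ** 2
        for values in panel.values()
        for g in values
    )
    total = between + within
    return between / total, within / total


# Made-up Gini values for two hypothetical countries.
panel = {"A": [30, 32, 34], "B": [50, 52, 54]}
between_share, within_share = between_within(panel)
```

With these made-up numbers almost all of the variation is between countries, yet each country still shows a clear upward trend. A large between share and significant time trends are therefore compatible.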
The paper is organized as follows. Section 2 explores difficulties involved in dealing with
inequality data. In Section 3 we assess comparability criteria and discuss the resulting
assumptions needed in order to consistently group different definitions and sources. In Section 4
we estimate the Lorenz curves and Atkinson indexes from grouped income data, and also
poverty ratios. Working with the resulting dataset, in Section 5 we compare it with the DS and
WIID series, and also compare the results provided alternatively by the Gini and the Atkinson
indexes. In Section 6 we then explore how international and inter-temporal inequality has
changed over time within our dataset. We conclude in Section 7.
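As a rough illustration of the Lorenz-curve step outlined above, the sketch below fits a parametric Lorenz curve to grouped data by non-linear least squares and then derives a Gini coefficient, an Atkinson index and a poverty headcount from the fitted curve. Everything specific in it is an assumption made for exposition: the one-parameter form L(p) = p^k, the golden-section search, the function names and the synthetic quintile shares are not the functional form, estimator or data used in Section 4.

```python
import math


def lorenz(p, k):
    """Assumed one-parameter Lorenz curve L(p) = p**k, with k >= 1."""
    return p ** k


def fit_k(points, lo=1.0, hi=20.0, tol=1e-8):
    """Non-linear least squares for k via golden-section search.

    `points` holds (cumulative population share, cumulative income share).
    """
    def sse(k):
        return sum((share - lorenz(p, k)) ** 2 for p, share in points)

    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def gini(k):
    """Gini = 1 - 2 * integral of L(p) over [0, 1] = (k - 1) / (k + 1)."""
    return (k - 1) / (k + 1)


def atkinson(k, eps):
    """Atkinson index implied by L(p) = p**k, using y(p)/mean = k * p**(k-1)."""
    if eps == 1:
        return 1 - k * math.exp(-(k - 1))
    denom = (k - 1) * (1 - eps) + 1
    if denom <= 0:
        raise ValueError("index undefined for this k and eps")
    return 1 - (k ** (1 - eps) / denom) ** (1 / (1 - eps))


def headcount(k, mean, z):
    """Share of the population with income below the poverty line z."""
    if k == 1:  # perfect equality: everyone receives the mean
        return 1.0 if z > mean else 0.0
    return min(1.0, (z / (mean * k)) ** (1 / (k - 1)))


# Synthetic quintile shares generated from k = 2 (not real data).
grouped = [(0.2, 0.04), (0.4, 0.16), (0.6, 0.36), (0.8, 0.64)]
k_hat = fit_k(grouped)
```

Once the curve is fitted, any index that depends only on the Lorenz curve (and, for poverty, on mean income) follows without access to the underlying micro-data. This is why grouped data suffice for the calculations described above.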
2. Problems when dealing with inequality data
We divide the tasks involved in building a cross-country inequality dataset into two main groups.
The first group includes data compilation and quality control. These issues are relatively well
addressed by existing datasets. The second group includes those issues that are not yet
convincingly tackled: the inter-temporal and international comparability and consistency of
inequality data.
2.1 Secondary datasets
A "secondary" dataset is a summary of national information that is drawn from household
income studies and micro-datasets produced by national surveys. The two most used datasets are
the Deininger and Squire (DS) and the World Income Inequality Database (WIID). The WIID
