
Notes on CEPII’s Distances Measures: The GeoDist Database

TL;DR: GeoDist provides several geographical variables, in particular bilateral distances computed from city-level data that capture the geographic distribution of population inside each nation, and offers different measures of bilateral distance for most countries in the world (225 countries in the current version of the database).
Abstract: GeoDist makes available the exhaustive set of gravity variables used in Mayer and Zignago (2005). GeoDist provides several geographical variables, in particular bilateral distances measured using city-level data to assess the geographic distribution of population inside each nation. We have calculated different measures of bilateral distances available for most countries across the world (225 countries in the current version of the database). For most of them, different calculations of “intra-national distances” are also available. The GeoDist webpage provides two distinct files: a country-specific one (geo_cepii) and a dyadic one (dist_cepii) including a set of different distance and common dummy variables used in gravity equations to identify particular links between countries such as colonial past, common languages, and contiguity. We try to improve upon the existing similar datasets in terms of geographical coverage, quality of measurement and number of variables provided.

Summary

1. INTRODUCTION

  • For each country, the authors report the official languages (up to three), as well as the languages spoken by at least 20% of the population and the languages spoken by between 9 and 20% of the population (up to four languages in each of those cases).
  • For these countries, the authors propose distance data calculated for both the capital city and the economic center.

2.1. Country-level variables

  • ISO codes in two-letter, three-letter and three-digit numeric form (iso2, iso3, cnum).
  • Name of the country in English and in French (country, pays).
  • Dummy variable set equal to 1 for landlocked countries (landlocked).
  • Languages (mother tongue, lingua francas or second languages) spoken by at least 20% of the population of the country (lang20_i).
  • Colonizers of the country for a relatively long period of time and with a substantial participation in the governance of the colonized country (colonizer_i).

2.2. Cities variables used in the computation of distances

  • The following variables, also country-specific, describe the city used to calculate simple distances, i.e. those for which only one city per country is considered (a city or “agglomeration”, which usually corresponds to an enlarged definition of the city: “Essen”, for instance, is the biggest agglomeration of Germany in their sample).
  • These two variables incorporate internal distances based on country areas, which are also provided in the geo_cepii.xls file (see description above).
  • Take the example of trade between the United Kingdom and Italy.
  • The basic idea, inspired by Head and Mayer (2002), is to calculate distance between two countries based on bilateral distances between the biggest cities of those two countries, those inter-city distances being weighted by the share of the city in the overall country’s population.
  • More precisely, the authors use the popdata.zip file available at http://www.world-gazetteer.com and take the 25 most populated cities in each country.
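The weighting scheme described above can be sketched in Python. This is a minimal illustration under stated assumptions, not CEPII's code: city-to-city distances are taken as great-circle (haversine) distances on a sphere of radius 6,371 km, cities are passed as hypothetical (population, latitude, longitude) tuples, and the weighted mean follows the generalized form of Head and Mayer (2002) with an exponent θ (`theta`). Our reading of the two weighted measures, distw and distwces, is that they correspond to θ = 1 and θ = −1; treat that mapping as an assumption.

```python
import math

# Mean Earth radius in km (assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def weighted_distance(cities_i, cities_j, theta=1.0):
    """Population-weighted generalized mean of city-to-city distances.

    cities_i, cities_j: lists of (population, lat, lon) tuples for the largest
    cities of countries i and j (hypothetical input format).
    theta = 1 gives the arithmetic mean, theta = -1 the harmonic mean.
    """
    if theta == 0:
        raise ValueError("theta must be non-zero")
    pop_i = sum(p for p, _, _ in cities_i)
    pop_j = sum(p for p, _, _ in cities_j)
    total = 0.0
    for pop_k, lat_k, lon_k in cities_i:
        for pop_l, lat_l, lon_l in cities_j:
            d = great_circle_km(lat_k, lon_k, lat_l, lon_l)
            if d == 0 and theta < 0:
                # A city paired with itself (internal distances) would need its own
                # positive intra-city distance; this sketch does not supply one.
                raise ValueError("zero distance with negative theta")
            # Each pair is weighted by the product of the two cities' population shares.
            total += (pop_k / pop_i) * (pop_l / pop_j) * d ** theta
    return total ** (1.0 / theta)
```

With a single city per country the result collapses to the plain great-circle distance for any θ, which is the logic of the simple distances (dist, distcap). Internal distances (i = j) pair each city with itself; for negative θ the sketch raises an error rather than guess an intra-city distance.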

3.3. Other gravity variables

  • Finally, the dist_cepii.xls file also provides dummy variables indicating whether the two countries are contiguous, share a common language, have had a common colonizer after 1945, have ever had a colonial link, have had a colonial relationship after 1945 (col45), are currently in a colonial relationship, or were/are the same country.
  • There are two common-language dummies: the first is set to one if the two countries share a common official language, and the second if a language is spoken by at least 9% of the population in both countries.
  • Colonization is here a fairly general term that the authors use to describe a relationship between two countries, independently of their level of development, in which one has governed the other over a long period of time and contributed to the current state of its institutions.
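The two common-language dummies can be sketched as set intersections. The input format below (a dict with an "official" set and a "shares" dict of population percentages) and the function name are hypothetical; only the 9% threshold comes from the description above.

```python
def common_language_dummies(country_i, country_j, threshold=9.0):
    """Return (official, spoken) 0/1 common-language dummies for a country pair.

    Each country is a dict with (hypothetical) keys:
      "official": set of official language names,
      "shares":   dict mapping language name -> % of population speaking it.
    """
    # First dummy: the two countries share at least one official language.
    official = int(bool(country_i["official"] & country_j["official"]))
    # Second dummy: some language is spoken by at least `threshold` % in both countries.
    spoken_i = {lang for lang, pct in country_i["shares"].items() if pct >= threshold}
    spoken_j = {lang for lang, pct in country_j["shares"].items() if pct >= threshold}
    spoken = int(bool(spoken_i & spoken_j))
    return official, spoken
```

Because the second dummy looks at widely spoken languages rather than official status, it can equal one for a pair whose official languages differ.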

4. REFERENCES

  • K. HEAD AND T. MAYER (2002), “Illusory Border Effects: Distance Mismeasurement Inflates Estimates of Home Bias in Trade”, CEPII Working Paper 2002-01.
  • The same-country dummy complements the comcol variable: it is set to one if the two countries were or are the same state or the same administrative entity for a long period (25-50 years in the twentieth century, 75 years in the nineteenth, and 100 years before).
  • Spanish colonies are distinguished following their administrative divisions in the colonial period.


No 2011 – 25
December
WORKING PAPER
Notes on CEPII’s distances measures: The GeoDist database
Thierry Mayer
Soledad Zignago

CEPII, WP No 2011 – 25, Notes on CEPII’s distances measures
TABLE OF CONTENTS
Non-technical summary
Abstract
Non-technical summary (French version)
Short abstract (French version)
1. Introduction
2. The country-specific files: geo_cepii.xls and geo_cepii.dta
2.1. Country-level variables
2.2. Cities variables used in the computation of distances
3. The bilateral files: dist_cepii.xls and dist_cepii.dta
3.1. Simple distances: dist and distcap
3.2. Weighted distances: distw and distwces
3.3. Other gravity variables
4. References
NOTES ON CEPII’S DISTANCES MEASURES:
THE GeoDist DATABASE
NON-TECHNICAL SUMMARY
GeoDist makes available the exhaustive set of gravity variables developed in Mayer and Zignago (2005) to analyze market access difficulties in global and regional trade flows. GeoDist provides data online (http://www.cepii.fr/anglaisgraph/bdd/distances.htm) for empirical economic research involving geographical elements and variables. A common use of these files is the estimation by trade economists of gravity equations describing bilateral patterns of trade flows. Covariates such as bilateral distance, contiguity, or colonial historical links have also been used in fields other than international trade: in the study of bilateral flows of foreign direct investment, for instance, but also by researchers interested in explaining migration patterns, international flows of tourists, telephone traffic, etc. Even outside economics, researchers in several social sciences use these types of variables. Political scientists, for instance, use distance and contiguity (among other determinants) to explain why some pairs of countries have a higher probability than others of going to war. Other datasets providing geographical and distance data have been proposed in the literature, notably those developed by Jon Haveman, Vernon Henderson and Andrew Rose. We try to improve upon the existing sets of variables in terms of geographical coverage, measurement and the number of variables provided.
Our first dataset (geo_cepii) incorporates country-specific geographical variables for 225 countries in the
world, including the geographical coordinates of their capital cities, the languages spoken in the country
under different definitions, a variable indicating whether the country is landlocked, and their colonial
links. The second dataset (dist_cepii) is dyadic, in the sense that it includes variables valid for pairs
of countries. Distance is the most common example of such a variable, and the file includes different
measures of bilateral distances (in kilometers) available for most countries across the world.
The main contribution of GeoDist is to compute internal (or intra-national) and international bilateral distances in a fully consistent way. How should the internal distances of countries be defined? How can these constructed internal distances be made consistent with ‘traditional’ calculations of international distances? The latter question is in fact crucial for obtaining a correct estimate of trade impediments: any overestimate of the internal / external distance ratio mechanically yields an upward bias in the border effect estimate. We have computed these distances using city-level data to assess the geographic distribution of population (in 2004) inside each nation. The basic idea, inspired by Head and Mayer (2002), is to calculate distance between two countries based on bilateral distances between the biggest cities of those two countries, those inter-city distances being weighted by the share of the city in the overall country’s population.
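The bias argument can be made concrete with a stylized example. Assume, purely for illustration (this is not presented as the estimating equation of Mayer and Zignago), a log-linear gravity equation for the flow $X_{ij}$ from country $i$ to country $j$, with exporter and importer terms $a$ and $b$, a distance elasticity $\rho > 0$, and a home dummy whose coefficient $\beta$ measures the border effect. Combining the two internal flows with the two cross-border flows of a country pair eliminates the country terms, and shows how inflating internal distances by factors $\lambda_i, \lambda_j > 1$ shifts the implied $\beta$:

```latex
% Stylized gravity equation (illustrative assumption)
\ln X_{ij} = a_i + b_j - \rho \ln d_{ij} + \beta \, \mathbb{1}[i = j]

% Adding the two internal flows and subtracting the two cross-border flows
% cancels a_i, a_j, b_i, b_j:
\ln \frac{X_{ii} X_{jj}}{X_{ij} X_{ji}}
  = -\rho \ln \frac{d_{ii} d_{jj}}{d_{ij} d_{ji}} + 2\beta
\quad\Longrightarrow\quad
\beta = \tfrac{1}{2} \ln \frac{X_{ii} X_{jj}}{X_{ij} X_{ji}}
      + \tfrac{\rho}{2} \ln \frac{d_{ii} d_{jj}}{d_{ij} d_{ji}}

% Using overstated internal distances \lambda_i d_{ii} and \lambda_j d_{jj}:
\hat{\beta} - \beta = \tfrac{\rho}{2} \left( \ln \lambda_i + \ln \lambda_j \right) > 0
```

The bias is positive whenever $\rho > 0$ and internal distances are overstated. With $\lambda_i = \lambda_j = 2$ and $\rho = 1$, for example, the home coefficient rises by $\ln 2 \approx 0.69$.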
ABSTRACT
GeoDist makes available the exhaustive set of gravity variables used in Mayer and Zignago (2005).
GeoDist provides several geographical variables, in particular bilateral distances measured using city-
level data to assess the geographic distribution of population inside each nation. We have calculated
different measures of bilateral distances available for most countries across the world (225 countries in
the current version of the database). For most of them, different calculations of “intra-national distances”
are also available. The GeoDist webpage provides two distinct files: a country-specific one (geo_cepii)
and a dyadic one (dist_cepii) including a set of different distance and common dummy variables used in
gravity equations to identify particular links between countries such as colonial past, common languages,
contiguity. We try to improve upon the existing similar datasets in terms of geographical coverage,
quality of measurement and number of variables provided.
JEL Classification: F10, F12, F13, F14, F15, C80.
Keywords: Distances, International Trade, Databases, Gravity, Trade Costs, Border Effects.
NOTES ON CEPII’S DISTANCE DATABASE (GeoDist)
NON-TECHNICAL SUMMARY (FRENCH VERSION)
GeoDist provides the full set of data developed by Mayer and Zignago (2005) to measure the difficulties of access to world markets. GeoDist, or CEPII’s distance database, offers online (http://www.cepii.fr/anglaisgraph/bdd/distances.htm) geographical data useful for empirical research, in particular for the estimation of gravity equations in the field of international trade. Compared with the series developed by Jon Haveman, Vernon Henderson and Andrew Rose, we have extended the geographical coverage, refined the measures and increased the number of variables. Beyond the analysis of trade, the distance between two countries, their contiguity and their historical links are all variables used in other fields of research, such as those of direct investment, migration or tourist flows, telephone traffic, etc. Researchers in the social sciences also make use of such variables; in political science, for example, distance and contiguity are taken into account when computing the probabilities of conflict.
A first dataset gathers the variables characterizing each of the 225 countries. The geo_cepii file (geo_cepii.xls or geo_cepii.dta) contains the geographical variables of the countries and of their main city or agglomeration: the identification of the country (ISO codes); its area (in km2), used in particular to compute internal distances; the geographical coordinates of the capital(s); whether the country is landlocked; its continent; etc. This dataset also includes several language variables that make it possible to determine linguistic proximity. For each country, there can be up to three official languages; the database distinguishes the languages spoken by more than 20% of the population from those spoken by 9 to 20% of the population. Past colonial relationships are another piece of information often used by economists to proxy for cultural, political or institutional similarities.
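The area recorded in geo_cepii feeds the internal-distance calculation. This section does not spell out the formula, so the sketch below assumes the disc approximation common in this literature (after Head and Mayer): a country of area A is treated as a disc of radius √(A/π), and its internal distance is taken as 0.67 times that radius. The function name and the `factor` parameter are illustrative.

```python
import math


def internal_distance_km(area_km2, factor=0.67):
    """Area-based internal distance under a disc approximation (assumed formula).

    The country is treated as a disc of radius r = sqrt(area / pi). For points
    spread uniformly over the disc, the mean distance to the centre is 2r/3,
    which is where the ~0.67 factor comes from.
    """
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    radius = math.sqrt(area_km2 / math.pi)
    return factor * radius
```

Because the formula grows with the square root of area, quadrupling a country's area only doubles its internal distance.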
A second dataset is dyadic, in the sense that the variables are computed for pairs of countries: the distance (km) between two countries is the typical example of this kind of bilateral variable. The dist_cepii file (dist_cepii.xls or dist_cepii.dta) contains the bilateral variables: the different distance measures and the dummy variables indicating contiguity, a common language, or colonial links. Two types of distance are measured: simple distances, which use a single city, and weighted distances, which consider several cities per country in order to account for the geographical distribution of economic activity. These weighted distances are GeoDist’s main contribution. To compare international flows with “intra-national” trade flows, as we did in Mayer and Zignago (2005) when estimating border effects for all the countries of the world, we needed a good approximation of the average distances travelled by goods within each country. Indeed, an underestimate of relative distances mechanically biases the estimated border effect upward. To avoid this, we take into account the geographical distribution of economic activity within nations, using the populations and coordinates of the main cities of each country when computing the distance matrix. The idea, inspired by Head and Mayer (2002), is to calculate the distance between two countries as an average of the distances between their main cities, weighted by their population.

Citations
01 May 2009
TL;DR: BACI, an international trade database covering more than 200 countries and 5,000 products, reconciles the data reported by almost 150 countries to the United Nations Statistics Division, which disseminates them via COMTRADE.
Abstract: We document BACI, our international trade database covering more than 200 countries and 5,000 products, between 1994 and 2006. Original procedures have been developed to reconcile data reported by almost 150 countries to the United Nations Statistics Division, which disseminates them via COMTRADE. When both the exporting and the importing country report, we have two different figures for the same flow, which it is useful to reconcile into a single figure. Firstly, as import values are reported CIF (cost, insurance and freight) and exports are reported FOB (free on board), CIF costs have to be estimated and removed from import values to compute FOB import values. We regress the unit-value ratios reported for a given elementary flow on gravity variables, years, and the world median unit value for each product category. The second step is an evaluation of the reliability of country reporting, based on the reporting distances among partners. We decompose the absolute value of the ratios of mirror flows using a (weighted) variance analysis, and an index is built for each country. These reporting qualities are used as weights in the reconciliation of each bilateral trade flow reported twice. Taking advantage of this double information on each flow, we end up with a large coverage of countries not reporting at a given level of the product classification, with special care in the treatment of unit values. BACI is freely available to users of the COMTRADE database on our webpage: http://www.cepii.fr/anglaisgraph/bdd/baci.htm

675 citations

Journal Article
TL;DR: This paper develops a general equilibrium model of international trade that features selection across firms, products and countries, in which firms' export decisions depend on a combination of firm productivity and firm-product-country consumer tastes.
Abstract: This paper develops a general equilibrium model of international trade that features selection across firms, products and countries. Firms’ export decisions depend on a combination of firm “productivity” and firm-product-country “consumer tastes”, both of which are stochastic and unknown prior to the payment of a sunk cost of entry. Higher-productivity firms export a wider range of products to a larger set of countries than lower-productivity firms. Trade liberalization induces endogenous reallocations of resources that foster productivity growth both within and across firms. Empirically, we find key implications of the model to be consistent with U.S. trade

612 citations

Journal Article
TL;DR: In this article, the authors investigate the links between price returns for 25 commodities and stocks over the period from January 2001 to November 2011, paying particular attention to energy raw materials.

460 citations

Journal Article
TL;DR: In this article, the authors examine how environmental regulation and innovation have affected the export competitiveness of the European Union, using a theoretically based gravity model applied to the export dynamics of five aggregated manufacturing sectors classified by their technological or environmental content.

387 citations

Book
15 Nov 2016
TL;DR: The Advanced Guide to Trade Policy Analysis is a complementary follow-up to the original Practical Guide to Trade Policy Analysis and provides the most recent tools for the analysis of trade policy using structural gravity models.
Abstract: This Advanced Guide to Trade Policy Analysis is a complementary follow-up to the original Practical Guide to Trade Policy Analysis. It provides the most recent tools for analysis of trade policy using structural gravity models. Written by experts who have contributed to the development of theoretical and empirical methods in the academic gravity literature and who have rich practical experience in the field, this publication explains how to conduct partial equilibrium estimations as well as general equilibrium analysis with structural gravity models and contains practical guidance on how to apply these tools to concrete policy questions. This Advanced Guide has been developed to contribute to the enhancement of developing countries’ capacity to analyse and implement trade policy. It is aimed at government experts engaged in trade negotiations, as well as graduate students and researchers involved in trade-related study or research.

380 citations


Cites background or result from "Notes on CEPII’s Distances Measures..."

  • ...This estimate is significantly smaller compared to the famous border estimate of 22 for inter-provincial trade within Canada relative to international trade between Canadian provinces and US states reported in McCallum (1995). The proper econometric specification of the structural gravity model (i....


  • ...The CEPII’s GeoDist database reports data on time-invariant gravity variables for 225 countries (Mayer and Zignago, 2011)....


References
Posted Content
TL;DR: In this article, the authors build a theoretical model of multi-product firms that highlights how competition across market destinations affects both a firm's exported product range and product mix and show how tougher competition in an export market induces a firm to skew its export sales towards its best performing products.
Abstract: We build a theoretical model of multi-product firms that highlights how competition across market destinations affects both a firm's exported product range and product mix. We show how tougher competition in an export market induces a firm to skew its export sales towards its best performing products. We find very strong confirmation of this competitive effect for French exporters across export market destinations. Theoretically, this within firm change in product mix driven by the trading environment has important repercussions on firm productivity. A calibrated fit to our theoretical model reveals that these productivity effects are potentially quite large.

612 citations

Posted Content
03 May 2006
TL;DR: In this paper, the authors calculate and make available different measures of bilateral distances (in km) for most countries across the world (225 countries in the current version of the database).
Abstract: We have calculated and made available different measures of bilateral distances (in kms) available for most countries across the world (225 countries in the current version of the database). For most of them, different calculations of “intra-national distances” are also available. There are two distinct files: a country-specific one (geo_cepii.xls or geo_cepii.dta) and a bilateral one (dist_cepii.xls or dist_cepii.dta), including the set of different distance variables and common dummy variables used in gravity equations to identify particular links between countries such as colonial past, common languages, contiguity. A common use of those files is the estimation of gravity equations describing bilateral trade flows. We try to improve upon the existing similar datasets in terms of geographical coverage, measurement and number of variables provided.

447 citations

Journal Article
TL;DR: The authors proposed separate series for a common language depending upon whether ease of communication facilitates trade through translation or the ability to communicate directly, and examined the effect of two country-specific linguistic influences on trade: literacy and linguistic diversity at home.

442 citations


"Notes on CEPII’s Distances Measures..." refers background in this paper

  • ...continent: continent to which the country belongs; langoff_i: official or national languages and languages spoken by at least 20% of the population of the country (and spoken in another country of the world), following the same logic as the “open-circuit languages” in Mélitz (2002)....


Posted Content
TL;DR: In this article, the authors argue that border effects may have been mismeasured in a way that leads to a systematic overstatement, and they propose a correct measure of distance that would be consistent for international as well as intra-national trade flows.
Abstract: The measured effect of national borders on trade seems too large to be explained by the apparently small border-related trade barriers. This puzzle was first presented by McCallum (1995) and has gone on to spawn a large and growing literature on so-called border effects. We argue in this paper that, because distances are always mismeasured in the existing literature, the border effects may have been mismeasured in a way that leads to a systematic overstatement. Our goal here is to develop a correct measure of distance that would be consistent for international as well as intra-national trade flows. We show how use of the existing methods for calculating distance leads to “illusory” border and adjacency effects. We then apply our methods to data on interstate trade in the United States and inter-member trade in the European Union. We find that our new distance measure reduces the estimated border and adjacency effects but does not eliminate them. Thus, while we do not solve the border effect puzzle, we do show a way to shrink it.

441 citations


"Notes on CEPII’s Distances Measures..." refers background or methods in this paper

  • ...The general formula developed by Head and Mayer (2002) and used for calculating distances between country i and j is $d_{ij} = \left( \sum_{k \in i} \frac{pop_k}{pop_i} \sum_{\ell \in j} \frac{pop_\ell}{pop_j} \, d_{k\ell}^{\theta} \right)^{1/\theta}$ (1), where $pop_k$ designates the population of agglomeration k belonging to country i....


  • ...The basic idea, inspired by Head and Mayer (2002), is to calculate distance between two countries based on bilateral distances between the biggest cities of those two countries, those inter-city distances being weighted by the share of the city in the overall country’s population....


  • ...The idea of Head and Mayer (2002), taken up here, is to calculate the distance between two countries as an average of the distances between their main cities, weighted by the cities' shares in the countries' populations....


  • ...The idea, inspired by Head and Mayer (2002), is to calculate the distances between two countries as an average of the distances between their main cities, weighted by their population....


  • ...The distance formula used is a generalized mean of city-to-city bilateral distances developed by Head and Mayer (2002), which takes the arithmetic mean and the harmonic mean as special cases....

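The general formula quoted above, $d_{ij} = \big( \sum_{k \in i} \frac{pop_k}{pop_i} \sum_{\ell \in j} \frac{pop_\ell}{pop_j} d_{k\ell}^{\theta} \big)^{1/\theta}$, can be rewritten with population shares as weights, which makes its special cases explicit. The shorthand $w_k$ and $w_\ell$ is ours:

```latex
% Weights: w_k = pop_k / pop_i and w_\ell = pop_\ell / pop_j,
% so that \sum_{k \in i} \sum_{\ell \in j} w_k w_\ell = 1.
d_{ij}(\theta) = \Big( \sum_{k \in i} \sum_{\ell \in j} w_k \, w_\ell \, d_{k\ell}^{\theta} \Big)^{1/\theta}

% \theta = 1: population-weighted arithmetic mean
d_{ij}(1) = \sum_{k \in i} \sum_{\ell \in j} w_k \, w_\ell \, d_{k\ell}

% \theta = -1: population-weighted harmonic mean
d_{ij}(-1) = \Big( \sum_{k \in i} \sum_{\ell \in j} \frac{w_k \, w_\ell}{d_{k\ell}} \Big)^{-1}

% Power-mean inequality (weights sum to one): d_{ij}(-1) \le d_{ij}(1),
% with equality when all city-pair distances d_{k\ell} are equal.
```

The harmonic case therefore never yields a longer distance than the arithmetic case for the same set of cities, and with a single city per country both reduce to the city-to-city distance.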

Journal Article
TL;DR: In this article, the authors investigate the determinants of the carbon price during the two phases of the European Union Emission Trading Scheme (EU ETS), relying on daily EU allowance futures contracts.

243 citations
