Notes on CEPII’s Distances Measures: The GeoDist Database
1. INTRODUCTION
- For each country, the authors report the official languages (up to three), as well as the languages spoken by at least 20% of the population and the languages spoken by between 9 and 20% of the population (up to four languages in each of those cases).
- For these countries, the authors propose the distances data calculated for both the capital city and the economic center.
2.1. Country-level variables
- ISO codes in two and three characters, and in three numbers respectively, also known as iso2, iso3, cnum.
- Name of country in English and French respectively, also known as country, pays.
- Dummy variable set equal to 1 for landlocked countries, also known as landlocked.
- Languages (mother tongue, lingua francas or second languages) spoken by at least 20% of the population of the country, also known as lang20_i.
- Colonizers of the country for a relatively long period of time and with a substantial participation in the governance of the colonized country, also known as colonizeri.
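The country-level variables listed above can be pictured as one record per country. The following is a minimal sketch, assuming a hypothetical `CountryRecord` class and a hypothetical `landlocked_codes` helper that are not part of the GeoDist files; the field names follow the variable names in the list, and the two example rows are filled in by hand for illustration rather than read from geo_cepii.xls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CountryRecord:
    """Hypothetical container mirroring the country-level variables."""
    iso2: str          # two-character ISO code
    iso3: str          # three-character ISO code
    cnum: int          # three-digit ISO numeric code
    country: str       # English name
    pays: str          # French name
    landlocked: int    # 1 if the country is landlocked, else 0
    lang20: List[str] = field(default_factory=list)      # left empty here
    colonizers: List[str] = field(default_factory=list)  # left empty here


def landlocked_codes(records: List[CountryRecord]) -> List[str]:
    """Return the iso3 codes of the records flagged as landlocked."""
    return [r.iso3 for r in records if r.landlocked == 1]


# Hand-written illustrative rows (not copied from the dataset).
switzerland = CountryRecord("CH", "CHE", 756, "Switzerland", "Suisse", 1)
france = CountryRecord("FR", "FRA", 250, "France", "France", 0)

print(landlocked_codes([switzerland, france]))  # ['CHE']
```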
2.2. Cities variables used in the computation of distances
- The following variables (also country-specific) describe the city used to calculate simple distances, i.e. the ones where only one city per country is considered. The city is either the city proper or the “agglomeration”, which usually corresponds to an enlarged definition of the city: “Essen”, for instance, is the biggest agglomeration of Germany in their sample.
- These two variables incorporate internal distances based on areas, which are also provided in the geo_cepii.xls file (see description above).
- Take the example of trade between the United Kingdom and Italy.
- The basic idea, inspired by Head and Mayer (2002), is to calculate distance between two countries based on bilateral distances between the biggest cities of those two countries, those inter-city distances being weighted by the share of the city in the overall country’s population.
- More precisely, the authors use the popdata.zip file available at http://www.world-gazetteer.com and take the 25 most populated cities by country.
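The population-weighted distance described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function name `weighted_distance`, the dictionary-based inputs, and the toy numbers are all assumptions. It also assumes that each country's population is approximated by the sum of the listed cities' populations, so that the weights sum to one. The parameter `theta` is the exponent of the generalized mean attributed to Head and Mayer (2002).

```python
from typing import Dict, Tuple


def weighted_distance(
    pop_i: Dict[str, float],
    pop_j: Dict[str, float],
    dist: Dict[Tuple[str, str], float],
    theta: float = 1.0,
) -> float:
    """Generalized mean of city-to-city distances, weighted by population shares.

    pop_i, pop_j: city -> population for countries i and j.
    dist: (city in i, city in j) -> bilateral distance.
    theta: exponent of the generalized mean (must be non-zero).
    """
    total_i = sum(pop_i.values())  # assumption: stands in for country population
    total_j = sum(pop_j.values())
    acc = 0.0
    for k, pk in pop_i.items():
        for l, pl in pop_j.items():
            acc += (pk / total_i) * (pl / total_j) * dist[(k, l)] ** theta
    return acc ** (1.0 / theta)


# Toy, made-up inputs: two cities in country A, one city in country B.
pop_a = {"a1": 3.0, "a2": 1.0}
pop_b = {"b1": 1.0}
dist_ab = {("a1", "b1"): 100.0, ("a2", "b1"): 200.0}

print(weighted_distance(pop_a, pop_b, dist_ab, theta=1.0))  # 125.0
```

With `theta=1.0` the result is the population-weighted arithmetic mean (0.75 × 100 + 0.25 × 200); with `theta=-1.0` the same call returns the weighted harmonic mean, which is pulled toward the shorter distance.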
3.3. Other gravity variables
- Finally, the dist_cepii.xls file also provides dummy variables indicating whether the two countries are contiguous, share a common language, have had a common colonizer after 1945, have ever had a colonial link, have had a colonial relationship after 1945 (col45), are currently in a colonial relationship, or were/are the same country.
- There are two common-language dummies: the first is based on whether two countries share a common official language, and the other is set to one if a language is spoken by at least 9% of the population in both countries.
- Colonization is here a fairly general term that the authors use to describe a relationship between two countries, independently of their level of development, in which one has governed the other over a long period of time and contributed to the current state of its institutions.
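The two common-language dummies described above reduce to set intersections. The sketch below is an illustration under stated assumptions: the function name, the returned key names `common_official` and `common_spoken_9pct`, and the placeholder language labels are hypothetical and are not the variable names or values used in dist_cepii.xls.

```python
from typing import Dict, Iterable


def common_language_dummies(
    official_i: Iterable[str],
    official_j: Iterable[str],
    spoken9_i: Iterable[str],
    spoken9_j: Iterable[str],
) -> Dict[str, int]:
    """Build the two dummies for a country pair.

    official_*: official languages of each country.
    spoken9_*: languages spoken by at least 9% of each country's population.
    """
    shares_official = bool(set(official_i) & set(official_j))
    shares_spoken = bool(set(spoken9_i) & set(spoken9_j))
    return {
        "common_official": int(shares_official),       # hypothetical key name
        "common_spoken_9pct": int(shares_spoken),      # hypothetical key name
    }


# Placeholder labels, not real language data.
result = common_language_dummies(
    official_i=["lang_a"],
    official_j=["lang_b"],
    spoken9_i=["lang_a", "lang_c"],
    spoken9_j=["lang_b", "lang_c"],
)
print(result)  # {'common_official': 0, 'common_spoken_9pct': 1}
```

The example pair shares no official language but does share a widely spoken one, which is the case the second dummy is designed to capture.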
4. REFERENCES
- K. HEAD AND T. MAYER (2002), “Illusory Border Effects: Distance Mismeasurement Inflates Estimates of Home Bias in Trade”, CEPII Working Paper 2002-01.
- The same-country dummy complements the comcol variable: it is set to one if countries were or are the same state or the same administrative entity for a long period (25-50 years in the twentieth century, 75 years in the nineteenth and 100 years before).
- Spanish colonies are distinguished following their administrative divisions in the colonial period.
- continent: continent to which the country belongs.
- langoff_i: official or national languages and languages spoken by at least 20% of the population of the country (and spoken in another country of the world), following the same logic as the “open-circuit languages” in Mélitz (2002).
The general formula developed by Head and Mayer (2002) and used for calculating distances between country i and j is

$$d_{ij} = \left( \sum_{k \in i} \frac{\mathrm{pop}_k}{\mathrm{pop}_i} \sum_{\ell \in j} \frac{\mathrm{pop}_\ell}{\mathrm{pop}_j} \, d_{k\ell}^{\theta} \right)^{1/\theta}, \tag{1}$$

where pop_k designates the population of agglomeration k belonging to country i.
Following Head and Mayer (2002), the distance between two countries is computed as an average of the distances between their main cities, weighted by the cities' share of each country's population.
The distance formula used is a generalized mean of city-to-city bilateral distances developed by Head and Mayer (2002), which takes the arithmetic mean and the harmonic mean as special cases.
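The two special cases can be written out explicitly. In the sketch below, the shorthand weights $w_k = \mathrm{pop}_k/\mathrm{pop}_i$ and $w_\ell = \mathrm{pop}_\ell/\mathrm{pop}_j$ are notation introduced here for compactness, not notation from the paper, and the weights are assumed to sum to one within each country.

```latex
% theta = 1: population-weighted arithmetic mean of inter-city distances
d_{ij}\big|_{\theta = 1} = \sum_{k \in i} \sum_{\ell \in j} w_k \, w_\ell \, d_{k\ell}

% theta = -1: population-weighted harmonic mean of inter-city distances
d_{ij}\big|_{\theta = -1} = \left( \sum_{k \in i} \sum_{\ell \in j} \frac{w_k \, w_\ell}{d_{k\ell}} \right)^{-1}
```

Because the harmonic mean gives more weight to short distances, the second case never exceeds the first for the same cities and weights.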
Frequently Asked Questions (7)
Q2. In which other fields have these covariates been used?
Covariates such as bilateral distance, contiguity, or colonial historical links have also been used in fields other than international trade: for the study of bilateral flows of foreign direct investment, for instance, but also by researchers interested in explaining migration patterns, international flows of tourists, telephone traffic, etc.
Q3. What is the common use of these files?
A common use of these files is the estimation by trade economists of gravity equations describing bilateral patterns of trade flows.
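As a hedged illustration of what such an estimation typically looks like, a textbook log-linear gravity specification is sketched below. The trade flow $X_{ij}$, the economic-size terms $Y_i$ and $Y_j$, the coefficients $\beta$, and the error term $\varepsilon_{ij}$ are notation assumed here and are not taken from the GeoDist notes; the distance and contiguity terms stand in for the kind of bilateral variables the files provide.

```latex
\ln X_{ij} = \beta_0 + \beta_1 \ln Y_i + \beta_2 \ln Y_j
           + \beta_3 \ln \mathrm{dist}_{ij}
           + \beta_4 \, \mathrm{contiguity}_{ij}
           + \varepsilon_{ij}
```

Further dummies, such as common language or colonial links, would enter the right-hand side in the same way as the contiguity term.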
Q4. What is the main contribution of GeoDist?
The main contribution of GeoDist is to compute internal (or intra-national) and international bilateral distances in a totally consistent way.
Q5. What does the first GeoDist dataset contain?
Their first dataset (geo_cepii) incorporates country-specific geographical variables for 225 countries in the world, including the geographical coordinates of their capital cities, the languages spoken in the country under different definitions, a variable indicating whether the country is landlocked, and their colonial links.
Q6. How do other disciplines use these variables?
Political scientists, for instance, use distance and contiguity (among other determinants) to explain why some pairs of countries have a higher probability than others of going to war.
Q7. What is the basic idea behind the distance between two countries?
The basic idea, inspired by Head and Mayer (2002), is to calculate distance between two countries based on bilateral distances between the biggest cities of those two countries, those inter-city distances being weighted by the share of the city in the overall country’s population.