Journal Article

# A Perimeter-Based Clustering Index for Measuring Spatial Segregation: A Cognitive GIS Approach

01 Jun 1998, Environment and Planning B: Planning and Design (SAGE Publications, London, England), Vol. 25, Iss. 3, pp. 327-343
TL;DR: Many efforts have been made to develop segregation indices that incorporate spatial interaction based on the contiguity concept, which refers to how similar the concentration of the subject of interest in one areal unit is to that in adjacent areal units.
Abstract: Many efforts have been made to develop segregation indices that incorporate spatial interaction based on the contiguity concept. Contiguity refers to how similar the concentration of the subject of...

### 1 Introduction

• A new index to measure the degree of clustering is developed and then compared with the existing indices of segregation.
• In section 5, four existing indices to be compared with the new clustering index are discussed briefly, and the clustering index and the four other indices are compared in two hypothetical settings: binary distribution in a regular lattice and semicontinuous distribution in a regular lattice.
• In section 6, the five indices are compared in a real-world application, the five boroughs of New York City.

### 2 Operational definition of clustering

• Here, adjacent areal units showing a high concentration of the subject form a few clusters on the map.
• Once clusters are obtained, one needs to quantify the size, shape, and closeness of the clusters.
• A measure that combines these three factors is the total perimeter of the clusters.
• When shape and adjacency of the clusters are the same, the total perimeter (P) of the clusters is a proper measure of the total size [see figures 1(a) and 1(b)].
• When the size and adjacency of the clusters are constant, circular shapes give the minimum possible total perimeter [see figures 1(c) and 1(d)].

### 3 Calculation of a clustering index

• Only the boundary between a pair of adjacent tracts which have different $I$ values (high concentration versus low concentration) remains.
• One advantage of this measure for an irregular polygon layout is that the degree of proximity between polygons is automatically taken into account during the merging process.
• The authors apply the Monte Carlo method to establish the distribution of the index based on the assumption of a stochastic process, as the distribution is not obtainable analytically.
• Based on the assumption that each member of an object group can be located freely, the probability that a member is placed in a given areal unit is proportional to the ratio of the tract's total population to the citywide total population.
• In the following section, some special examples of the segregated distributions are chosen to compare the clustering index with other existing segregation indices.
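
The Monte Carlo procedure summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tracts form a hypothetical rows × cols lattice with unit-length shared boundaries, each member of the object group is placed independently with probability proportional to tract population, tract capacity is ignored, and the helper names (`lattice_edges`, `clustering_index`, `simulate_null`) are made up for this example.

```python
import random


def lattice_edges(rows, cols):
    """Shared boundaries (i, j, b_ij) of a rows x cols lattice with unit-length edges."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1, 1.0))     # boundary with the tract to the right
            if r + 1 < rows:
                edges.append((i, i + cols, 1.0))  # boundary with the tract below
    return edges


def clustering_index(pop, obj, edges):
    """I_C = 1 - (boundary between flagged and unflagged tracts) / (all shared boundary)."""
    P, X = sum(pop), sum(obj)
    # I_i = 1 when Q_l >= 1, i.e. x_i / p_i >= X / P (compared without division)
    flags = [1 if p > 0 and x * P >= p * X else 0 for p, x in zip(pop, obj)]
    total = sum(b for _, _, b in edges)
    cut = sum(b for i, j, b in edges if flags[i] != flags[j])
    return 1.0 - cut / total


def simulate_null(pop, n_members, edges, n_sims=999, seed=0):
    """Index values under random placement weighted by tract population (an assumption)."""
    rng = random.Random(seed)
    tracts = range(len(pop))
    values = []
    for _ in range(n_sims):
        obj = [0] * len(pop)
        for t in rng.choices(tracts, weights=pop, k=n_members):
            obj[t] += 1
        values.append(clustering_index(pop, obj, edges))
    return values


# Hypothetical 4 x 4 city: 10 residents per tract; 40 members fill the top-left 2 x 2 block.
rows, cols = 4, 4
pop = [10] * (rows * cols)
obj = [10 if r < 2 and c < 2 else 0 for r in range(rows) for c in range(cols)]
edges = lattice_edges(rows, cols)
observed = clustering_index(pop, obj, edges)
null = simulate_null(pop, sum(obj), edges)
p_value = (1 + sum(v >= observed for v in null)) / (len(null) + 1)
print(round(observed, 3), p_value)
```

The one-sided pseudo p-value printed here is one common way to read a simulated distribution; the exact test used in the paper may differ.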

### 5 Comparisons in hypothetical space

• In the first setting, only binary distribution is allowed on the regular lattice.
• In the second setting, a semicontinuous distribution is allowed, while the total number in an object group remains constant.
• Therefore each tract can contain any number of people (not exceeding 10) in the object group.
• The total number of people in the object group in the city should remain constant.
• These two settings are chosen to analyze the effect of marginal changes in the spatial setting.

### 5.1 Binary distribution in a regular lattice

• The total number in an object group varies in each distribution.
• This may be misleading, because an index obtained in a city with, for example, a 20% black population may not be directly comparable with that of another city with a 40% black population.
• As an example, one can assume that two cities have identical geographical settings; however, one has 9 minority people, whereas the other has 4 minority people.
• If they can choose their locations freely, the likelihood of all minority people choosing to live in a single tract is lower in the city populated with 9 people than in the city populated with 4 people.
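
The intuition in the last bullet can be checked with a little arithmetic. Assuming, purely for illustration, a city of n equally populated tracts in which each minority member picks a tract independently and uniformly, the probability that all k members land in the same tract is n(1/n)^k = n^(1-k). The sketch uses a hypothetical 9-tract city, and the function name `prob_all_in_one_tract` is made up.

```python
from fractions import Fraction


def prob_all_in_one_tract(n_tracts, k_members):
    """P(all k members choose the same tract), assuming independent uniform choices."""
    # n possible shared tracts, each chosen by all k members with probability (1/n)**k
    return n_tracts * Fraction(1, n_tracts) ** k_members


for k in (4, 9):
    print(k, prob_all_in_one_tract(9, k))
# prints "4 1/729" and then "9 1/43046721"
```

Under these assumptions the 9-member city is far less likely to produce a single-tract enclave by chance, which is why an index score that ignores the citywide rate can mislead when cities are compared.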

### 5.2 Semicontinuous distribution in a regular lattice

• Overall, the distribution of the black population generates a fairly consistent ranking of each borough, and the distribution of the origins generates the most mixed rankings over the indices.
• $I_{SP}$ and $I_M$ produce similar rankings to each other and underestimate highly concentrated enclaves such as the origins in Staten Island.
• $I_C$ produces distinctive rankings and is very sensitive to the separation of clusters, such as the three clusters of origins in Manhattan, which produce the lowest ranking for P among the boroughs.

### 6.2 Intersubject comparison

• As shown in this intersubject comparison, each index shows quite a different degree of segregation for each subject.
• This result implies that the choice of an index is a critical issue when the degrees of segregation of different subjects of interest are compared.
• As an example, a policymaker may need to choose between a population-based program (targeting people under the poverty level) and a neighborhood-based program (targeting the neighborhoods which generate more homeless people) for the prevention of homelessness.
• Based on the clustering index, origins of the homeless form tighter enclaves than poverty in New York City, in contrast to other indices.

### 7 Conclusion

• The proposed clustering index tends to give more weight to enclaveness than to contiguity alone.
• This may be a good property for those cases in which the primary concern of an investigator is the formation of enclaves of a socioeconomic subject, including minority populations, poverty, crime, epidemics, and mortgage red-lining.
• Additionally, its robustness to the citywide rate allows a proper intercity comparison of a given subject by index score, even when the citywide rate varies significantly, unlike the other measures.
• Further research is required to refine the clustering index so that it better captures the enclaveness of a subject of interest.



From the SelectedWorks of Dennis P. Culhane
1998
A Perimeter-based Clustering Index for Measuring
Spatial Segregation: A Cognitive GIS Approach
Dennis P Culhane, University of Pennsylvania
Chang Moo Lee, University of Pennsylvania
Available at: https://works.bepress.com/dennis_culhane/41/

Environment and Planning B: Planning and Design 1998, volume 25, pages 327-343
A perimeter-based clustering index for measuring spatial
segregation: a cognitive GIS approach
C-M Lee
Wharton Real Estate Center, University of Pennsylvania, 3600 Market Street, Philadelphia,
PA 19104-2648, USA; e-mail: leecm@wharton.upenn.edu
D P Culhane
School of Social Work, University of Pennsylvania, PA 19104-2648, USA;
e-mail: dennis@cmhpsr.upenn.edu
Received 25 November 1996; in revised form 2 May 1997
Abstract. Many efforts have been made to develop segregation indices that incorporate spatial
interaction based on the contiguity concept. Contiguity refers to how similar the concentration of
the subject of interest in one areal unit is to that in adjacent areal units. However, highly segregated
situations are typically considered to be isolated sections or enclaves rather than smoothly formed
peaks of concentration in space. Therefore, highly segregated enclaves may not exhibit contiguity. In
this paper, a new index to measure the degree of clustering is developed and it is compared with the
existing indices of concentration or segregation. The proposed clustering index ($I_C$) tends to give more
weight to 'enclaveness' than to contiguity alone. This may be a good property for those cases in
which the primary concern of an investigator is the formation of enclaves of a socioeconomic subject,
including minority populations, poverty, crime, epidemics, and mortgage red-lining. Additionally, its
robustness to the citywide rate allows a proper intercity comparison of a given subject by index score
even when the citywide rate varies significantly, unlike the other measures.
1 Introduction
Numerous efforts have been made to develop a proper index to measure the spatial
segregation of a population group.(1)
Though each index characterizes somewhat different
aspects of a spatial distribution, one can distinguish two types of indices: measures
ignoring spatial interaction between areal units; and measures incorporating spatial
interaction.
The problem with measures of segregation that lack spatial interaction components,
including the dissimilarity index, the Gini coefficient, and the entropy index (Theil,
1972),
is well illustrated in the case of the 'checkerboard problem', described by White
(1983).
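
The checkerboard problem can be reproduced in a few lines. The sketch computes the dissimilarity index D = 0.5 × Σ|b_i/B − w_i/W| for two hypothetical 4 × 4 cities of single-race tracts, one arranged as a checkerboard and one split into two solid halves. Because D only looks at each tract's composition and not at where tracts sit, both layouts receive the same score. The lattice size and tract populations are assumptions for illustration.

```python
def dissimilarity(black, white):
    """Index of dissimilarity D = 0.5 * sum_i |b_i / B - w_i / W| (ignores tract location)."""
    B, W = sum(black), sum(white)
    return 0.5 * sum(abs(b / B - w / W) for b, w in zip(black, white))


n = 4  # hypothetical n x n lattice; each tract holds 10 people of a single group
checkerboard = [(r + c) % 2 for r in range(n) for c in range(n)]
halves = [1 if c < n // 2 else 0 for r in range(n) for c in range(n)]

for name, pattern in (("checkerboard", checkerboard), ("halves", halves)):
    black = [10 * p for p in pattern]
    white = [10 * (1 - p) for p in pattern]
    print(name, dissimilarity(black, white))
# both lines print 1.0: D cannot tell the two spatial patterns apart
```
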
Several efforts have been made to develop segregation indices that incorporate
spatial interaction, including the index of spatial proximity (White, 1986) and the
distance-based index of dissimilarity (Morgan, 1982). In general, these measures include
spatial interaction by distance or binary adjacency between two areal units. Recently
Wong (1993) formulated a new segregation index, which uses the length of the common
boundary of two areas as an indicator of the degree of social interaction between the
residents of the two areas.
Spatial-interaction measures in geography and segregation measures incorporating
spatial interaction in sociology are similar in concept. However, spatial-interaction
measures in geography are based only on distribution in physical space, whereas the
segregation measures take account of the population distribution overlaid on physical
space as well as the distribution of physical space itself. The spatial-interaction
segregation indices in sociology are derived from Dacey (1968) and Geary (1954), in
whose work they are labeled 'contiguity' measures.

(1) See Massey and Denton (1988) for the existing measures.


Contiguity refers to how similar the concentration of the subject of interest in one
areal unit is to that in adjacent areal units. If the figures for adjoining areal units are
generally closer than those for the areal units not adjoining, this condition yields a
contiguous distribution of the subject of interest (Dacey, 1968; Geary, 1954). This contiguity
aspect of spatial distribution has been well developed into a field of spatial statistics
known as spatial autocorrelation. In the last twenty years, a number of instruments for
testing for and measuring spatial autocorrelation have appeared (Anselin, 1988). To
geographers, the best-known statistics are Moran's I and, to a lesser extent, Geary's c
(Cliff and Ord, 1973).
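
For readers less familiar with the two statistics, the sketch computes Moran's I and Geary's c with binary rook-contiguity weights (w_ij = 1 for tracts sharing an edge) on hypothetical 4 × 4 lattices. A checkerboard gives negative autocorrelation (I below 0, c above 1), while two solid halves give positive autocorrelation (I above 0, c below 1). The weighting scheme, layouts, and function names are assumptions chosen for illustration.

```python
def rook_pairs(rows, cols):
    """Ordered pairs (i, j) of edge-sharing cells; each pair gets weight w_ij = 1."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    pairs.append((r * cols + c, rr * cols + cc))
    return pairs


def morans_i(x, pairs):
    """Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    num = sum(z[i] * z[j] for i, j in pairs)
    den = sum(v * v for v in z)
    return (n * num) / (len(pairs) * den)


def gearys_c(x, pairs):
    """Geary's c = (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 W sum_i z_i^2)."""
    n = len(x)
    mean = sum(x) / n
    den = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - x[j]) ** 2 for i, j in pairs)
    return ((n - 1) * num) / (2 * len(pairs) * den)


n_side = 4
pairs = rook_pairs(n_side, n_side)
checkerboard = [(r + c) % 2 for r in range(n_side) for c in range(n_side)]
halves = [1 if c < n_side // 2 else 0 for r in range(n_side) for c in range(n_side)]
for name, x in (("checkerboard", checkerboard), ("halves", halves)):
    print(name, round(morans_i(x, pairs), 3), round(gearys_c(x, pairs), 3))
```
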
In some cases (Massey and Denton, 1988), the contiguity measures in geography
are interpreted as clustering indices, with some modifications. However, a high degree
of clustering does not always represent a high degree of contiguity. For example, one can
imagine a spatial distribution pattern in which one subject of interest forms isolated
enclaves with visible boundaries. Such a distribution would not be expected to yield a
high degree of contiguity, because the sharp differences at the enclave boundaries reduce
the overall degree of contiguity. In the real world, one is generally concerned about
isolated enclaves of a population group, which are recognized by both high concentration
and separateness, rather than about the spatial contiguity of their distribution alone. For
point data in natural space, some measures use nearest-neighbor methods to describe
the degree of clustering (Ripley, 1981). However, for areal data in urban space overlaid
with population, one needs a measure of clustering different from the existing
contiguity measures of segregation.
In this paper, a new index to measure the degree of clustering is developed and then
compared with the existing indices of segregation. In section 2, clustering is defined in
an operational way, and in sections 3 and 4 a method for calculating the new clustering
index and its properties are discussed. In section 5, four existing indices to be compared
with the new clustering index are discussed briefly, and the clustering index and the four
other indices are compared in two hypothetical settings: binary distribution in
a regular lattice and semicontinuous distribution in a regular lattice. In section 6, the
five indices are compared in a real-world application, the five boroughs of New York City.
2 Operational definition of clustering
When the spatial distribution of a subject of interest on a map is examined, viewers
tend to draw arbitrary boundaries of clusters and define a set of clusters cognitively,
whether it is a point distribution or an areal data distribution. This cognition could be
said to have three attributes: the total size of the clusters, their shape, and the closeness
between them. Here, a clustering index is derived based on these three attributes.
In order to draw the boundaries of clusters, an objective way to define clusters is
needed. In an urban setting, the probability of occurrence of a subject in an areal unit
depends on population rather than the size of the areal unit. For example, all other
factors being equal, the expected number of the poor in a census tract depends on the
number of people residing in the tract rather than the physical size of the tract. Once
the rate of an object group to population in each tract is determined, the next issue is
how to define concentration of the object group. One popular way of defining concentration
is the location quotient ($Q_l$).
The location quotient is a device frequently used to identify specialization, concentration,
or the potential of an area for selected employment, industry, or output
indicators (Bendavid-Val, 1983; Chen, 1994). It is the ratio of the fractional share
of the subject of interest at the local level to the corresponding share at the regional
level. When the $Q_l$ of a locality in a region is greater than 1, the locality has a higher
concentration of the subject of interest relative to the other localities of the region
combined. For example, a census tract or a block group may be equivalent to a locality,
and a city to a region. Thus, $Q_l$ may be used to identify census tracts that contain a
higher percentage share of a subject of interest than the city as a whole, that is, tracts
with a $Q_l$ value greater than 1.

Figure 1. Size, shape, and adjacency of subclusters: (a) small size, p = 8; (b) large size, p = 16; (c) regular shape, p = 10; (d) irregular shape, p = 14; (e) adjacent, p = 14; (f) separated, p = 16. [Figure: panels labeled along a scale from "More clustered" to "Less clustered".]
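
A minimal sketch of the location quotient as described above, using made-up tract counts: for tract i, $Q_l$ = (x_i / P_i) / (X / P), where X and P are the citywide totals of the object group and the population. The flag uses $Q_l \geq 1$, following the definition of $I_i$ given in section 3, and the comparison is done by cross-multiplication so that integer counts avoid rounding at the threshold. The function names are hypothetical.

```python
def location_quotients(pop, obj):
    """Q_l for each tract: the tract's rate of the object group over the citywide rate."""
    city_rate = sum(obj) / sum(pop)
    return [(x / p) / city_rate for p, x in zip(pop, obj)]


def concentration_flags(pop, obj):
    """1 for tracts with Q_l >= 1, else 0; x/p >= X/P rewritten as x*P >= p*X."""
    P, X = sum(pop), sum(obj)
    return [1 if x * P >= p * X else 0 for p, x in zip(pop, obj)]


# Hypothetical city with three tracts: populations and object-group counts
pop = [100, 50, 50]
obj = [20, 5, 0]
print(location_quotients(pop, obj))   # citywide rate is 25/200 = 0.125
print(concentration_flags(pop, obj))  # [1, 0, 0]
```
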
Adjacent areal units showing a high concentration of the subject ($Q_l$ greater than 1)
form a few clusters on the map.

Once clusters are obtained, one needs to quantify the size, shape, and closeness of
the clusters. A measure that combines these three factors is the total perimeter of the
clusters. When the shape and adjacency of the clusters are the same, the total perimeter (P)
of the clusters is a proper measure of their total size [see figures 1(a) and 1(b)]. When the
size and adjacency of the clusters are constant, circular shapes give the minimum
possible total perimeter [see figures 1(c) and 1(d)]. When the size and shape of the clusters
are the same, two adjoining clusters have a smaller total perimeter than two separated
clusters [see figures 1(e) and 1(f)]. Therefore, one can measure the degree of clustering
by assessing how small the total perimeter of the clusters (the concentrated areas of a
subject of interest) is, where the concentrated areas are selected by $Q_l$.
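
The three perimeter comparisons can be illustrated on a square lattice. In this sketch the cell sets are hypothetical and lie on an unbounded grid of unit squares (so no study-area boundary is excluded); a cluster's perimeter is the number of cell edges not shared with another cell of the same set. Larger size raises the total, an elongated shape raises it relative to a compact one of equal area, and separating two clusters raises it relative to letting them touch. The helper names `lattice_perimeter` and `block` are made up.

```python
def lattice_perimeter(cells):
    """Total perimeter of a set of unit grid cells (edges not shared within the set)."""
    cells = set(cells)
    perimeter = 0
    for r, c in cells:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (r + dr, c + dc) not in cells:
                perimeter += 1
    return perimeter


def block(r0, c0, height, width):
    """Cells of a height x width rectangle whose top-left cell is (r0, c0)."""
    return {(r, c) for r in range(r0, r0 + height) for c in range(c0, c0 + width)}


print("size:", lattice_perimeter(block(0, 0, 2, 2)), lattice_perimeter(block(0, 0, 4, 4)))
print("shape:", lattice_perimeter(block(0, 0, 2, 3)), lattice_perimeter(block(0, 0, 1, 6)))
adjacent = block(0, 0, 2, 2) | block(1, 2, 2, 2)   # the two squares share one unit edge
separated = block(0, 0, 2, 2) | block(0, 3, 2, 2)  # a one-cell gap between the squares
print("adjacency:", lattice_perimeter(adjacent), lattice_perimeter(separated))
# prints "size: 8 16", "shape: 10 14", "adjacency: 14 16"
```
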
3 Calculation of a clustering index
Based on the operational definition of clustering, one can develop a clustering index.
In order to illustrate the process for calculating the clustering index, a hypothetical city
space is assumed as in figure 2(a), where P is the population of each census
tract, and x is the number of an object group in the tract. As a first step, every census tract
is divided into two groups: highly concentrated census tracts and less concentrated
census tracts, based on $Q_l$ [see figure 2(b)]. When two highly concentrated tracts are
adjacent to each other, the common boundary lines are deleted and the two polygons
of the tracts are merged to form one polygon [see figure 2(c)]. This merging process
continues and finally a few polygons result, which represent highly concentrated areas or
clusters [see figure 2(d)].
The more adjacent the highly concentrated tracts are, the more
common boundaries are erased, and the smaller the ratio of the sum of the perimeters
of the merged polygons to the sum of the perimeters of the original tracts will be. In our
case, the boundaries of the study area are not included in the calculation.

Figure 2. The merging process used to calculate the clustering index: (a) population and object group; (b) calculating the location quotient; (c) identifying concentrated tracts; (d) merging concentrated tracts. [Figure: tract lattice labeled (P, x) for each tract. Total population: 490; total object group: 49; total perimeter of the original tracts: 33; total perimeter of merged polygons: 23 (both excluding the boundary of the study area); clustering index = 1 - 23/33 = 0.30.]
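
The merging step can be sketched for the special case of a square lattice (an assumption; the paper works with irregular tract polygons). Concentrated tracts are grouped into edge-connected clusters by breadth-first search, which mimics deleting the common boundaries between adjacent concentrated tracts, and each merged polygon's perimeter is counted without edges on the study-area boundary. The flags, grid size, and function names are hypothetical.

```python
from collections import deque


def merge_clusters(flags, rows, cols):
    """Group flagged cells (1 = concentrated) into edge-connected clusters."""
    seen, clusters = set(), []
    for start in range(rows * cols):
        if flags[start] != 1 or start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:
            i = queue.popleft()
            cluster.append(i)
            r, c = divmod(i, cols)
            for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                j = rr * cols + cc
                # bounds are checked first, so flags[j] is only read for cells on the grid
                if 0 <= rr < rows and 0 <= cc < cols and flags[j] == 1 and j not in seen:
                    seen.add(j)
                    queue.append(j)
        clusters.append(cluster)
    return clusters


def interior_perimeter(cluster, rows, cols):
    """Perimeter of a merged polygon, ignoring edges on the study-area boundary."""
    members, length = set(cluster), 0
    for i in cluster:
        r, c = divmod(i, cols)
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= rr < rows and 0 <= cc < cols and rr * cols + cc not in members:
                length += 1
    return length


rows, cols = 3, 3
flags = [1, 1, 0,
         0, 0, 0,
         0, 0, 1]
clusters = merge_clusters(flags, rows, cols)
merged = sum(interior_perimeter(cl, rows, cols) for cl in clusters)
original = rows * (cols - 1) + cols * (rows - 1)  # all interior tract boundaries
print(len(clusters), merged, original, 1 - merged / original)
```

The last value printed is the clustering index for this toy layout, obtained from the ratio of merged to original perimeters as in figure 2.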
In this concept, the clustering index can be denoted as follows:

$$I_C = 1 - \frac{\sum_i \sum_j \lvert I_i - I_j \rvert \, b_{ij}}{\sum_i \sum_j b_{ij}},$$

where $I_i$ is a binary value for tract $i$ (1 if $Q_l \geq 1$; 0 if $Q_l < 1$), and $b_{ij}$ is the length of
the common boundary between census tracts $i$ and $j$ (0 if tracts $i$ and $j$ are not
connected or $i = j$). If a pair of adjacent tracts have the same $I$ value (either 1 or 0),
$\lvert I_i - I_j \rvert$ becomes 0, and their common boundary $b_{ij}$ does not count in the numerator of
the equation. Only the boundary between a pair of adjacent tracts which have different
$I$ values (high concentration versus low concentration) remains.
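
The equation does not require a lattice: any tract layout works once the shared boundary lengths $b_{ij}$ are known, for example from a GIS overlay. The sketch stores each adjacent pair once in a dictionary (a representation assumed here). Because the double sum over i and j counts every pair twice in both the numerator and the denominator, the factor of two cancels. The four tracts and their boundary lengths are made up, and `clustering_index` is a hypothetical name. For reference, the figure 2 totals give 1 - 23/33 ≈ 0.30 by the same ratio.

```python
def clustering_index(flags, boundaries):
    """I_C = 1 - sum |I_i - I_j| b_ij / sum b_ij over pairs of adjacent tracts.

    flags maps tract id -> I_i (1 if Q_l >= 1, else 0);
    boundaries maps an unordered pair (i, j) -> shared boundary length b_ij.
    """
    total = sum(boundaries.values())
    # |I_i - I_j| is 1 only when exactly one tract of the pair is concentrated
    cut = sum(b * abs(flags[i] - flags[j]) for (i, j), b in boundaries.items())
    return 1.0 - cut / total


# Hypothetical irregular layout: four tracts with shared boundary lengths (any unit)
flags = {"A": 1, "B": 1, "C": 0, "D": 0}
boundaries = {("A", "B"): 2.0, ("A", "C"): 1.5, ("B", "D"): 1.0, ("C", "D"): 2.5}
print(clustering_index(flags, boundaries))
```
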
One advantage of this measure for an irregular polygon layout is that the degree of
proximity between polygons is automatically taken into account during the merging process.


##### References
Book
31 Aug 1988
TL;DR: In this article, a typology of Spatial Econometric Models is presented, and the maximum likelihood approach to estimate and test Spatial Process Models is proposed, as well as alternative approaches to Inference in Spatial process models.
Abstract: 1: Introduction.- 2: The Scope of Spatial Econometrics.- 3: The Formal Expression of Spatial Effects.- 4: A Typology of Spatial Econometric Models.- 5: Spatial Stochastic Processes: Terminology and General Properties.- 6: The Maximum Likelihood Approach to Spatial Process Models.- 7: Alternative Approaches to Inference in Spatial Process Models.- 8: Spatial Dependence in Regression Error Terms.- 9: Spatial Heterogeneity.- 10: Models in Space and Time.- 11: Problem Areas in Estimation and Testing for Spatial Process Models.- 12: Operational Issues and Empirical Applications.- 13: Model Validation and Specification Tests in Spatial Econometric Models.- 14: Model Selection in Spatial Econometric Models.- 15: Conclusions.- References.


Book
01 Jan 1981


Journal Article
TL;DR: In this article, residential segregation is viewed as a multidimensional phenomenon varying along five distinct axes of measurement: evenness exposure concentration centralization and clustering, and 20 indices of segregation are surveyed and related conceptually to 1 of the five dimensions.
Abstract: This paper conceives of residential segregation as a multidimensional phenomenon varying along 5 distinct axes of measurement: evenness exposure concentration centralization and clustering. 20 indices of segregation are surveyed and related conceptually to 1 of the 5 dimensions. Using data from a large set of US metropolitan areas the indices are intercorrelated and factor analyzed. Orthogonal and oblique rotations produce pattern matrices consistent with the postulated dimensional structure. Based on the factor analyses and other information 1 index was chosen to represent each of the 5 dimensions and these selections were confirmed with a principal components analysis. The paper recommends adopting these indices as standard indicators in future studies of segregation. (authors)


Journal Article
01 Nov 1954
TL;DR: In this article, the authors considered the problem of determining whether statistics given for each "county" in a "country" are distributed at random or whether they form a pattern.
Abstract: The problem discussed in this paper is to determine whether statistics given for each "county" in a "country" are distributed at random or whether they form a pattern. The statistical instrument is the contiguity ratio c defined by formula (1.1) below, which is an obvious generalization of the Von Neumann (1941) ratio used in one-dimensional analysis, particularly time series. While the applications in the paper are confined to oneand two-dimensional problems, it is evident that the theory applies to any number of dimensions. If the figures for adjoining counties are generally closer than those for counties not adjoining, the ratio will clearly tend to be less than unity. The constants are such that when the statistics are distributed at random in the counties, the average value of the ratio is unity. The statistics will be regarded as contiguous if the actual ratio found is significantly less than unity, by reference to the standard error. The theory is discussed from the viewpoints of both randomization and classical normal theory. With the randomization approach, the observations themselves are the "universe" and no assumption need be made as to the character of the frequency distribution. In the "normal case," the assumption is that the observations may be regarded as a random sample from a normal universe. In this case it seems certain that the ratio tends very rapidly to normality as the number of counties increases. The exact values of the first four semi-invariants are given for the normal case. These functions depend only on the configuration, and the calculated values for Ireland, with number of counties only 26, show that the distribution of the ratio is very close to normal. Accordingly, one can have confidence in deciding on significance from the standard error.


Journal Article
