A Perimeter-Based Clustering Index for Measuring Spatial Segregation: A Cognitive GIS Approach

doi:10.1068/B250327

From the SelectedWorks of Dennis P. Culhane

1998


Dennis P Culhane, University of Pennsylvania

Chang Moo Lee, University of Pennsylvania

Available at: https://works.bepress.com/dennis_culhane/41/

Environment and Planning B: Planning and Design 1998, volume 25, pages 327-343


C-M Lee

Wharton Real Estate Center, University of Pennsylvania, 3600 Market Street, Philadelphia,

PA 19104-2648, USA; e-mail: leecm@wharton.upenn.edu

D P Culhane

School of Social Work, University of Pennsylvania, PA 19104-2648, USA;

e-mail: dennis@cmhpsr.upenn.edu

Received 25 November 1996; in revised form 2 May 1997

Abstract. Many efforts have been made to develop segregation indices that incorporate spatial

interaction based on the contiguity concept. Contiguity refers to how similar the concentration of

the subject of interest in one areal unit is to that in adjacent areal units. However, highly segregated

situations are typically considered to be isolated sections or enclaves rather than smoothly formed

peaks of concentration in space. Therefore, highly segregated enclaves may not exhibit contiguity. In

this paper, a new index to measure the degree of clustering is developed and it is compared with the

existing indices of concentration or segregation. The proposed clustering index ($I_C$) tends to give more

weight to 'enclaveness' rather than contiguity alone. This may be a good property for those cases in

which the primary concern of an investigator is the formation of enclaves of a socioeconomic subject,

including minority populations, poverty, crime, epidemics, and mortgage red-lining. Additionally, its
robustness to the citywide rate permits a proper intercity comparison of a
given subject by index score even when the citywide rate varies significantly, unlike the other measures.

1 Introduction

Numerous efforts have been made to develop a proper index to measure the spatial

segregation of a population group.(1) Though each index characterizes somewhat different

aspects of a spatial distribution, one can distinguish two types of indices: measures

ignoring spatial interaction between areal units; and measures incorporating spatial

interaction.

The problem with measures of segregation that lack spatial interaction components,

including the dissimilarity index, the Gini coefficient, and the entropy index (Theil,

1972),

is well illustrated in the case of the 'checkerboard problem', described by White

(1983).
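The checkerboard problem can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical 4 x 4 lattice with 100 residents per cell and the standard two-group form of the dissimilarity index; the function names and the data are illustrative and do not come from the paper.

```python
def dissimilarity(minority, majority):
    """Two-group index of dissimilarity: D = 0.5 * sum(|m_i/M - n_i/N|)."""
    total_min = sum(minority)
    total_maj = sum(majority)
    return 0.5 * sum(
        abs(m / total_min - n / total_maj)
        for m, n in zip(minority, majority)
    )


def counts(pattern, residents=100):
    """Turn a 0/1 pattern into (minority, majority) counts per cell (hypothetical data)."""
    minority = [residents * cell for cell in pattern]
    majority = [residents * (1 - cell) for cell in pattern]
    return minority, majority


# Cells listed row by row; 1 = all-minority cell, 0 = all-majority cell.
checkerboard = [1, 0, 1, 0,
                0, 1, 0, 1,
                1, 0, 1, 0,
                0, 1, 0, 1]
enclave = [1, 1, 1, 1,
           1, 1, 1, 1,
           0, 0, 0, 0,
           0, 0, 0, 0]

d_checkerboard = dissimilarity(*counts(checkerboard))
d_enclave = dissimilarity(*counts(enclave))
print(d_checkerboard, d_enclave)
```

Both patterns receive the same score because the index uses only the composition of each cell, not its position; rearranging the cells on the lattice cannot change the result.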

Several efforts have been made to develop segregation indices that incorporate

spatial interaction, including the index of spatial proximity (White, 1986) and the

distance-based index of dissimilarity (Morgan, 1982). In general, these measures include

spatial interaction by distance or binary adjacency between two areal units. Recently

Wong (1993) formulated a new segregation index, which uses the length of the common

boundary of two areas as an indicator of the degree of social interaction between the

residents of the two areas.

Spatial-interaction measures in geography, and segregation measures incorporating

spatial interaction in sociology are similar in concept. However, spatial-interaction

measures in geography are based only on distribution in physical space, whereas the
segregation measures take account of the population distribution overlaid on physical
space as well as the distribution of physical space itself.

The spatial-interaction segrega-

tion indices in sociology are derived from Dacey (1968) and Geary (1954), where they

have been labeled 'contiguity' measures.

(1) See Massey and Denton (1988) for the existing measures.


Contiguity refers to how similar the concentration of the subject of interest in one

areal unit is to that in adjacent areal units. If the figures for adjoining areal units are

generally closer than those for the areal units not adjoining, this condition yields a con-

tiguous distribution of the subject of interest (Dacey, 1968; Geary, 1954). This contiguity

aspect of spatial distribution has been well developed into a field of spatial statistics

known as spatial autocorrelation. In the last twenty years, a number of instruments for

testing for and measuring spatial autocorrelation have appeared (Anselin, 1988). To

geographers, the best-known statistics are Moran's I, and, to a lesser extent, Geary's c

(Cliff and Ord, 1973).
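As an illustration of a contiguity measure, the sketch below computes Moran's I in its common textbook form on a regular lattice. It assumes binary rook weights (cells sharing an edge are neighbors); this weighting, the function name, and the two test patterns are assumptions made for the example, not the specification used in the paper's comparisons.

```python
def morans_i(grid):
    """Moran's I on a rectangular lattice with binary rook (edge-sharing) weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n

    cross = 0.0      # sum over neighbor pairs of (x_i - mean) * (x_j - mean)
    weight_sum = 0   # sum of all weights; each adjacent pair is counted in both directions
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1

    squared_dev = sum((v - mean) ** 2 for v in values)
    return (n / weight_sum) * (cross / squared_dev)


# Hypothetical 4 x 4 patterns: alternating cells versus two homogeneous halves.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
halves = [[1] * 4, [1] * 4, [0] * 4, [0] * 4]

i_checkerboard = morans_i(checkerboard)
i_halves = morans_i(halves)
print(i_checkerboard, i_halves)
```

The checkerboard, in which every neighbor differs, gives a strongly negative value, whereas the two-halves pattern gives a positive one.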

In some cases (Massey and Denton, 1988), the contiguity measures in geography

are interpreted as clustering indices, with some modifications. However, a high degree

of clustering does not always represent a high degree of contiguity. For example, one can

imagine a spatial distribution pattern in which one subject of interest forms isolated

enclaves which have visible boundaries. Such a distribution would not be expected to yield a
high degree of contiguity, as the difference at the boundaries of the enclaves reduces

the overall degree of contiguity. In the real world, one is generally concerned about

isolated enclaves of a population group which are recognized by both high concentra-

tion and separateness, rather than the spatial contiguity of their distribution alone. For

point data in the natural space, there are some measures which use nearest-neighbor

methods to describe the degree of clustering (Ripley, 1981). However, for areal data in

urban space, overlaid with population, one needs to have a different measure of

clustering rather than the existing contiguity measure of segregation.

In this paper, a new index to measure the degree of clustering is developed and then

compared with the existing indices of segregation. In section 2, clustering is defined in

an operational way, and in sections 3 and 4 a method for calculating the new clustering

index and its properties are discussed. In section 5, four existing indices to be compared

with the new clustering index are discussed briefly, and the clustering index and the four

other indices are compared in two hypothetical settings: a binary distribution in
a regular lattice, and a semicontiguous distribution in a regular lattice. In section 6, the

five indices are compared in a real-world application, the five boroughs of New York City.

2 Operational definition of clustering

When the spatial distribution of a subject of interest on a map is examined, viewers

tend to draw arbitrary boundaries of clusters and define a set of clusters cognitively,

whether it is a point distribution or an areal data distribution. This cognition could be

said to have three attributes: the total size of the clusters, their shape, and the closeness

between them. Here, a clustering index is derived based on these three attributes.

In order to draw the boundaries of clusters, an objective way to define clusters is

needed. In an urban setting, the probability of occurrence of a subject in an areal unit

depends on population rather than the size of the areal unit. For example, all other

factors being equal, the expected number of the poor in a census tract depends on the

number of people residing in the tract rather than the physical size of the tract. Once

the rate of an object group to population in each tract is determined, the next issue is

how to define concentration of the object group. One popular way of defining concentration
is the location quotient ($Q_l$).

The location quotient is a device frequently used to identify specialization, concen-

tration, or the potential of an area for selected employment, industry, or output

indicators (Bendavid-Val, 1983; Chen, 1994). It refers to the ratio of the fractional share
of the subject of interest at the local level to the corresponding share at the regional
level. When a local $Q_l$ in a region is greater than 1, the locality has a higher concentration
of the subject of interest relative to the other localities of the region combined. For example,
a census tract or a block group may be equivalent to a locality, and a city to a region. Thus, $Q_l$
may be used to identify census tracts that contain a higher percentage share of a
subject of interest than the city as a whole, that is, tracts with a $Q_l$ value greater than 1.
Here, adjacent areal units showing a high concentration of the subject form a few
clusters on the map.

Figure 1. Size, shape, and adjacency of subclusters: (a) small size, p = 8; (b) large size, p = 16;
(c) regular shape, p = 10; (d) irregular shape, p = 14; (e) adjacent, p = 14; (f) separated,
p = 16. The panels are labeled 'More clustered' and 'Less clustered'.
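The location quotient described above can be sketched as follows. The tract populations and counts are hypothetical, the function names are illustrative, and treating a quotient of exactly 1 as concentrated is an assumption of this example.

```python
def location_quotients(population, subject):
    """Location quotient per tract: (x_i / P_i) divided by the citywide rate (X / P)."""
    citywide_rate = sum(subject) / sum(population)
    return [(x / p) / citywide_rate for p, x in zip(population, subject)]


def concentration_flags(quotients, threshold=1.0):
    """1 for tracts at or above the threshold, 0 otherwise (threshold handling is an assumption)."""
    return [1 if q >= threshold else 0 for q in quotients]


# Hypothetical city of four tracts: population P and object-group count x per tract.
population = [100, 200, 50, 150]
subject = [20, 10, 10, 10]

quotients = location_quotients(population, subject)
flags = concentration_flags(quotients)
print(quotients)
print(flags)
```

In this example the citywide rate is 50/500 = 0.1, so the first and third tracts, each with a local rate of 0.2, are flagged as concentrated, and the other two are not.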

Once clusters are obtained, one needs to quantify the size, shape, and closeness of

the clusters. A measure that combines these three factors is the total perimeter of the

clusters. When shape and adjacency of the clusters are the same, the total perimeter (P)

of the clusters is a proper measure of the total size [see figures 1(a) and

1(b)].

When the size and adjacency of the clusters are constant, circular shapes have the minimum
possible perimeter [see figures 1(c) and 1(d)].

When the size and shape of the clusters are

the same, two adjoining clusters have a smaller total perimeter than two separated

clusters [see figures 1(e) and

1(f)].

Therefore, one can measure the degree of clustering

by assessing how small the total perimeter of the clusters (the concentrated areas of a

subject of interest) is, where the concentrated areas are selected by $Q_l$.
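The effect of size, shape, and adjacency on total perimeter can be checked on a lattice of unit cells. In the sketch below, the cell layouts are hypothetical shapes chosen so that their perimeters reproduce the p values quoted in the caption of figure 1; the actual shapes drawn in the figure may differ, and the helper names are illustrative.

```python
def total_perimeter(cells):
    """Total boundary length of a set of unit lattice cells given as (row, col) pairs."""
    cells = set(cells)
    perimeter = 0
    for r, c in cells:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            # An edge counts toward the perimeter only if no selected cell lies across it.
            if (r + dr, c + dc) not in cells:
                perimeter += 1
    return perimeter


def block(row, col, height, width):
    """Rectangular block of cells with its top-left corner at (row, col)."""
    return {(r, c) for r in range(row, row + height) for c in range(col, col + width)}


# Hypothetical layouts, one per panel of figure 1.
shapes = {
    "small": block(0, 0, 2, 2),
    "large": block(0, 0, 4, 4),
    "regular": block(0, 0, 2, 3),
    "irregular": block(0, 0, 1, 6),                    # same area as "regular", elongated
    "adjacent": block(0, 0, 2, 2) | block(1, 2, 2, 2),  # two blocks sharing one unit edge
    "separated": block(0, 0, 2, 2) | block(0, 3, 2, 2),  # two blocks with a gap between them
}

perimeters = {name: total_perimeter(cells) for name, cells in shapes.items()}
print(perimeters)
```

Within each pair, the layout with the smaller total perimeter is the one that reads as more clustered.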

3 Calculation of a clustering index

Based on the operational definition of clustering, one can develop a clustering index.

In order to illustrate the process for calculating the clustering index, a hypothetical city

space is assumed as in figure 2(a), where P is the population of each census
tract, and x is the number of an object group in the tract. As a first step, every census tract
is divided into two groups: highly concentrated census tracts, and less concentrated
census tracts based on $Q_l$ [see figure 2(b)]. When two highly concentrated tracts are
adjacent to each other, the common boundary lines are deleted and the two polygons
of the tracts are merged to form one polygon [see figure 2(c)]. This merging process
continues and finally a few polygons result, which represent highly concentrated areas or
clusters [see figure 2(d)]. The more adjacent the highly concentrated tracts are, the more
common boundaries are erased, and the smaller the ratio of the sum of the perimeters
of the merged polygons to the sum of the perimeters of the original tracts will be. In our
case, the boundaries of the study area are not included in the calculation.

Figure 2. The merging process used to calculate the clustering index: (a) population and object
group; (b) calculating the location quotient; (c) identifying concentrated tracts; (d) merging
concentrated tracts. The numbers in parentheses are (P, x). Total population: 490; total object
group: 49. Total perimeter: 33 (61 - 28, excluding the boundary of the study area). Total perimeter
of merged polygons: 23 (excluding the boundary of the study area). Clustering index = 1 - 23/33 = 0.30.
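The merging step can be sketched as a search for groups of mutually adjacent concentrated tracts. The version below works on an adjacency list rather than on polygon geometry, which is a simplification of the GIS operation described above; the tract labels, flags, and adjacency are hypothetical.

```python
from collections import deque


def merge_concentrated(flags, neighbors):
    """Group concentrated tracts (flag 1) into clusters of mutually adjacent tracts."""
    seen = set()
    clusters = []
    for start in flags:
        if flags[start] != 1 or start in seen:
            continue
        # Breadth-first search over adjacent concentrated tracts.
        cluster = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            tract = queue.popleft()
            cluster.add(tract)
            for other in neighbors.get(tract, ()):
                if flags[other] == 1 and other not in seen:
                    seen.add(other)
                    queue.append(other)
        clusters.append(cluster)
    return clusters


# Hypothetical tracts: A and B are adjacent and concentrated; D is concentrated but isolated.
flags = {"A": 1, "B": 1, "C": 0, "D": 1, "E": 0}
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

clusters = merge_concentrated(flags, neighbors)
print(clusters)
```

Here tracts A and B merge into one cluster, while D remains a separate cluster because its only neighbors are less concentrated tracts.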

In this concept, the clustering index can be denoted as follows:

$$ I_C = 1 - \frac{\sum_i \sum_j |l_i - l_j| \, b_{ij}}{\sum_i \sum_j b_{ij}} $$

where $l_i$ is a binary value for tract $i$ (1 if $Q_l \ge 1$; 0 if $Q_l < 1$); and $b_{ij}$ is the length of
the common boundary between census tracts $i$ and $j$ (0 if tracts $i$ and $j$ are not
connected or $i = j$). If a pair of adjacent tracts have the same $l$ value (either 1 or 0),
$|l_i - l_j|$ becomes 0, and their common boundary $b_{ij}$ does not count in the numerator in
the equation. Only the boundary between a pair of adjacent tracts which have different
$l$ values (high concentration versus low concentration) remains.
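A minimal sketch of this calculation is given below, assuming a hypothetical 3 x 3 lattice of unit tracts in which every shared boundary has length 1. Each pair of adjacent tracts is stored once rather than twice; this leaves the ratio unchanged, because a double sum over i and j would count every boundary twice in both the numerator and the denominator. The function names and the three patterns are illustrative.

```python
def clustering_index(flags, boundary):
    """1 minus the share of internal boundary length separating tracts with different flags.

    flags: dict tract -> 0 or 1
    boundary: dict (tract_i, tract_j) -> shared boundary length, one entry per adjacent pair
    """
    total = sum(boundary.values())
    differing = sum(
        length for (i, j), length in boundary.items() if flags[i] != flags[j]
    )
    return 1 - differing / total


def lattice_boundaries(rows, cols):
    """Unit-length shared boundaries of a rows x cols lattice (study-area edge excluded)."""
    boundary = {}
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                boundary[((r, c), (r + 1, c))] = 1.0
            if c + 1 < cols:
                boundary[((r, c), (r, c + 1))] = 1.0
    return boundary


def flags_from(concentrated, rows, cols):
    """Flag the listed cells as concentrated (1) and all others as 0."""
    return {(r, c): int((r, c) in concentrated) for r in range(rows) for c in range(cols)}


boundary = lattice_boundaries(3, 3)  # 12 internal boundaries
enclave = flags_from({(0, 0), (0, 1), (1, 0), (1, 1)}, 3, 3)
corners = flags_from({(0, 0), (0, 2), (2, 0), (2, 2)}, 3, 3)
checker = flags_from({(r, c) for r in range(3) for c in range(3) if (r + c) % 2 == 0}, 3, 3)

index_enclave = clustering_index(enclave, boundary)
index_corners = clustering_index(corners, boundary)
index_checker = clustering_index(checker, boundary)
print(index_enclave, index_corners, index_checker)
```

The compact enclave scores highest, the four scattered corner tracts score lower, and the checkerboard, in which every internal boundary separates unlike tracts, scores 0. The same arithmetic applied to the totals reported in figure 2 gives 1 - 23/33, or about 0.30.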

One advantage of this measure for an irregular polygon layout is that the degree of

proximity between polygons is automatically taken into account during the merging


References

Anselin (1988) Spatial Econometrics: Methods and Models

Geary (1954) The Contiguity Ratio and Statistical Mapping

Massey and Denton (1988) The Dimensions of Residential Segregation

Ripley (1981) Spatial Statistics

Theil (1972) Statistical Decomposition Analysis
