The Quality of Geospatial Context
Michael F. Goodchild
Center for Spatial Studies, and Department of Geography, University of California, Santa
Barbara, CA 93106-4060, USA
good@geog.ucsb.edu
Abstract. The location of an event or feature on the Earth's surface can be used
to discover information about the location's surroundings, and to gain insights
into the conditions and processes that may affect or even cause the presence of
the event or feature. Such reasoning lies at the heart of critical spatial thinking,
and is increasingly implemented in tools such as geographic information
systems and online Web mashups. But the quality of contextual information
relies on accurate positions and descriptions. Over the past two decades
substantial progress has been made on the theory and methods of geospatial
uncertainty, but hard problems remain in several areas, including uncertainty
visualization and propagation. Web 2.0 mechanisms are fostering the rapid
growth of user-generated geospatial content, but raising issues of associated
quality.
Keywords: geospatial data, context, uncertainty, error, Web 2.0
1 Introduction
Over the past several decades there has been rapid and accelerating progress in the
availability, acquisition, and use of geospatial data, that is, data that associate places
on or near the Earth’s surface x with the attributes observed at those places z(x) and, in
some cases, the time of observation t. Progress can be seen in the development of GPS
(the Global Positioning System), which for the first time allowed rapid, accurate, and
direct determination of location; remote sensing, providing massive quantities of
image data at spatial resolutions as fine as 50 cm; geographic information systems
(GIS) and spatial databases to represent, analyze, and reason from geospatial data;
and a host of Web applications for synthesizing, disseminating, and sharing data.
The purpose of this paper is to examine issues of quality when context is
constructed from geospatial data. The next section provides some background,
including a brief review of research on geospatial data quality and a summary of its
major findings. Section 3 examines the broader concept of context, drawing from
work in spatial analysis, the social sciences, and GIS. Section 4 discusses the key
issues of data integration, with particular emphasis on spatial joins and mashups.
Section 5 examines the growing contributions of user-generated content, and the
quality issues that are emerging in this context. The paper ends with a brief
concluding section.

2 Background
Early developments in GIS, and the automation of map-making processes, allowed
information from maps to be converted to precise digital records. But paper maps are
analog representations, and map-making is as much an art as a science; it follows
that data derived from maps do not necessarily stand up to the rigor and precision of
digital manipulation, especially for scientific purposes. As early as the mid-1980s it
had become apparent that the quality of geospatial data and the impact of quality on
applications were significant and largely unexplored issues. A workshop in 1988
brought together the small community of researchers working on the problem, and led
to a first book [1]. Two international biennial conference series were established in
the 1990s (the 6th International Symposium on Spatial Data Quality meets at
Memorial University, Canada, July 6–8, 2009; and the 9th International Symposium on
Spatial Accuracy Assessment in Natural Resources and Environmental Sciences
meets at the University of Leicester, UK, July 20–23, 2010).
It quickly became apparent that the problem was much more than one of
measurement error. The attributes associated with locations by ecologists,
pedologists, foresters, urban planners, and many other scholarly and practitioner
communities are frequently vague, with definitions that fail to meet scientific
standards of replicability (asked to make independent maps of selected properties of
an area, two professionals will not in general produce identical maps). Statistical
approaches to error analysis were supplemented by research into fuzzy and rough sets,
the theory of evidence, and subjective probability.
Today the field of geospatial uncertainty can be seen as addressing four related
problems:
- sources of uncertainty, and approaches to uncertainty management and
  minimization;
- modeling of uncertainty for various types of geospatial data, using statistical and
  other frameworks;
- visualization and communication of uncertainty; and
- propagation of uncertainty during processes of analysis and reasoning.
Notable surveys of the field include those by Devillers and Jeansoulin [2], Foody
and Atkinson [3], Goodchild and Jeansoulin [4], Guptill and Morrison [5], Heuvelink
[6], Lowell and Jaton [7], Mowrer and Congalton [8], Shi, Fisher, and Goodchild [9],
Stein, Shi, and Bijker [10], and Zhang and Goodchild [11].
Several key findings from this work can be identified. First, uncertainty should be
defined as the degree to which a spatial database leaves a given user uncertain about
the actual nature of the real world. This uncertainty may result from inaccurate
measurement, vagueness of definition, generalization or loss of detail in digital
representation, lack of adequate documentation, and many other sources. Second,
uncertainty is endemic in all geospatial data. Third, the importance of uncertainty is
application-specific, and may be insignificant for some applications; but it will always
be possible to find at least one application for which the uncertainty of a given item of
information is significant.

Measurement of geospatial position is never perfect, and may introduce uncertainty
into the topological properties that can be derived from positions. For example, a
point lying near the boundary of an area may appear to be outside the area if either its
location, or the location of the boundary, or both are sufficiently uncertain. Similarly
it may be impossible to determine accurately whether a house is on one side of a
street or the other, because of uncertainties in the positions of both. Thus an important
principle of GIS practice is that it may be necessary to allow topology to trump
geometry, in other words to allow coded topological properties to override geometric
appearances.
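The boundary example can be made concrete with a minimal sketch. It is not taken from the paper and rests on strong simplifying assumptions: the boundary is treated as locally straight, and the positional errors of the point and of the boundary, measured perpendicular to the boundary, are assumed to be independent and Gaussian. The function name and its parameters are hypothetical.

```python
import math

def prob_appears_outside(d, sigma_point, sigma_boundary=0.0):
    """Probability that a point truly d units inside a locally straight
    boundary is recorded as lying outside it.

    Assumes independent Gaussian errors (perpendicular to the boundary)
    with standard deviations sigma_point and sigma_boundary.
    """
    # Independent errors combine in quadrature.
    sigma = math.hypot(sigma_point, sigma_boundary)
    if sigma == 0.0:
        # No uncertainty: geometry alone decides.
        if d > 0:
            return 0.0
        return 1.0 if d < 0 else 0.5
    # P(error exceeds d) = Phi(-d / sigma), written with the complementary
    # error function from the standard library.
    return 0.5 * math.erfc(d / (sigma * math.sqrt(2.0)))

if __name__ == "__main__":
    for d in (1.0, 5.0, 10.0, 20.0):
        p = prob_appears_outside(d, sigma_point=5.0, sigma_boundary=5.0)
        print(f"true distance inside = {d:5.1f} m, P(appears outside) = {p:.3f}")
```

Under these assumptions a point lying exactly on the boundary has an even chance of appearing on either side, which is one way to see why a coded topological relationship may be more trustworthy than the geometry.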
While the problem of uncertainty in geospatial data is in many ways analogous to
problems of uncertainty in other data types, one key property leads to numerous
fundamental problems. This is the property known as spatial dependence. Many types
of errors in geospatial data tend to be positively autocorrelated; that is, errors of
position x or attribute z(x) tend to be similar over short distances. For example,
suppose elevation has been measured at a series of points, with a standard error of 5 m,
and these elevations have been compiled into a digital elevation model (DEM) with a
horizontal spacing between data points of 30 m. A common application for such data
is the estimation of slope. If the errors were statistically independent, such estimates
would be highly suspect, since elevations with standard errors of 5 m would be differenced
over a distance of only 30 m. In
reality, however, methods of DEM compilation tend to create errors that are highly
correlated over short distances. Thus it is still possible to obtain accurate estimates of
slope despite substantial elevation errors.
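The effect of correlation on the slope estimate can be illustrated with a short calculation. This is a sketch under assumptions not stated in the paper: slope is estimated as the elevation difference between two adjacent DEM points divided by their spacing, both points share the same standard error sigma, and their errors have correlation rho. The variance of the difference of the two errors is then 2 * sigma^2 * (1 - rho). The function name is hypothetical.

```python
import math

def slope_error_sd(sigma, spacing, rho):
    """Standard deviation of the error in a two-point slope estimate
    (rise over run), given elevation standard error sigma, horizontal
    spacing, and correlation rho between the two elevation errors."""
    # Var(e1 - e2) = sigma^2 + sigma^2 - 2 * rho * sigma^2
    var_diff = 2.0 * sigma ** 2 * (1.0 - rho)
    return math.sqrt(var_diff) / spacing

if __name__ == "__main__":
    # Values from the example in the text: 5 m standard error, 30 m spacing.
    for rho in (0.0, 0.9, 0.99):
        sd = slope_error_sd(5.0, 30.0, rho)
        print(f"rho = {rho:4.2f}: slope error sd = {sd:.4f} (rise/run)")
```

With independent errors (rho = 0) the slope error has a standard deviation of about 0.24, which would swamp most real slopes; with rho = 0.99 it falls by a factor of ten to about 0.024, even though the elevation error itself is unchanged.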
A similar argument can be made for many applications of geospatial data.
Databases of streets are useful for navigation purposes even though absolute positions
may be in error by tens of meters, since relative positions are much more accurate.
The area of land parcels can be estimated to fractions of a square meter even though their
absolute positions may be in error by meters. Spatial dependence is the basis for the
fields of geostatistics [12] and spatial statistics [13], both of which address the
analysis and mining of spatially autocorrelated data. Informally the principle is known
as Tobler’s First Law of Geography [14]: “nearby things are more similar than distant
things”.
Several implications of the widespread presence of spatial dependence are worthy
of mention. First, data that share lineage are likely to have spatially dependent error
structures, and consequently relative errors of positions and attributes will almost
always be less than absolute errors. In statistical terms relative error is a joint
property of pairs of locations, whereas absolute error is a marginal property of
locations taken one at a time. Second, when data from independent sources are
brought together, with no sharing of lineage, relative errors will be as large as
absolute errors. We return to this point later in the discussion of spatial joins and
mashups.
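The contrast between shared and independent lineage can be sketched with a small simulation. The error model here is an assumption made for illustration, not one proposed in the paper: each source displaces all of its features by a common shift (standard deviation shared_sd), and each feature also carries a small independent error (standard deviation local_sd). Positions are one-dimensional for simplicity, and all names are hypothetical.

```python
import math
import random

def relative_error_rms(n, shared_sd, local_sd, same_source, seed=0):
    """Root-mean-square relative positional error between two features,
    estimated from n simulated trials.

    same_source=True: both features share one source-wide shift.
    same_source=False: each feature has its own, independent shift.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    total = 0.0
    for _ in range(n):
        if same_source:
            shift_a = shift_b = rng.gauss(0.0, shared_sd)
        else:
            shift_a = rng.gauss(0.0, shared_sd)
            shift_b = rng.gauss(0.0, shared_sd)
        err_a = shift_a + rng.gauss(0.0, local_sd)
        err_b = shift_b + rng.gauss(0.0, local_sd)
        # The shared shift cancels in the difference when lineage is shared.
        total += (err_a - err_b) ** 2
    return math.sqrt(total / n)

if __name__ == "__main__":
    shared = relative_error_rms(20000, shared_sd=10.0, local_sd=1.0, same_source=True)
    independent = relative_error_rms(20000, shared_sd=10.0, local_sd=1.0, same_source=False)
    print(f"shared lineage:      relative error RMS = {shared:.2f} m")
    print(f"independent sources: relative error RMS = {independent:.2f} m")
```

In this model the shared shift cancels when both features come from one source, so relative error depends only on the small local component; when the features come from independent sources nothing cancels, and relative error is of the same order as absolute error.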
The third implication concerns visualization. A map is a very effective mechanism
for displaying the properties z associated with locations x, particularly when those
properties are static. Measures of quality associated with locations, such as the
marginal standard error of elevation discussed in a previous example, can also be
displayed in this way. But the key issue of spatial dependence is problematic, since a
map offers no way of showing the joint properties of locations, and thus no way of
communicating to the user the important difference between correlated and
uncorrelated errors. One solution, explored at length by Ehlschlaeger [15] and others,
is to animate the map. For example, correlated errors of elevation will appear as a
simultaneous rising and falling of neighboring points, like a waving blanket.
Finally, spatial dependence has implications for the data models used to represent
geospatial data. Goodchild [16] has shown that the traditional model used to represent
area-class maps (maps that partition an area into irregular patches of uniform class)
cannot be adapted by adding appropriate attributes representing uncertainty to its
various tables of nodes, edges, and faces; instead, an entirely new raster-based model
must be adopted. Similarly, Goodchild [17] has argued that the traditional coordinate-
based structure of GIS must be replaced by a radically different measurement-based
structure to capture uncertainty in the measurement of positions.
3 Defining context
Context can be defined as information about the surroundings of events, features, and
transactions, and in the geospatial context of this paper surroundings can be taken to
mean a geographic area. A host of terms have similar meaning, and in some cases
those meanings have been formalized. Some of those terms and formalizations will be
reviewed in this section.
Place has the sense of an area of the Earth’s surface that possesses some form of
identity, and perhaps homogeneity with respect to certain characteristics. Some places
are officially recognized and formalized, such as the populated places recognized by
the Bureau of the Census or the named places recognized by the Board on Geographic
Names. Formalization often means the identification of a boundary, and often its
digital representation as a polygon of vertices and straight connecting segments. A
gazetteer is a relation between places, their locations, and their types [18], and the
largest digital gazetteers currently contain on the order of 10^7 officially recognized
places. Hastings [19] has discussed issues of geometric (locations), nominal (names),
and taxial (types) interoperability among digital gazetteers. Other places have identity
to humans, but no official recognition. Montello [20] discussed the place “downtown
Santa Barbara”, the elicitation of its geographic limits from human subjects, and the
alternative representations and visualizations that would result from its formalization.
Community and neighborhood convey more of a sense of belonging. A resident at
some location x would have some concept of belonging to an area A(x) surrounding
x, and one would expect the neighborhoods of nearby residents to overlap
substantially. In the extreme, one might expect a city to be partitioned into bounded
and non-overlapping neighborhoods, such that all residents in a neighborhood
perceive their areas A as identical. Increasingly, however, access to the Internet is
creating communities that lack such simple geographic expression.
The action space of an individual is defined as the geographic area habitually
occupied by the individual, including place of residence, workplace, and locations of
shopping and recreation. Action space is clearly related to concepts of community and
neighborhood, though many people would not identify workplace as part of
neighborhood.

The idea that neighborhoods can be modeled as partitions lies behind the approach
that many researchers have taken to unravelling connections between individuals and
neighborhoods. For example, Lopez [21] has studied the impact of neighborhood on
obesity, arguing that a resident’s context determines his or her level of physical
activity. Because of the difficulty of determining A(x) for every individual,
researchers often assume that context is provided by the properties of some
convenient statistical reporting zone containing x, such as a county, census tract, or
block. Similar approaches have been used in studies of the effects of air pollution on
health. In such cases context is easily accessible, but with obvious consequences of
misrepresentation. Statistical reporting zones are rarely designed to coincide with
anyone’s sense of neighborhood, and the notion that all residents of a zone perceive
the same zone as their neighborhood has little if any empirical support.
Geographers have long been interested in the partitioning of geographic space
using formal criteria. A partition into formal regions is defined by minimizing
within-region variation, while a partition into functional regions is defined by maximizing
within-region interaction and minimizing between-region interaction, where
interaction might be defined by patterns of trade, commuting, or social networking. In
both cases the number of regions, and hence the average size of regions, must be
determined independently.
Cova and Goodchild [22] have addressed the digital representation of A(x) when it
is unique to x. They define an object-field as a mapping of location x to area A(x), and
identify several other applications. In general this approach would be applicable to
any problem in which context is unique to location.
More generally one might express context in terms of a convolution function. The
context of a location x might be modeled as the aggregate effect of the properties of
the surroundings, weighted by a function of distance w to allow nearby surroundings
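to contribute more than distant surroundings. If the surroundings consist of a set y_i,
characterized by some relevant attribute z_i, then context C(x) might be defined as:
$$C(\mathbf{x}) = \frac{\sum_i z_i \, w(\lVert \mathbf{x} - \mathbf{y}_i \rVert)}{\sum_i w(\lVert \mathbf{x} - \mathbf{y}_i \rVert)} \tag{1}$$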
This approach has obvious advantages over the quick-and-dirty methods discussed
previously. Rather than equating context with some independently defined reporting
zone, it allows context to be defined explicitly through the function w, based on the
spatial variation of some property z. Of the possible distance functions, the negative
exponential has the advantage of being supported by extensive theory, showing that it
is the most likely option in the absence of other information [23]. Negative powers
have the disadvantage of w(0) being undefined. In both cases however the weighting
function will have a parameter, representing neighborhood scale, that must be
established independently.
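A minimal sketch of such a context measure follows. It reads Equation (1) as a normalized weighted average of the attributes z_i, with weights given by a negative exponential of the distance between x and y_i; that reading, the planar Euclidean distance, and the names neg_exp_weight, context, and scale are assumptions made for illustration rather than details confirmed by the text.

```python
import math

def neg_exp_weight(distance, scale):
    """Negative exponential weight; scale is the neighborhood-scale
    parameter, in the same units as distance."""
    return math.exp(-distance / scale)

def context(x, surroundings, scale):
    """Distance-weighted average of attributes around location x.

    x            -- (east, north) coordinates of the location of interest
    surroundings -- iterable of ((east, north), z) pairs
    scale        -- neighborhood scale for the weighting function
    """
    numerator = 0.0
    denominator = 0.0
    for y, z in surroundings:
        w = neg_exp_weight(math.dist(x, y), scale)
        numerator += z * w
        denominator += w
    if denominator == 0.0:
        raise ValueError("no surroundings carry any weight at this scale")
    return numerator / denominator

if __name__ == "__main__":
    # Two hypothetical observations 100 units apart.
    observations = [((0.0, 0.0), 10.0), ((100.0, 0.0), 20.0)]
    for scale in (10.0, 100.0, 1000.0):
        print(f"scale = {scale:6.1f}: C at origin = "
              f"{context((0.0, 0.0), observations, scale):.3f}")
```

Varying scale shows the role of the independently established parameter: a small value makes context almost entirely local, while a large value pulls it toward the unweighted mean of all surroundings.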
4 Data integration
In geospatial technologies, location provides the common key to integrate data. In
practice, however, there are substantial difficulties in doing so [24]. Lack of
