(Open Access) The quality of geospatial context (2009) | Michael F. Goodchild

The Quality of Geospatial Context

Michael F. Goodchild

Center for Spatial Studies, and Department of Geography, University of California, Santa

Barbara, CA 93106-4060, USA

good@geog.ucsb.edu

Abstract. The location of an event or feature on the Earth's surface can be used

to discover information about the location's surroundings, and to gain insights

into the conditions and processes that may affect or even cause the presence of

the event or feature. Such reasoning lies at the heart of critical spatial thinking,

and is increasingly implemented in tools such as geographic information

systems and online Web mashups. But the quality of contextual information

relies on accurate positions and descriptions. Over the past two decades

substantial progress has been made on the theory and methods of geospatial

uncertainty, but hard problems remain in several areas, including uncertainty

visualization and propagation. Web 2.0 mechanisms are fostering the rapid

growth of user-generated geospatial content, but raising issues of associated

quality.

Keywords: geospatial data, context, uncertainty, error, Web 2.0

1 Introduction

Over the past several decades there has been rapid and accelerating progress in the

availability, acquisition, and use of geospatial data, that is, data that associate places

on or near the Earth’s surface x, the attributes observed at those places z(x), and in

some cases the time of observation t. Progress can be seen in the development of GPS

(the Global Positioning System), which for the first time allowed rapid, accurate, and

direct determination of location; remote sensing, providing massive quantities of

image data at spatial resolutions as fine as 50 cm; geographic information systems

(GIS) and spatial databases to represent, analyze, and reason from geospatial data;

and a host of Web applications for synthesizing, disseminating, and sharing data.

The purpose of this paper is to examine issues of quality when context is

constructed from geospatial data. The next section provides some background,

including a brief review of research on geospatial data quality and a summary of its

major findings. Section 3 examines the broader concept of context, drawing from

work in spatial analysis, the social sciences, and GIS. Section 4 discusses the key

issues of data integration, with particular emphasis on spatial joins and mashups.

Section 5 examines the growing contributions of user-generated content, and the

quality issues that are emerging in this context. The paper ends with a brief

concluding section.

2 Background

Early developments in GIS, and the automation of map-making processes, allowed

information from maps to be converted to precise digital records. But paper maps are

analog representations, and map-making is as much an art as a science, and it follows

that data derived from maps do not necessarily stand up to the rigor and precision of

digital manipulation, especially for scientific purposes. As early as the mid-1980s it

had become apparent that the quality of geospatial data and the impact of quality on

applications were significant and largely unexplored issues. A workshop in 1988

brought together the small community of researchers working on the problem, and led

to a first book [1]. Two international biennial conference series were established in

the 1990s (the 6th International Symposium on Spatial Data Quality meets at

Memorial University, Canada, July 6–8, 2009; and the 9th International Symposium on

Spatial Accuracy Assessment in Natural Resources and Environmental Sciences

meets at the University of Leicester, UK, July 20–23, 2010).

It quickly became apparent that the problem was much more than one of

measurement error. The attributes associated with locations by ecologists,

pedologists, foresters, urban planners, and many other scholarly and practitioner

communities are frequently vague, with definitions that fail to meet scientific

standards of replicability (asked to make independent maps of selected properties of

an area, two professionals will not in general produce identical maps). Statistical

approaches to error analysis were supplemented by research into fuzzy and rough sets,

the theory of evidence, and subjective probability.

Today the field of geospatial uncertainty can be seen as addressing four related

problems:

• sources of uncertainty, and approaches to uncertainty management and

minimization;

• modeling of uncertainty for various types of geospatial data, using statistical and

other frameworks;

• visualization and communication of uncertainty; and

• propagation of uncertainty during processes of analysis and reasoning.

Notable surveys of the field include those by Devillers and Jeansoulin [2], Foody

and Atkinson [3], Goodchild and Jeansoulin [4], Guptill and Morrison [5], Heuvelink

[6], Lowell and Jaton [7], Mowrer and Congalton [8], Shi, Fisher, and Goodchild [9],

Stein, Shi, and Bijker [10], and Zhang and Goodchild [11].

Several key findings from this work can be identified. First, uncertainty should be

defined as the degree to which a spatial database leaves a given user uncertain about

the actual nature of the real world. This uncertainty may result from inaccurate

measurement, vagueness of definition, generalization or loss of detail in digital

representation, lack of adequate documentation, and many other sources. Second,

uncertainty is endemic in all geospatial data. Third, the importance of uncertainty is

application-specific, and may be insignificant for some applications; but it will always

be possible to find at least one application for which the uncertainty of a given item of

information is significant.

Measurement of geospatial position is never perfect, and may introduce uncertainty

into the topological properties that can be derived from positions. For example, a

point lying near the boundary of an area may appear to be outside the area if either its

location, or the location of the boundary, or both are sufficiently uncertain. Similarly

it may be impossible to determine accurately whether a house is on one side of a

street or the other, because of uncertainties in the positions of both. Thus an important

principle of GIS practice is that it may be necessary to allow topology to trump

geometry, in other words to allow coded topological properties to override geometric

appearances.
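
One way to make this concrete is to treat the perpendicular errors of the point and of a locally straight boundary as independent Gaussian variables; the probability that the point truly lies inside the area is then a normal tail probability. The Python sketch below rests entirely on those assumptions, and the function name and parameters are illustrative, not taken from any GIS package.

```python
import math


def prob_inside(signed_distance_m, sigma_point_m, sigma_boundary_m=0.0):
    """Probability that a point truly lies on the inside of a straight boundary.

    signed_distance_m: measured distance from point to boundary, positive when
        the point appears inside and negative when it appears outside.
    sigma_point_m, sigma_boundary_m: standard errors of the point and of the
        boundary perpendicular to the boundary (assumed independent, Gaussian).
    """
    # Independent errors combine in quadrature.
    sigma = math.hypot(sigma_point_m, sigma_boundary_m)
    if sigma == 0.0:
        # No uncertainty: the geometric appearance is taken at face value.
        return 1.0 if signed_distance_m > 0 else 0.0
    # Standard normal cumulative distribution evaluated at distance / sigma.
    return 0.5 * (1.0 + math.erf(signed_distance_m / (sigma * math.sqrt(2.0))))
```

Under these assumptions, a point that appears 2 m outside a boundary, with standard errors of 5 m for both point and boundary, still has a probability of roughly 0.39 of lying inside, which is why a coded topological relationship may deserve more trust than the geometry.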

While the problem of uncertainty in geospatial data is in many ways analogous to

problems of uncertainty in other data types, one key property leads to numerous

fundamental problems. This is the property known as spatial dependence. Many types

of errors in geospatial data tend to be positively autocorrelated; that is, errors of

position x or attribute z(x) tend to be similar over short distances. For example,

suppose elevation has been measured at a series of points, with a standard error of 5 m,

and these elevations have been compiled into a digital elevation model (DEM) with a

horizontal spacing between data points of 30 m. A common application for such data

is the estimation of slope. Clearly such estimates would be highly suspect if based on

elevations with standard errors of 5 m, if errors were statistically independent. In

reality, however, methods of DEM compilation tend to create errors that are highly

correlated over short distances. Thus it is still possible to obtain accurate estimates of

slope despite substantial elevation errors.
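
The effect can be quantified with a short calculation. If two neighboring elevations share a standard error sigma and their errors have correlation rho, the error of their difference has variance 2 sigma^2 (1 - rho). The Python sketch below applies this to the 5 m and 30 m figures of the example; the correlation values are assumptions chosen for illustration, and slope is simplified to the rise between two adjacent points divided by their spacing.

```python
import math


def slope_standard_error(sigma_m, spacing_m, rho):
    """Standard error of a two-point slope estimate (rise over run).

    sigma_m: standard error of each elevation, in meters.
    spacing_m: horizontal spacing between the two points, in meters.
    rho: assumed correlation between the two elevation errors (0 = independent).
    """
    # Variance of the difference of two equally uncertain, correlated errors.
    diff_variance = 2.0 * sigma_m ** 2 * (1.0 - rho)
    return math.sqrt(diff_variance) / spacing_m


independent = slope_standard_error(5.0, 30.0, rho=0.0)   # about 0.24
correlated = slope_standard_error(5.0, 30.0, rho=0.99)   # about 0.024
```

With independent errors the slope estimate has a standard error of about 0.24 (a 24 percent grade), which would swamp most real slopes; with an assumed correlation of 0.99 the standard error falls by a factor of ten.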

A similar argument can be made for many applications of geospatial data.

Databases of streets are useful for navigation purposes even though absolute positions

may be in error by tens of meters, since relative positions are much more accurate.

The area of land parcels can be estimated to fractions of a square meter even though their

absolute positions may be in error by meters. Spatial dependence is the basis for the

fields of geostatistics [12] and spatial statistics [13], both of which address the

analysis and mining of spatially autocorrelated data. Informally the principle is known

as Tobler’s First Law of Geography [14]: “nearby things are more similar than distant

things”.

Several implications of the widespread presence of spatial dependence are worthy

of mention. First, data that share lineage are likely to have spatially dependent error

structures, and consequently relative errors of positions and attributes will almost

always be less than absolute errors. In statistical terms relative error is a joint

property of pairs of locations, whereas absolute error is a marginal property of

locations taken one at a time. Second, when data from independent sources are

brought together, with no sharing of lineage, relative errors will be as large as

absolute errors. We return to this point later in the discussion of spatial joins and

mashups.

The third implication concerns visualization. A map is a very effective mechanism

for displaying the properties z associated with locations x, particularly when those

properties are static. Measures of quality associated with locations, such as the

marginal standard error of elevation discussed in a previous example, can also be

displayed in this way. But the key issue of spatial dependence is problematic, since a

map offers no way of showing the joint properties of locations, and thus no way of

communicating to the user the important difference between correlated and

uncorrelated errors. One solution, explored at length by Ehlschlaeger [15] and others,

is to animate the map. For example, correlated errors of elevation will appear as a

simultaneous rising and falling of neighboring points, like a waving blanket.
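
A minimal Python sketch of how frames for such an animation might be generated along a single transect follows. It assumes a first-order autoregressive error model in which neighboring errors have correlation phi; this is a stand-in chosen for brevity and is not the method of [15]. All function names are hypothetical.

```python
import math
import random


def error_frames(n_points, n_frames, sigma_m, phi, seed=0):
    """Return n_frames lists of n_points simulated elevation errors (meters).

    Each frame is one realization of the error along a transect. Neighboring
    errors have correlation phi; phi = 0 gives statistically independent errors.
    """
    rng = random.Random(seed)
    # Innovation spread that keeps the marginal standard error equal to sigma_m.
    innovation_sd = sigma_m * math.sqrt(1.0 - phi * phi)
    frames = []
    for _frame in range(n_frames):
        errors = [rng.gauss(0.0, sigma_m)]
        for _point in range(n_points - 1):
            errors.append(phi * errors[-1] + rng.gauss(0.0, innovation_sd))
        frames.append(errors)
    return frames


def mean_neighbor_jump(frame):
    """Average absolute difference between adjacent errors in one frame."""
    return sum(abs(b - a) for a, b in zip(frame, frame[1:])) / (len(frame) - 1)
```

Adding each frame to the measured elevations and displaying the frames in sequence gives the animated view: with phi close to 1 adjacent points move together, while with phi equal to 0 each point jitters on its own, even though the marginal standard error is the same in both cases.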

Finally, spatial dependence has implications for the data models used to represent

geospatial data. Goodchild [16] has shown that the traditional model used to represent

area-class maps (maps that partition an area into irregular patches of uniform class)

cannot be adapted by adding appropriate attributes representing uncertainty to its

various tables of nodes, edges, and faces; instead, an entirely new raster-based model

must be adopted. Similarly, Goodchild [17] has argued that the traditional coordinate-

based structure of GIS must be replaced by a radically different measurement-based

structure to capture uncertainty in the measurement of positions.

3 Defining context

Context can be defined as information about the surroundings of events, features, and

transactions, and in the geospatial context of this paper surroundings can be taken to

mean a geographic area. A host of terms have similar meaning, and in some cases

those meanings have been formalized. Some of those terms and formalizations will be

reviewed in this section.

Place has the sense of an area of the Earth’s surface that possesses some form of

identity, and perhaps homogeneity with respect to certain characteristics. Some places

are officially recognized and formalized, such as the populated places recognized by

the Bureau of the Census or the named places recognized by the Board on Geographic

Names. Formalization often means the identification of a boundary, and often its

digital representation as a polygon of vertices and straight connecting segments. A

gazetteer is a relation between places, their locations, and their types [18], and the

largest digital gazetteers currently contain on the order of 10

officially recognized

places. Hastings [19] has discussed issues of geometric (locations), nominal (names),

and taxial (types) interoperability among digital gazetteers. Other places have identity

to humans, but no official recognition. Montello [20] discussed the place “downtown

Santa Barbara”, the elicitation of its geographic limits from human subjects, and the

alternative representations and visualizations that would result from its formalization.
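
The definition of a gazetteer as a relation between names, locations, and types can be sketched as a small data structure. The Python example below is illustrative only: the class, the field names, and the two entries (with invented coordinates) are hypothetical and do not reproduce the schema of any actual gazetteer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    name: str           # nominal component
    lon: float          # geometric component, here a simple point footprint
    lat: float
    feature_type: str   # taxial component


# Invented entries: two distinct places that share a name but differ in type.
ENTRIES = [
    GazetteerEntry("Mill Creek", -120.10, 34.50, "stream"),
    GazetteerEntry("Mill Creek", -119.80, 34.60, "populated place"),
]


def find_by_name(entries, name):
    """Return every entry whose name matches, ignoring case."""
    wanted = name.casefold()
    return [e for e in entries if e.name.casefold() == wanted]


def find_by_type(entries, feature_type):
    """Return every entry of the given feature type."""
    return [e for e in entries if e.feature_type == feature_type]
```

A name lookup here returns two entries, which hints at why interoperability has to be addressed separately for names, locations, and types: agreement on one component does not identify a place uniquely.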

Community and neighborhood convey more of a sense of belonging. A resident at

some location x would have some concept of belonging to an area A(x) surrounding

x, and one would expect the neighborhoods of nearby residents to overlap

substantially. In the extreme, one might expect a city to be partitioned into bounded

and non-overlapping neighborhoods, such that all residents in a neighborhood

perceive their areas A as identical. Increasingly, however, access to the Internet is

creating communities that lack such simple geographic expression.

The action space of an individual is defined as the geographic area habitually

occupied by the individual, including place of residence, workplace, and locations of

shopping and recreation. Action space is clearly related to concepts of community and

neighborhood, though many people would not identify workplace as part of

neighborhood.

The idea that neighborhoods can be modeled as partitions lies behind the approach

that many researchers have taken to unravelling connections between individuals and

neighborhoods. For example, Lopez [21] has studied the impact of neighborhood on

obesity, arguing that a resident’s context determines his or her level of physical

activity. Because of the difficulty of determining A(x) for every individual,

researchers often assume that context is provided by the properties of some

convenient statistical reporting zone containing x, such as a county, census tract, or

block. Similar approaches have been used in studies of the effects of air pollution on

health. In such cases context is easily accessible, but with obvious consequences of

misrepresentation. Statistical reporting zones are rarely designed to coincide with

anyone’s sense of neighborhood, and the notion that all residents of a zone perceive

the same zone as their neighborhood has little if any empirical support.
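
The zone-based shortcut amounts to a point-in-polygon lookup followed by a copy of the zone's attributes. The Python sketch below uses the standard ray-casting test; the zone identifiers, attribute names, and values are hypothetical, and polygons are assumed to be simple rings of (x, y) vertices in a planar coordinate system.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon ring."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zone_context(point, zones):
    """Return (zone_id, attributes) of the first zone containing point, else None."""
    for zone_id, polygon, attributes in zones:
        if point_in_polygon(point, polygon):
            return zone_id, attributes
    return None


# Two adjacent square zones with made-up attribute values.
ZONES = [
    ("tract_A", [(0, 0), (2, 0), (2, 2), (0, 2)], {"median_income": 40000}),
    ("tract_B", [(2, 0), (4, 0), (4, 2), (2, 2)], {"median_income": 90000}),
]
```

Residents at (1.9, 1) and (2.1, 1) are near neighbors, yet the lookup assigns them the attributes of different zones, which illustrates the misrepresentation described above.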

Geographers have long been interested in the partitioning of geographic space

using formal criteria. A partition into formal regions is defined by minimizing within-

region variation, while a partition into functional regions is defined as maximizing

within-region interaction and minimizing between-region interaction, where

interaction might be defined by patterns of trade, commuting, or social networking. In

both cases the number of regions, and hence the average size of regions, must be

determined independently.

Cova and Goodchild [22] have addressed the digital representation of A(x) when it

is unique to x. They define an object-field as a mapping of location x to area A(x), and

identify several other applications. In general this approach would be applicable to

any problem in which context is unique to location.

More generally one might express context in terms of a convolution function. The

context of a location x might be modeled as the aggregate effect of the properties of

the surroundings, weighted by a function of distance w to allow nearby surroundings

to contribute more than distant surroundings. If the surroundings consist of a set y

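characterized by some relevant attribute z_y, then context C(x) might be defined as:

$$C(\mathbf{x}) = \frac{\sum_{\mathbf{y}} z_{\mathbf{y}}\, w(\mathbf{x}-\mathbf{y})}{\sum_{\mathbf{y}} w(\mathbf{x}-\mathbf{y})} \qquad (1)$$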

This approach has obvious advantages over the quick-and-dirty methods discussed

previously. Rather than equating context with some independently defined reporting

zone, it allows context to be defined explicitly through the function w, based on the

spatial variation of some property z. Of the possible distance functions, the negative

exponential has the advantage of being supported by extensive theory, showing that it

is the most likely option in the absence of other information [23]. Negative powers

have the disadvantage of w(0) being undefined. In both cases however the weighting

function will have a parameter, representing neighborhood scale, that must be

established independently.
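
A distance-weighted definition of context of this kind can be sketched in a few lines of Python. The example assumes that context is the weighted average of the attribute values of the surroundings, normalized by the sum of the weights, with a negative exponential weighting function; the normalization, the function names, and the scale parameter are assumptions made for illustration.

```python
import math


def negative_exponential(distance, scale):
    """Weight that decays with distance; scale sets the neighborhood size."""
    return math.exp(-distance / scale)


def context(x, surroundings, scale):
    """Distance-weighted average of attribute values around location x.

    x: (x, y) coordinates of the location of interest.
    surroundings: iterable of ((x, y), z) pairs, one per surrounding location.
    scale: neighborhood scale, in the same units as the coordinates.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for location, z in surroundings:
        w = negative_exponential(math.dist(x, location), scale)
        weighted_sum += w * z
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("no surroundings with non-zero weight")
    return weighted_sum / weight_total
```

With two surrounding locations 100 units apart carrying values 10 and 20, a small scale makes the context at the first location essentially 10, while a very large scale pulls it toward the unweighted mean of 15. The result therefore depends directly on the neighborhood scale, which the sketch leaves for the caller to supply.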

4 Data integration

In geospatial technologies, location provides the common key to integrate data. In

practice, however, there are substantial difficulties in doing so [24]. Lack of
