A review of volunteered geographic information quality assessment methods

doi:10.1080/13658816.2016.1189556


Hansi Senaratne [a], Amin Mobasheri [b], Ahmed Loai Ali [c,d], Cristina Capineri [e] and Mordechai (Muki) Haklay [f]

[a] Data Analysis and Visualization Group, University of Konstanz, Konstanz, Germany; [b] GIScience Research Group, Heidelberg University, Heidelberg, Germany; [c] Bremen Spatial Cognition Center, University of Bremen, Bremen, Germany; [d] Information System Department, Assiut University, Assiut, Egypt; [e] Faculty of Political Sciences, University of Sienna, Sienna, Italy; [f] Department of Geomatic Engineering, University College London, London, UK

ABSTRACT

With the ubiquity of advanced web technologies and location-sensing handheld devices, citizens, regardless of their knowledge or expertise, are able to produce spatial information. This phenomenon is known as volunteered geographic information (VGI). During the past decade, VGI has been used as a data source supporting a wide range of services, such as environmental monitoring, events reporting, human movement analysis, and disaster management. However, these volunteer-contributed data also come with varying quality. The reasons for this are that the data are produced by heterogeneous contributors, using various technologies and tools, with different levels of detail and precision, serving heterogeneous purposes, and without gatekeepers. Crowd-sourcing, social, and geographic approaches have been proposed and later followed to develop appropriate methods to assess the quality measures and indicators of VGI. In this article, we review various quality measures and indicators for selected types of VGI and existing quality assessment methods. As an outcome, the article presents a classification of VGI with current methods utilized to assess the quality of selected types of VGI. Through these findings, we introduce data mining as an additional approach for quality handling in VGI.

Konstanzer Online-Publikations-System (KOPS)
URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-0-341149
Published in: International Journal of Geographical Information Science; 31 (2017), 1, pp. 139-167
https://dx.doi.org/10.1080/13658816.2016.1189556

1. Introduction

Volunteered geographic information (VGI) is geographic information created by citizens, often untrained and regardless of their expertise and background, on dedicated web platforms (Goodchild 2007), e.g., OpenStreetMap (OSM), Wikimapia, Google MyMaps, Map Insight and Flickr. In a typology of VGI, the works of Antoniou et al. (2010) and Craglia et al. (2012) classified VGI based on the type of explicit/implicit geography being captured and the type of explicit/implicit volunteering. In explicit-VGI, contributors are mainly focused on mapping activities. Thus, the contributor explicitly annotates the data with geographic contents (e.g., geometries in OSM, Wikimapia, or Google). Data that is implicitly associated with a geographic location could be any kind of media: text, image, or video referring to or associated with a specific geographic location. Examples are geotagged microblogs (e.g., Tweets), geotagged images from Flickr, or Wikipedia articles that refer to geographic locations. Craglia et al. (2012) further elaborated that for each type of implicit/explicit geography and volunteering, there are potentially different approaches for assessing the quality.

Due to the increased potential and use of VGI (as demonstrated in the works of Liu et al. 2008, Jacob et al. 2009, McDougall 2009, Bulearca and Bulearca 2010, Sakaki et al. 2010, MacEachren et al. 2011, Chunara et al. 2012, Fuchs et al. 2013), it becomes increasingly important to be aware of the quality of VGI, in order to derive accurate information and decisions. Due to a lack of standardization, quality in VGI has been shown to vary across heterogeneous data sources (text, image, maps, etc.). For example, as seen in Figure 1, a photograph of the famous tourist site the Brandenburg Gate in Berlin is incorrectly geotagged in Jakarta, Indonesia on the photo-sharing platform Flickr. Similarly, OSM has shown heterogeneity in coverage between different places (Haklay 2010). Such errors and uneven coverage result in variable quality in VGI. This can be explained by the fact that humans perceive and express geographic regions and spatial relations imprecisely, and in terms of vague concepts (Montello et al. 2003). This vagueness in human conceptualization of location is due not only to the fact that geographic entities are continuous in nature, but also to the quality and limitations of spatial knowledge (Hollenstein and Purves 2014).

Figure 1. A photograph of the Brandenburg Gate in Berlin is incorrectly geotagged in Jakarta, Indonesia on the popular photo-sharing platform Flickr.


Providing reliable services or extracting useful information requires data that meet a fitness-for-use quality standard. Incorrect (as seen in Figure 1) or malicious geographic annotations could be minimized by putting in place appropriate quality indicators and measures for these various VGI contributions.

Goodchild and Li (2012) have discussed three approaches for assuring the quality of VGI: crowd-sourcing (the involvement of a group to validate and correct errors that have been made by an individual contributor), social approaches (trusted individuals who have earned a good reputation with their contributions to VGI can, for example, act as gatekeepers to maintain and control the quality of other VGI contributions), and geographic approaches (use of laws and knowledge from geography, such as Tobler's first law, to assess the quality). Many works have developed methods to assess the quality of VGI based on these approaches.
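As a rough illustration of a check based on geographic knowledge, the mislabelled photograph in Figure 1 could be flagged by comparing a contributed geotag with a known location of the depicted landmark. The sketch below is not a method from Goodchild and Li (2012); the function names, the 1 km tolerance, and the approximate coordinates are assumptions chosen for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_plausible_geotag(tag, reference, max_km=1.0):
    """Return True if the geotag lies within max_km of the reference point.

    max_km is a hypothetical tolerance, not a value taken from the literature.
    """
    return haversine_km(tag[0], tag[1], reference[0], reference[1]) <= max_km


# Approximate coordinates (assumed for illustration).
BRANDENBURG_GATE = (52.5163, 13.3777)
JAKARTA = (-6.2088, 106.8456)

if __name__ == "__main__":
    distance = haversine_km(*JAKARTA, *BRANDENBURG_GATE)
    print(f"Geotag is {distance:.0f} km from the landmark")
    print("Plausible:", is_plausible_geotag(JAKARTA, BRANDENBURG_GATE))
```

Such a check presupposes a trusted reference location for the depicted object, which is often unavailable for VGI; it is meant only to make the idea of a geographic plausibility test concrete.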

In this article, we present an extensive review of the existing state-of-the-art methods to assess the quality of map-, image-, and text-based VGI. As an outcome of the review, we identify data mining as one more stand-alone approach to assess VGI quality by utilizing computational processes for discovering patterns and learning purely from data, irrespective of the laws and knowledge from geography, and independent from social or crowd-sourced approaches. Extending the spectrum of approaches should encourage more quality assessment methods in the future, especially for VGI types that have not been extensively researched so far. To the best of our knowledge, surveys on existing methods have not been done so far. This review provides an overview of methods that have been built based on theories and discussions in the literature. Furthermore, this survey gives the reader a glimpse of the practical applicability of all identified approaches. The remainder of this article unfolds as follows. In Section 2, we describe the different quality measures and indicators for VGI. In Section 3, we describe the main types of VGI that we consider for our survey, and in Section 4, we describe the methodology that was followed for the selection of literature for this survey. Section 5 summarizes the findings of the survey, and Section 6 discusses the limitations and future research perspectives. Finally, we conclude our findings in Section 7.

2. Measures and indicators for VGI quality

Quality of VGI can be described by quality measures and quality indicators (Antoniou and Skopeliti 2015). Quality measures, mainly adhering to the ISO principles and guidelines, refer to those elements that can be used to ascertain the discrepancy between the contributed spatial data and the ground truth (e.g., completeness of data), mainly by comparing to authoritative data. When authoritative data is no longer usable for comparisons, and the established measures are no longer adequate to assess the quality of VGI, researchers have explored more intrinsic ways to assess VGI quality by looking into other proxies for quality measures. These are called quality indicators, which rely on various participation biases, contributor expertise or the lack of it, background, etc., that influence the quality of VGI but cannot be directly measured (Antoniou and Skopeliti 2015). In the following, these quality measures and indicators are described in detail. The review of quality assessment methods in Section 5 is based on these various quality measures and indicators.


2.1. Quality measures for VGI

The International Organization for Standardization (ISO) defined geographic information quality as the totality of characteristics of a product that bear on its ability to satisfy stated and implied needs. ISO/TC 211 (Technical Committee) developed a set of international standards that define the measures of geographic information quality (standard 19138, as part of the metadata standard 19115). These quantitative quality measures are: completeness, consistency, positional accuracy, temporal accuracy, and thematic accuracy.

Completeness describes the relationship between the represented objects and their conceptualizations. This can be measured as the absence of data (errors of omission) and the presence of excess data (errors of commission). Consistency is the coherence in the data structures of the digitized spatial data. Errors resulting from a lack of consistency are indicated by (i) conceptual consistency, (ii) domain consistency, (iii) format consistency, and (iv) topological consistency. Accuracy refers to the degree of closeness between a measurement of a quantity and the accepted true value of that quantity, and it takes the form of positional accuracy, temporal accuracy and thematic accuracy. Positional accuracy is indicated by (i) absolute or external accuracy, (ii) relative or internal accuracy, and (iii) gridded data position accuracy. Thematic accuracy is indicated by (i) classification correctness, (ii) non-quantitative attribute correctness, and (iii) quantitative attribute accuracy. In both cases, the discrepancies can be numerically estimated. Temporal accuracy is indicated by (i) accuracy of a time measurement: correctness of the temporal references of an item, (ii) temporal consistency: correctness of ordered events or sequences, and (iii) temporal validity: validity of data with regard to time.
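To make two of these measures concrete, the following minimal sketch computes omission and commission rates and a positional root-mean-square error against a reference dataset. It assumes that VGI features have already been matched to reference features (here through shared identifiers, which is a hypothetical simplification) and that coordinates are in a projected system with metre units. The function names are illustrative and are not taken from the ISO standards.

```python
import math


def completeness(vgi_ids, reference_ids):
    """Return (omission_rate, commission_rate) for matched feature identifiers.

    Omission: share of reference features missing from the VGI dataset.
    Commission: share of VGI features with no counterpart in the reference.
    """
    vgi, ref = set(vgi_ids), set(reference_ids)
    omission = len(ref - vgi) / len(ref) if ref else 0.0
    commission = len(vgi - ref) / len(vgi) if vgi else 0.0
    return omission, commission


def positional_rmse(pairs):
    """Root-mean-square positional error over matched point pairs.

    pairs: list of ((x, y), (x_ref, y_ref)) in projected coordinates (metres).
    """
    if not pairs:
        raise ValueError("at least one matched pair is required")
    squared = [(x - xr) ** 2 + (y - yr) ** 2 for (x, y), (xr, yr) in pairs]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy data: five reference features, four contributed ones.
    omission, commission = completeness({"a", "b", "c", "x"},
                                        {"a", "b", "c", "d", "e"})
    print(f"omission={omission:.2f}, commission={commission:.2f}")
    print(f"RMSE={positional_rmse([((0, 0), (3, 4)), ((1, 1), (1, 1))]):.2f} m")
```

In practice, the matching step itself is a major source of uncertainty, and both measures depend on the reference data being a reasonable approximation of the ground truth.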

2.2. Quality indicators for VGI

As part of the ISO standards, geographic information quality can be further assessed through qualitative quality indicators, such as the purpose, usage, and lineage. These indicators are mainly used to express the quality overview for the data. Purpose describes the intended usage of the dataset. Usage describes the application(s) in which the dataset has been utilized. Lineage describes the history of a dataset from collection, acquisition to compilation and derivation to its form at the time of use (Hoyle 2001, Guinée 2002, Van Oort and Bregt 2005). In addition, where ISO standardized measures and indicators are not applicable, we have found in the literature more abstract quality indicators to imply the quality of VGI. These are: trustworthiness, credibility, text content quality, vagueness, local knowledge, experience, recognition, and reputation. Trustworthiness is a receiver judgment based on subjective characteristics, such as reliability or trust (good ratings on the creations, and a higher frequency of usage of these creations, indicate this trustworthiness) (Flanagin and Metzger 2008). In assessing the credibility of VGI, the source of information plays a crucial role, as it is what credibility is primarily based upon. However, this is not straightforward. Due to the non-authoritative nature of VGI, the source may be unavailable, concealed, or missing (this is avoided by gatekeepers in authoritative data). Credibility was defined by Hovland et al. (1953) as the believability of a source or message, which comprises primarily two dimensions: trustworthiness (as explained earlier) and expertise. Expertise contains objective characteristics such as accuracy, authority, competence, or source credentials (Flanagin and Metzger 2008). Therefore, in assessing the credibility of data as a quality indicator, one needs to consider factors that contribute to trustworthiness and expertise. Metadata about the origin of VGI can provide a foundation for the source credentials of VGI (Frew 2007). Text content quality (mostly applicable to text-based VGI) describes the quality of text data by the use of text features, such as the text length, structure, style, readability, revision history, topical similarity, the use of technical terminology, etc. Vagueness is the ambiguity with which the data is captured (e.g., vagueness caused by low resolutions) (De Longueville et al. 2010). Local knowledge is the contributor's familiarity with the geographic surroundings that she/he is implicitly or explicitly mapping. Experience is the involvement of a contributor with the VGI platform that she/he contributes to. This can be expressed by the time that the contributor has been registered with the VGI portal, the number of global positioning system (GPS) tracks contributed (e.g., in OSM), the number of features added and edited, or the amount of participation in online forums to discuss the data (Van Exel et al. 2010). Recognition is the acknowledgement given to a contributor based on tokens achieved (e.g., in gamified VGI platforms), and the reviewing of their contributions among their peers (Van Exel et al. 2010). Maué (2007) described reputation as a tool to ensure the validity of VGI. Reputation is assessed by, for example, the history of past interactions between collaborators. Resnick et al. (2000) described contributors' abilities and dispositions as features on which this reputation can be based. Maué (2007) further argues that, similar to the eBay rating system, the created geographic features on various VGI platforms can be rated, tagged, discussed, and annotated, which affects the data contributor's reputation value.
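As a minimal sketch of how ratings on contributed features might be turned into a reputation value, the example below aggregates positive and negative ratings per contributor with Laplace smoothing, so that contributors with few ratings are pulled toward a neutral 0.5. The smoothing formula and the function name are assumptions made for illustration; they are not the scheme described by Maué (2007) or Resnick et al. (2000).

```python
from collections import defaultdict


def reputation_scores(ratings):
    """Compute a smoothed reputation value in (0, 1) per contributor.

    ratings: iterable of (contributor, is_positive) tuples, one per rating
    received on a feature created by that contributor.
    Score = (positive + 1) / (total + 2), i.e. Laplace smoothing (assumed).
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for contributor, is_positive in ratings:
        total[contributor] += 1
        if is_positive:
            positive[contributor] += 1
    return {c: (positive[c] + 1) / (total[c] + 2) for c in total}


if __name__ == "__main__":
    # Hypothetical rating history for two contributors.
    history = ([("alice", True)] * 8 + [("alice", False)] * 2
               + [("bob", True)])
    for name, score in sorted(reputation_scores(history).items()):
        print(f"{name}: {score:.2f}")
```

A single positive rating therefore counts for less than a long, mostly positive history, which reflects the idea that reputation builds on the history of past interactions.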

3. Map, image, and text-based VGI: deﬁnitions and quality issues

The effective utilization of VGI is strongly associated with data quality, and this varies depending primarily on the type of VGI, the way data is collected on the different VGI platforms, and the context of usage. The following sections describe the selected forms of VGI: (1) map, (2) image, and (3) text, their uses, and how data quality issues arise. These three types of VGI are chosen based on the methods that are used to capture the data (maps: as GPS points and traces, image: as photos, text: as plain text), and because they are the most popular forms of VGI currently used. This section further lays the groundwork to understand the subsequent section on various quality measures and indicators, and quality assessment methods used for these three types of VGI.

3.1. Map-based VGI

Map-based VGI concerns all VGI sources that include geometries as points, lines, and polygons, the basic elements to design a map. Among others, OSM, Wikimapia, Google Map Maker, and Map Insight are examples of map-based VGI projects. However, OSM is the most prominent project due to the following reasons: (i) it aims to develop a free map of the world accessible and obtainable for everyone; (ii) it has millions of registered contributors; (iii) it has active mapper communities in many locations; and (iv) it provides free and flexible contribution mechanisms for data (useful for map provision, routing,
