
A review of volunteered geographic information quality assessment methods

Hansi Senaratne [a], Amin Mobasheri [b], Ahmed Loai Ali [c,d], Cristina Capineri [e] and Mordechai (Muki) Haklay [f]

[a] Data Analysis and Visualization Group, University of Konstanz, Konstanz, Germany
[b] GIScience Research Group, Heidelberg University, Heidelberg, Germany
[c] Bremen Spatial Cognition Center, University of Bremen, Bremen, Germany
[d] Information System Department, Assiut University, Assiut, Egypt
[e] Faculty of Political Sciences, University of Sienna, Sienna, Italy
[f] Department of Geomatic Engineering, University College London, London, UK

ABSTRACT
With the ubiquity of advanced web technologies and location-sensing handheld devices,
citizens, regardless of their knowledge or expertise, are able to produce spatial
information. This phenomenon is known as volunteered geographic information (VGI).
During the past decade, VGI has been used as a data source supporting a wide range of
services, such as environmental monitoring, events reporting, human movement analysis,
disaster management, etc. However, these volunteer-contributed data also come with
varying quality. The reasons for this are that data is produced by heterogeneous
contributors, using various technologies and tools, with different levels of detail and
precision, serving heterogeneous purposes, and without gatekeepers. Crowd-sourcing,
social, and geographic approaches have been proposed and later followed to develop
appropriate methods to assess the quality measures and indicators of VGI. In this
article, we review various quality measures and indicators for selected types of VGI
and existing quality assessment methods. As an outcome, the article presents a
classification of VGI with current methods utilized to assess the quality of selected
types of VGI. Through these findings, we introduce data mining as an additional
approach for quality handling in VGI.
Konstanzer Online-Publikations-System (KOPS)
URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-0-341149
Published in: International Journal of Geographical Information Science 31 (2017), 1, pp. 139-167
https://dx.doi.org/10.1080/13658816.2016.1189556

1. Introduction
Volunteered geographic information (VGI) is geographic information that citizens, often
untrained and regardless of their expertise and background, create on dedicated web
platforms (Goodchild 2007), e.g., OpenStreetMap (OSM), Wikimapia, Google MyMaps,
Map Insight and Flickr. In a typology of VGI, Antoniou et al. (2010) and Craglia et al.
(2012) classified VGI based on the type of explicit/implicit geography being captured
and the type of explicit/implicit volunteering. In explicit VGI, contributors are mainly
focused on mapping activities. Thus, the contributor explicitly annotates the data with
geographic contents (e.g., geometries in OSM, Wikimapia, or Google). Data that is
implicitly associated with a geographic location could be any kind of media: text, image,
or video referring to or associated with a specific geographic location, for example,
geotagged microblogs (e.g., Tweets), geotagged images from Flickr, or Wikipedia
articles that refer to geographic locations. Craglia et al. (2012) further elaborated that
for each type of implicit/explicit geography and volunteering, there are potentially
different approaches for assessing the quality.
Due to the increased potential and use of VGI (as demonstrated in the works of Liu
et al. 2008, Jacob et al. 2009, McDougall 2009, Bulearca and Bulearca 2010, Sakaki et al.
2010, MacEachren et al. 2011, Chunara et al. 2012, Fuchs et al. 2013), it becomes
increasingly important to be aware of the quality of VGI, in order to derive accurate
information and decisions. Due to a lack of standardization, quality in VGI has been
shown to vary across heterogeneous data sources (text, image, maps, etc.). For example,
as seen in Figure 1, a photograph of the famous tourist site the Brandenburg Gate in
Berlin is incorrectly geotagged in Jakarta, Indonesia on the photo-sharing platform
Flickr. OSM, in turn, has shown heterogeneity in coverage between different places
(Haklay 2010). Such errors and uneven coverage lead to variable quality in VGI. This can
be explained by the fact that humans perceive and express geographic regions and spatial
relations imprecisely, and in terms of vague concepts (Montello et al. 2003). This
vagueness in human conceptualization of location is due not only to the fact that
geographic entities are continuous in nature, but also to the quality and limitations of
spatial knowledge (Hollenstein and Purves 2014).
Figure 1. A photograph of the Brandenburg Gate in Berlin is incorrectly geotagged in Jakarta,
Indonesia on the popular photo-sharing platform Flickr.

Providing reliable services or extracting useful information requires data that meets a
fitness-for-use quality standard. Incorrect (as seen in Figure 1) or malicious geographic
annotations could be minimized by putting in place appropriate quality indicators and
measures for these various VGI contributions.
Goodchild and Li (2012) have discussed three approaches for assuring the quality of
VGI: crowd-sourcing (the involvement of a group to validate and correct errors that have
been made by an individual contributor), social approaches (trusted individuals who
have earned a good reputation through their contributions to VGI can, for example, act
as gatekeepers to maintain and control the quality of other VGI contributions), and
geographic approaches (use of laws and knowledge from geography, such as Tobler's
first law, to assess the quality). Many works have developed methods to assess the
quality of VGI based on these approaches.
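
As an illustration of the geographic approach, the following minimal Python sketch flags geotags that lie implausibly far from where other photos carrying the same place tag cluster, loosely in the spirit of Tobler's first law. It is not a method taken from the reviewed literature: the function names, the 50 km threshold, and the sample coordinates are assumptions made purely for illustration.

```python
import math
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(min(1.0, a)))


def flag_implausible_geotags(points, max_km=50.0):
    """Return one boolean per (lat, lon) point: True if it is an outlier.

    `points` are geotags of photos sharing the same place tag. The
    coordinate-wise median is used as a robust cluster centre; this simple
    choice does not handle clusters that straddle the antimeridian.
    `max_km` is an arbitrary, illustrative threshold.
    """
    center_lat = median(lat for lat, _ in points)
    center_lon = median(lon for _, lon in points)
    return [haversine_km(lat, lon, center_lat, center_lon) > max_km
            for lat, lon in points]


# Illustrative (made-up) geotags for photos tagged "Brandenburg Gate":
# four near Berlin and one placed in Jakarta.
sample_geotags = [
    (52.5163, 13.3777),
    (52.5160, 13.3780),
    (52.5165, 13.3770),
    (52.5170, 13.3785),
    (-6.2088, 106.8456),
]
flags = flag_implausible_geotags(sample_geotags)
```

With these sample values only the last geotag is flagged. A check of this kind needs enough correctly tagged photos of the same place for the median to be meaningful.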
In this article, we present an extensive review of the existing state-of-the-art methods
to assess the quality of map-, image-, and text-based VGI. As an outcome of the review,
we identify data mining as one more stand-alone approach to assess VGI quality by
utilizing computational processes for discovering patterns and learning purely from
data, irrespective of the laws and knowledge from geography, and independent from
social or crowd-sourced approaches. Extending the spectrum of approaches should foster
more quality assessment methods in the future, especially for VGI types that have not
been extensively researched so far. To the best of our knowledge, no survey of existing
methods has been done so far. This review provides an overview of methods that have
been built based on theories and discussions in the literature. Furthermore, this
survey gives the reader a glimpse of the practical applicability of all identified
approaches. The remainder of this article unfolds as follows. In Section 2, we describe
the different quality measures and indicators for VGI. In Section 3, we describe the main
types of VGI that we consider for our survey, and in Section 4, we describe the
methodology that was followed for the selection of literature for this survey. Section 5
summarizes the findings of the survey, and Section 6 discusses the limitations and future
research perspectives. Finally, we conclude our findings in Section 7.
2. Measures and indicators for VGI quality
Quality of VGI can be described by quality measures and quality indicators (Antoniou and
Skopeliti 2015). Quality measures, mainly adhering to the ISO principles and guidelines,
refer to those elements that can be used to ascertain the discrepancy between the
contributed spatial data and the ground truth (e.g., completeness of data), mainly by
comparing to authoritative data. When authoritative data is no longer usable for
comparisons, and the established measures are no longer adequate to assess the
quality of VGI, researchers have explored more intrinsic ways to assess VGI quality by
looking into other proxies for quality measures. These are called quality indicators.
They rely on various participation biases, contributor expertise or the lack of it,
background, etc., that influence the quality of VGI but cannot be directly measured
(Antoniou and Skopeliti 2015). In the following, these quality measures and indicators
are described in detail. The review of quality assessment methods in Section 5 is based
on these various quality measures and indicators.

2.1. Quality measures for VGI
The International Organization for Standardization (ISO) defined geographic information
quality as the totality of characteristics of a product that bear on its ability to
satisfy stated and implied needs. ISO/TC 211 (Technical Committee) developed a set of
international standards that define the measures of geographic information quality
(standard 19138, as part of the metadata standard 19115). These quantitative quality
measures are: completeness, consistency, positional accuracy, temporal accuracy, and
thematic accuracy.
Completeness describes the relationship between the represented objects and their
conceptualizations. This can be measured as the absence of data (errors of omission) and
presence of excess data (errors of commission). Consistency is the coherence in the data
structures of the digitized spatial data. The errors resulting from the lack of it are
indicated by (i) conceptual consistency, (ii) domain consistency, (iii) format consistency,
and (iv) topological consistency. Accuracy refers to the degree of closeness between a
measurement of a quantity and the accepted true value of that quantity, and it takes the
form of positional accuracy, temporal accuracy and thematic accuracy. Positional
accuracy is indicated by (i) absolute or external accuracy, (ii) relative or internal
accuracy, and (iii) gridded data position accuracy. Thematic accuracy is indicated by
(i) classification correctness, (ii) non-quantitative attribute correctness, and
(iii) quantitative attribute accuracy. In both cases, the discrepancies can be
numerically estimated. Temporal accuracy is indicated by (i) accuracy of a time
measurement: correctness of the temporal references of an item, (ii) temporal
consistency: correctness of ordered events or sequences, and (iii) temporal validity:
validity of data with regard to time.
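
To make two of these measures concrete, the sketch below estimates completeness (rates of omission and commission) and absolute positional accuracy (root-mean-square error) against a reference dataset. It is only a minimal illustration under stated assumptions: features are assumed to be already matched through shared identifiers, coordinates are assumed to be in a projected system measured in metres, both rates are expressed relative to the number of reference features, and the function names are hypothetical.

```python
import math


def completeness_rates(contributed_ids, reference_ids):
    """Return (omission_rate, commission_rate) relative to the reference.

    Omission: reference features missing from the contributed data.
    Commission: contributed features absent from the reference.
    Both rates use the reference count as denominator (an assumption).
    """
    contributed, reference = set(contributed_ids), set(reference_ids)
    if not reference:
        raise ValueError("reference dataset must not be empty")
    omission = len(reference - contributed) / len(reference)
    commission = len(contributed - reference) / len(reference)
    return omission, commission


def positional_rmse(pairs):
    """Root-mean-square positional error over matched point pairs.

    `pairs` holds ((x, y), (x_ref, y_ref)) tuples in a projected
    coordinate system, so the result is in the same unit (e.g., metres).
    """
    if not pairs:
        raise ValueError("at least one matched pair is required")
    squared = [(x - xr) ** 2 + (y - yr) ** 2 for (x, y), (xr, yr) in pairs]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example: two of four reference features are missing and one
# contributed feature has no counterpart in the reference.
omission, commission = completeness_rates({"a", "b", "e"}, {"a", "b", "c", "d"})
rmse = positional_rmse([((0.0, 0.0), (3.0, 4.0)), ((1.0, 1.0), (1.0, 1.0))])
```

In practice the matching step, which is skipped here, tends to be the difficult part, because contributed and authoritative features rarely share identifiers.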
2.2. Quality indicators for VGI
As part of the ISO standards, geographic information quality can be further assessed
through qualitative quality indicators, such as the purpose, usage, and lineage. These
indicators are mainly used to express the quality overview for the data. Purpose
describes the intended usage of the dataset. Usage describes the application(s) in
which the dataset has been utilized. Lineage describes the history of a dataset from
collection and acquisition, through compilation and derivation, to its form at the time
of use (Hoyle 2001, Guinée 2002, Van Oort and Bregt 2005). In addition, where ISO
standardized measures and indicators are not applicable, we have found in the literature
more abstract quality indicators to imply the quality of VGI. These are: trustworthiness,
credibility, text content quality, vagueness, local knowledge, experience, recognition,
and reputation. Trustworthiness is a receiver judgment based on subjective
characteristics, such as reliability or trust (good ratings on the creations, and a
higher frequency of usage of these creations, indicate this trustworthiness) (Flanagin
and Metzger 2008). In assessing the credibility of VGI, the source of information plays
a crucial role, as it is what credibility is primarily based upon. However, this is not
straightforward. Due to the non-authoritative nature of VGI, the source may be
unavailable, concealed, or missing (this is avoided by gatekeepers in authoritative
data). Credibility was defined by Hovland et al. (1953) as the believability of a source
or message, which comprises primarily two dimensions: trustworthiness (as explained
earlier) and expertise. Expertise contains objective characteristics such as accuracy,
authority, competence, or source credentials (Flanagin and Metzger 2008). Therefore, in
assessing the credibility of data as a quality indicator, one needs to consider factors
that contribute to trustworthiness and expertise. Metadata about the origin of VGI can
provide a foundation for the source credentials of VGI (Frew 2007). Text content quality
(mostly applicable for text-based VGI) describes the quality of text data by the use of
text features, such as the text length, structure, style, readability, revision history,
topical similarity, the use of technical terminology, etc. Vagueness is the ambiguity
with which the data is captured (e.g., vagueness caused by low resolutions)
(De Longueville et al. 2010). Local knowledge is the contributor's familiarity with the
geographic surroundings that she/he is implicitly or explicitly mapping. Experience is
the involvement of a contributor with the VGI platform that she/he contributes to. This
can be expressed by the time that the contributor has been registered with the VGI
portal, the number of global positioning system (GPS) tracks contributed (e.g., in OSM),
the number of features added and edited, or the amount of participation in online forums
to discuss the data (Van Exel et al. 2010). Recognition is the acknowledgement given to
a contributor based on tokens achieved (e.g., in gamified VGI platforms), and the
reviewing of their contributions among their peers (Van Exel et al. 2010). Maué (2007)
described reputation as a tool to ensure the validity of VGI. Reputation is assessed by,
for example, the history of past interactions between collaborators. Resnick et al.
(2000) described contributors' abilities and dispositions as features on which this
reputation can be based. Maué (2007) further argued that, similar to the eBay rating
system, the created geographic features on various VGI platforms can be rated, tagged,
discussed, and annotated, which affects the data contributor's reputation value.
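
As a rough illustration of how the experience and reputation indicators described above might be recorded per contributor, consider the following Python sketch. The `Contributor` record, its field names, and the use of a plain mean of peer ratings as a reputation value are all assumptions made for this example; none of the cited works prescribes this particular formulation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Contributor:
    """Hypothetical per-contributor record on a VGI platform."""
    registered: date
    gps_tracks: int = 0
    features_edited: int = 0
    forum_posts: int = 0
    # Peer ratings of this contributor's features, e.g. on a 1-5 scale.
    ratings: List[int] = field(default_factory=list)


def experience_summary(contributor: Contributor, today: date) -> dict:
    """Collect the raw experience proxies named in the text."""
    return {
        "days_registered": (today - contributor.registered).days,
        "gps_tracks": contributor.gps_tracks,
        "features_edited": contributor.features_edited,
        "forum_posts": contributor.forum_posts,
    }


def reputation(contributor: Contributor) -> Optional[float]:
    """Mean peer rating, or None if the contributor has no ratings yet."""
    if not contributor.ratings:
        return None
    return sum(contributor.ratings) / len(contributor.ratings)


# Made-up example contributor.
mapper = Contributor(registered=date(2010, 1, 1), gps_tracks=12,
                     features_edited=340, forum_posts=5, ratings=[4, 5, 3])
summary = experience_summary(mapper, today=date(2011, 1, 1))
score = reputation(mapper)
```

The proxies are deliberately kept separate rather than merged into a single score, since how they should be weighted against each other is left open here.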
3. Map, image, and text-based VGI: definitions and quality issues
The effective utilization of VGI is strongly associated with data quality, and this varies
depending primarily on the type of VGI, the way data is collected on the different VGI
platforms, and the context of usage. The following sections describe the selected forms
of VGI: (1) map, (2) image, and (3) text, their uses, and how data quality issues arise.
These three types of VGI are chosen based on the methods that are used to capture the
data (maps: as GPS points and traces, image: as photos, text: as plain text), and because
they are the most popular forms of VGI currently used. This section further lays the
groundwork for understanding the subsequent section on various quality measures and
indicators, and quality assessment methods used for these three types of VGI.
3.1. Map-based VGI
Map-based VGI concerns all VGI sources that include geometries as points, lines, and
polygons, the basic elements to design a map. Among others, OSM, Wikimapia, Google
Map Maker, and Map Insight are examples of map-based VGI projects. However, OSM is
the most prominent project due to the following reasons: (i) it aims to develop a free
map of the world accessible and obtainable for everyone; (ii) it has millions of registered
contributors; (iii) it has active mapper communities in many locations; and (iv) it provides
free and flexible contribution mechanisms for data (useful for map provision, routing,
References
Earthquake shakes Twitter users: real-time event detection by social sensors. Proceedings article.
Citizens as sensors: the world of volunteered geography. Journal article.
Reputation systems. Journal article.
Handbook on Life Cycle Assessment: Operational Guide to the ISO Standards. Book.
Information credibility on Twitter. Proceedings article.