A review of volunteered geographic information quality assessment methods
Hansi Senaratne (a), Amin Mobasheri (b), Ahmed Loai Ali (c,d), Cristina Capineri (e) and
Mordechai (Muki) Haklay (f)
(a) Data Analysis and Visualization Group, University of Konstanz, Konstanz, Germany;
(b) GIScience Research Group, Heidelberg University, Heidelberg, Germany;
(c) Bremen Spatial Cognition Center, University of Bremen, Bremen, Germany;
(d) Information System Department, Assiut University, Assiut, Egypt;
(e) Faculty of Political Sciences, University of Sienna, Sienna, Italy;
(f) Department of Geomatic Engineering, University College London, London, UK
ABSTRACT
With the ubiquity of advanced web technologies and location-sensing handheld devices,
citizens, regardless of their knowledge or expertise, are able to produce spatial
information. This phenomenon is known as volunteered geographic information (VGI). During
the past decade, VGI has been used as a data source supporting a wide range of services,
such as environmental monitoring, events reporting, human movement analysis, and disaster
management. However, these volunteer-contributed data also come with varying quality. The
reasons are that the data is produced by heterogeneous contributors, using various
technologies and tools, at different levels of detail and precision, serving heterogeneous
purposes, and without gatekeepers. Crowd-sourcing, social, and geographic approaches have
been proposed and later followed to develop appropriate methods to assess the quality
measures and indicators of VGI. In this article, we review various quality measures and
indicators for selected types of VGI and existing quality assessment methods. As an
outcome, the article presents a classification of VGI with current methods utilized to
assess the quality of selected types of VGI. Through these findings, we introduce data
mining as an additional approach for quality handling in VGI.
1. Introduction
Volunteered geographic information (VGI) is geographic information that citizens, often
untrained and regardless of their expertise and background, create on dedicated web
platforms (Goodchild 2007), e.g., OpenStreetMap (OSM), Wikimapia, Google MyMaps, Map
Insight, and Flickr. In a typology of VGI, the works of Antoniou et al. (2010) and
Craglia et al. (2012) classified VGI based on the type of explicit/implicit geography being
captured and the type of explicit/implicit volunteering. In explicit-VGI, contributors are
mainly focused on mapping activities. Thus, the contributor explicitly annotates the data
with geographic contents (e.g., geometries in OSM, Wikimapia, or Google). Data that is
implicitly associated with a geographic location could be any kind of media: text, image,
or video referring to or associated with a specific geographic location. Examples include
geotagged microblogs (e.g., Tweets), geotagged images from Flickr, and Wikipedia
articles that refer to geographic locations. Craglia et al. (2012) further elaborated that
for each type of implicit/explicit geography and volunteering, there are potentially
different approaches for assessing the quality.
Konstanzer Online-Publikations-System (KOPS)
URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-0-341149
Published in: International Journal of Geographical Information Science; 31 (2017), 1, pp. 139-167
https://dx.doi.org/10.1080/13658816.2016.1189556
Due to the increased potential and use of VGI (as demonstrated in the works of Liu
et al. 2008, Jacob et al. 2009, McDougall 2009, Bulearca and Bulearca 2010, Sakaki et al.
2010, MacEachren et al. 2011, Chunara et al. 2012, Fuchs et al. 2013), it becomes
increasingly important to be aware of the quality of VGI, in order to derive accurate
information and decisions. Due to a lack of standardization, quality in VGI has been shown
to vary across heterogeneous data sources (text, image, maps, etc.). For example, as seen
in Figure 1, a photograph of the famous tourist site the Brandenburg Gate in Berlin is
incorrectly geotagged in Jakarta, Indonesia on the photo-sharing platform Flickr. On the
other hand, OSM has also shown heterogeneity in coverage between different places
(Haklay 2010). These issues lead to variable quality in VGI. This can be explained by the fact
that humans perceive and express geographic regions and spatial relations imprecisely,
and in terms of vague concepts (Montello et al. 2003). This vagueness in human
conceptualization of location is due not only to the fact that geographic entities are
continuous in nature, but also due to the quality and limitations of spatial knowledge
(Hollenstein and Purves 2014).
Figure 1. A photograph of the Brandenburg Gate in Berlin is incorrectly geotagged in Jakarta,
Indonesia on the popular photo-sharing platform Flickr.
Providing reliable services or extracting useful information requires data with a
fitness-for-use quality standard. Incorrect (as seen in Figure 1) or malicious geographic
annotations could be minimized by putting appropriate quality indicators and measures
in place for these various VGI contributions.
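As an illustration of how a geotagging error such as the one in Figure 1 could be detected, the following minimal sketch compares a contributed geotag against a known reference location using the great-circle (haversine) distance. It is not a method taken from the reviewed literature; the function names, the tolerance of 1 km, and the approximate coordinates are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_geotag_plausible(geotag, reference, tolerance_km=1.0):
    """True if the geotag lies within tolerance_km of the reference location.

    The tolerance is a hypothetical, application-specific threshold.
    """
    return haversine_km(*geotag, *reference) <= tolerance_km


# Approximate (latitude, longitude) pairs, for illustration only.
BRANDENBURG_GATE = (52.5163, 13.3777)
JAKARTA = (-6.2088, 106.8456)

if __name__ == "__main__":
    print(is_geotag_plausible(JAKARTA, BRANDENBURG_GATE))  # geotag far from the site
```

Such a check presupposes that the depicted object and its true location are known, which is rarely the case for arbitrary photographs; it only shows the kind of positional test that a quality measure can formalize.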
Goodchild and Li (2012) have discussed three approaches for assuring the quality of
VGI: crowd-sourcing (the involvement of a group to validate and correct errors that have
been made by an individual contributor), social approaches (trusted individuals who
have earned a good reputation with their contributions to VGI can, for example, act as
gatekeepers to maintain and control the quality of other VGI contributions), and
geographic approaches (use of laws and knowledge from geography, such as Tobler’s first
law, to assess the quality). Many works have developed methods to assess
the quality of VGI based on these approaches.
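To make the geographic approach more concrete, the sketch below applies the intuition behind Tobler’s first law (near things are more related than distant things) to a single contributed value. It estimates the expected value at a location from nearby observations by inverse-distance weighting and flags the contribution if it deviates strongly. This is one possible reading of the approach, not a method proposed by Goodchild and Li (2012); the function names, the planar coordinates, and the threshold are assumptions.

```python
import math


def idw_estimate(target_xy, neighbours, power=2.0):
    """Inverse-distance-weighted estimate at target_xy.

    neighbours is a list of (x, y, value) tuples in planar coordinates.
    Closer neighbours receive larger weights.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in neighbours:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value  # a neighbour at the same location determines the estimate
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one neighbour is required")
    return weighted_sum / weight_total


def deviates_from_neighbours(target_xy, reported_value, neighbours, max_abs_diff):
    """True if the reported value differs from the local estimate by more than max_abs_diff."""
    return abs(reported_value - idw_estimate(target_xy, neighbours)) > max_abs_diff
```

A flagged contribution is not necessarily wrong, since genuine local anomalies exist; the check only identifies candidates for closer inspection.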
In this article, we present an extensive review of the existing methods in the state-of-
the-art to assess the quality of map-, image-, and text-based VGI. As an outcome of the
review, we identify data mining as one more stand-alone approach to assess VGI quality
by utilizing computational processes for discovering patterns and learning purely from
data, irrespective of the laws and knowledge from geography, and independent from
social or crowd-sourced approaches. Extending the spectrum of approaches will foster the
development of more quality assessment methods in the future, especially for VGI types
that have not been extensively researched so far. To the best of our knowledge, no survey
of existing methods has been done so far. This review provides an overview of methods that
have been built based on theories and discussions in the literature. Furthermore, this
survey gives the reader a glimpse of the practical applicability of all identified
approaches. The remainder of this article unfolds as follows. In Section 2, we describe
the different quality measures and indicators for VGI. In Section 3, we describe the main
types of VGI that we consider for our survey, and in Section 4, we describe the
methodology that was followed for the selection of literature for this survey. Section 5
summarizes the findings of the survey, and Section 6 discusses the limitations and future
research perspectives. Finally, we conclude our findings in Section 7.
2. Measures and indicators for VGI quality
Quality of VGI can be described by quality measures and quality indicators (Antoniou and
Skopeliti 2015). Quality measures, mainly adhering to the ISO principles and guidelines,
refer to those elements that can be used to ascertain the discrepancy between the
contributed spatial data and the ground truth (e.g., completeness of data), mainly by
comparison with authoritative data. When authoritative data is not usable for
comparisons, and the established measures are no longer adequate to assess the
quality of VGI, researchers have explored more intrinsic ways to assess VGI quality by
looking into other proxies for quality measures. These are called quality indicators; they
rely on factors such as participation biases, contributor expertise or the lack of it, and
background, which influence the quality of VGI but cannot be directly measured (Antoniou and
Skopeliti 2015). In the following, these quality measures and indicators are described in
detail. The review of quality assessment methods in Section 5 is based on these various
quality measures and indicators.
2.1. Quality measures for VGI
The International Organization for Standardization (ISO) defined geographic information
quality as the totality of characteristics of a product that bear on its ability to
satisfy stated and implied needs. ISO/TC 211 (Technical Committee) developed a set of international
standards that define the measures of geographic information quality (standard 19138,
as part of the metadata standard 19115). These quantitative quality measures are:
completeness, consistency, positional accuracy, temporal accuracy, and thematic
accuracy.
Completeness describes the relationship between the represented objects and their
conceptualizations. This can be measured as the absence of data (errors of omission) and
presence of excess data (errors of commission). Consistency is the coherence in the data
structures of the digitized spatial data. The errors resulting from the lack of it are
indicated by (i) conceptual consistency, (ii) domain consistency, (iii) format consistency,
and (iv) topological consistency. Accuracy refers to the degree of closeness between a
measurement of a quantity and the accepted true value of that quantity, and it takes the
form of positional accuracy, temporal accuracy, and thematic accuracy. Positional
accuracy is indicated by (i) absolute or external accuracy, (ii) relative or internal
accuracy, and (iii) gridded data position accuracy. Thematic accuracy is indicated by
(i) classification correctness, (ii) non-quantitative attribute correctness, and (iii)
quantitative attribute accuracy. In both cases, the discrepancies can be numerically
estimated. Temporal accuracy is
indicated by (i) accuracy of a time measurement: correctness of the temporal references
of an item, (ii) temporal consistency: correctness of ordered events or sequences, (iii)
temporal validity: validity of data with regard to time.
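The measures above can be computed once contributed features have been matched to a reference dataset. The following minimal sketch estimates completeness as rates of omission and commission, and absolute positional accuracy as a root-mean-square error (RMSE). It assumes that feature matching has already been done and that coordinates are in a projected system with metre units; the function names and input layout are illustrative and are not prescribed by the ISO standards.

```python
import math


def completeness_errors(vgi_ids, reference_ids):
    """Return (omission_rate, commission_rate).

    Omission: share of reference features missing from the VGI dataset.
    Commission: share of VGI features with no counterpart in the reference.
    """
    vgi, ref = set(vgi_ids), set(reference_ids)
    if not vgi or not ref:
        raise ValueError("both datasets must contain at least one feature")
    omission = len(ref - vgi) / len(ref)
    commission = len(vgi - ref) / len(vgi)
    return omission, commission


def positional_rmse(matched_pairs):
    """RMSE of the planar offsets between matched points.

    matched_pairs is a list of ((x, y), (x_ref, y_ref)) tuples.
    """
    if not matched_pairs:
        raise ValueError("at least one matched pair is required")
    squared = [(x - xr) ** 2 + (y - yr) ** 2
               for (x, y), (xr, yr) in matched_pairs]
    return math.sqrt(sum(squared) / len(squared))
```

For example, a VGI dataset containing features a, b, c, and x compared with a reference containing a, b, c, d, and e yields an omission rate of 0.4 and a commission rate of 0.25.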
2.2. Quality indicators for VGI
As part of the ISO standards, geographic information quality can be further assessed
through qualitative quality indicators, such as the purpose, usage, and lineage. These
indicators are mainly used to express the quality overview for the data. Purpose
describes the intended usage of the dataset. Usage describes the application(s) in
which the dataset has been utilized. Lineage describes the history of a dataset from
collection, acquisition to compilation and derivation to its form at the time of use (Hoyle
2001, Guinée 2002, Van Oort and Bregt 2005). In addition, where ISO standardized
measures and indicators are not applicable, we have found in the literature more
abstract quality indicators to imply the quality of VGI. These are: trustworthiness,
credibility, text content quality, vagueness, local knowledge, experience, recognition,
and reputation. Trustworthiness is a receiver judgment based on subjective characteristics, such as
reliability or trust (good ratings on the creations, and the higher frequency of usage of
these creations indicate this trustworthiness) (Flanagin and Metzger 2008). In assessing
the credibility of VGI, the source of information plays a crucial role, as it is what
credibility is primarily based upon. However, this is not straightforward. Due to the
non-authoritative nature of VGI, the source may be unavailable, concealed, or missing
(this is avoided by gatekeepers in authoritative data). Credibility was defined by Hovland
et al. (1953) as the believability of a source or message, which comprises primarily two
dimensions: trustworthiness (as explained earlier) and expertise. Expertise contains
objective characteristics such as accuracy, authority, competence, or source credentials
(Flanagin and Metzger 2008). Therefore, in assessing the credibility of data as a quality
indicator, one needs to consider factors that contribute to trustworthiness and
expertise. Metadata about the origin of VGI can provide a foundation for the source
credentials of VGI (Frew 2007). Text content quality (mostly applicable for text-based VGI)
describes the quality of text data by the use of text features, such as the text length,
structure, style, readability, revision history, topical similarity, the use of technical
terminology, etc. Vagueness is the ambiguity with which the data is captured (e.g.,
vagueness caused by low resolutions) (De Longueville et al. 2010). Local knowledge is the
contributor’s familiarity with the geographic surroundings that she/he is implicitly or
explicitly mapping. Experience is the involvement of a contributor with the VGI platform
that she/he contributes to. This can be expressed by the time that the contributor has
been registered with the VGI portal, the number of global positioning system (GPS) tracks
contributed (e.g., in OSM), the number of features added and edited, or the amount of
participation in online forums to discuss the data (Van Exel et al. 2010). Recognition is
the acknowledgement given to a contributor based on tokens achieved (e.g., in gamified
VGI platforms), and the reviewing of their contributions among their peers (Van Exel
et al. 2010). Maué (2007) described reputation as a tool to ensure the validity of VGI.
Reputation is assessed by, for example, the history of past interactions between
collaborators. Resnick et al. (2000) described contributors’ abilities and dispositions
as features on which this reputation can be based. Maué (2007) further argues that,
similar to the eBay rating system, the created geographic features on various
VGI platforms can be rated, tagged, discussed, and annotated, which affects the data
contributor’s reputation value.
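Some of the text features named above for text content quality can be derived with very little code. The sketch below computes text length, word and sentence counts, mean sentence length, and the share of words drawn from a vocabulary of technical terms. The tokenization rules, the feature names, and the vocabulary are assumptions for illustration; the reviewed works may define these features differently.

```python
import re


def text_features(text, technical_terms=frozenset()):
    """Simple surface features of a text contribution.

    technical_terms is a hypothetical set of lower-case domain terms.
    """
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_technical = sum(1 for w in words if w in technical_terms)
    return {
        "char_length": len(text),
        "word_count": n_words,
        "sentence_count": len(sentences),
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "technical_term_ratio": n_technical / n_words if n_words else 0.0,
    }
```

These values are only proxies: a long, well-structured text is not necessarily accurate, so such features would normally be combined with other indicators such as revision history or contributor reputation.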
3. Map, image, and text-based VGI: definitions and quality issues
The effective utilization of VGI is strongly associated with data quality, and this varies
depending primarily on the type of VGI, the way data is collected on the different VGI
platforms, and the context of usage. The following sections describe the selected forms
of VGI: (1) map, (2) image, and (3) text, their uses, and how data quality issues arise.
These three types of VGI are chosen based on the methods that are used to capture the
data (maps: as GPS points and traces, image: as photos, text: as plain text), and because
they are the most popular forms of VGI currently used. This section further lays the
groundwork for understanding the subsequent section on various quality measures and
indicators, and quality assessment methods used for these three types of VGI.
3.1. Map-based VGI
Map-based VGI concerns all VGI sources that include geometries as points, lines, and
polygons, the basic elements to design a map. Among others, OSM, Wikimapia, Google
Map Maker, and Map Insight are examples of map-based VGI projects. However, OSM is
the most prominent project due to the following reasons: (i) it aims to develop a free
map of the world accessible and obtainable for everyone; (ii) it has millions of registered
contributors; (iii) it has active mapper communities in many locations; and (iv) it provides
free and flexible contribution mechanisms for data (useful for map provision, routing,