
Linked Open Government Data: Lessons from Data.gov.uk

Abstract
A project to extract value from open government data contributes to the population of the linked data Web with high-value data of good provenance.


IEEE Intelligent Systems, May/June 2012. Published by the IEEE Computer Society. 1541-1672/12/$31.00 © 2012 IEEE
LINKED OPEN GOVERNMENT DATA
Nigel Shadbolt, Kieron O’Hara, Tim Berners-Lee, Nicholas Gibbins, Hugh Glaser,
Wendy Hall and m.c. schraefel, University of Southampton
Services require data. In a top-down political culture where the state is the service provider of first resort, the state becomes a powerful data monopoly, able to structure and homogenize the interactions between itself and its citizens. Such one-sided interactions are expensive and unresponsive to citizens’ needs. Even when governments have exposed service provision to market disciplines, they haven’t succeeded in presenting data to citizens in innovative ways to create new value streams. Privatized service providers have preserved monopolies of service design and provision.

As technology has increased the power of data by facilitating linking and sharing, and political thinkers have embraced transparency and citizens’ right to data, this top-down culture is being challenged. Many governments now release large quantities of data into the public domain, often free of charge and without administrative overhead. This allows citizen-centered service delivery and design and improves accountability of public services, leading to better public-service outcomes.

In the United Kingdom, transparency is focused on Data.gov.uk, the public data catalogue that points to thousands of datasets downloadable under a permissive open government license. The datasets are often in comma-separated value (CSV) format or spreadsheets, but there is potential for increasing their utility by linking them using structured machine-processable formats.

Resource Description Framework (RDF) is the format most integrated into current thinking about future generations of the Web, as its use of URIs allows data to be identified by reference and linked with other relevant data by subject, predicate, or object. The use of Semantic Web standards in open government data (OGD) was pioneered by Advanced Knowledge Technologies (AKT) in a precursor to the work described here, and was reported to the UK Parliament in 2007.[1]

We refer to this vision as the linked-data Web (LDW). The LDW is already well populated through initiatives such as DBpedia, the DBLP Computer Science Bibliography, the London Gazette, the New York Times, and the Comprehensive Knowledge Archive Network (CKAN). The formalisms and infrastructure are appearing according to linked data principles set out by Tim Berners-Lee some time ago (www.w3.org/designissues/linkeddata.html), but vital research issues still need to be addressed.
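RDF's linking by URI, described earlier in this section, can be illustrated with a minimal, library-free Python sketch. Triples are plain (subject, predicate, object) tuples. Every ex: name is a hypothetical placeholder rather than a real Data.gov.uk URI. A production system would use an RDF library and a triple store instead.

```python
# A triple is a (subject, predicate, object) tuple of strings.
# All ex: names are hypothetical placeholders, not real Data.gov.uk URIs.

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# One URI for the constituency, reused by two independently published datasets.
FAREHAM = "ex:constituency/fareham"

parliament_data = [
    (FAREHAM, "ex:representedBy", "ex:person/mp-1"),
]
crime_data = [
    ("ex:stats/obs-1", "ex:area", FAREHAM),
    ("ex:stats/obs-1", "ex:category", "burglary"),
]

# Merging RDF graphs is a set union; the shared URI does the linking.
merged = set(parliament_data) | set(crime_data)

# Everything either dataset says about Fareham, as subject or as object.
about = match(merged, s=FAREHAM) + match(merged, o=FAREHAM)
```

Because both datasets use the same URI, no join key has to be negotiated. The match on subject finds the parliamentary fact, and the match on object finds the crime observation.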
First, we need to understand how to build or reuse ontologies easily and appropriately for particular applications. Second, we want query methods that scale across the unbounded Web, not just within small islands of well-structured data. Third, we need visualization and browsing tools, and fourth, we need to populate the LDW to increase the network effects of large-scale linking. These objectives drive the fundamental research of the EnAKTing project (www.enakting.org), funded by the UK’s Engineering and Physical Sciences Research Council (EPSRC).
OGD will make an important contribution to the LDW. Its quantity will help deliver the network effects expected from the LDW, its provenance is clearer than that of many other types of data, and it is often seen as high quality, trustworthy, and neutral.
Representing OGD in RDF and linking to other datasets presents important research challenges, including
• discovering appropriate datasets for applications,
• integrating OGD into the LDW,
• understanding the best join points for diverse datasets (that is, the points of reference the databases share, which are extremely valuable for linking), and
• building client applications to consume the data, including interfacing with real-world users.
In this article, we use the EnAKTing approach to develop an integrated account of how to bring OGD into the LDW. EnAKTing’s focus is the LDW as a whole, but here we focus on the population of the LDW with OGD from Data.gov.uk, looking in turn at these four issues.
Discovering and Migrating Data
The adoption of OGD for use in the LDW will depend on its availability, and a necessary first step in expanding the LDW with OGD is the data discovery process. A number of services support the location of public sector information (PSI), including Data.gov in the US and Data.gov.uk in the UK. Research and development of tools is enabling the translation of PSI datasets into RDF and the generation of RDFa (RDF in attributes) catalogs, while the UK government is exposing linked-data endpoints of available PSI for reuse.[2]
However, innovative uses of PSI transcend borders; meteorology or transport applications, for example, need data from more than one nation. The LDW will be an important mechanism for data convergence. Two examples show this. The first is the Open Knowledge Foundation’s CKAN, a registry of open data available for public use with a common cataloging schema built on a few metadata terms. The second is the European statistical service Eurostat, which has amalgamated thousands of datasets with their metadata for download from its website. However, there is no single facility for retrieving related resources from the portals of the various nations, or for searching intelligently across regional, national, and supranational sources.
EnAKTing has proposed an architecture (not yet fully implemented) for integrating PSI catalogs via the activities and components essential for discovery. Architectures of this type allow the presentation of catalogs in a standardized form, facilitating search and retrieval across resources.

The first phase of this architecture involves downloading and transforming catalogs with retrievable records into a common schema language format. The second phase addresses semantic heterogeneity with schema matching and statistical analyses of ontology structures. Once common ontologies are in place, the search engine layer can be developed, allowing distributed querying and federated search and retrieval.
Initial work has tested this architecture, using approximately 7,000 records taken from Data.gov.uk, the US site Data.gov, and the Australian national PSI catalog. Records were converted from native format into RDF, each detailing some 14 to 25 metadata fields, and stored in an RDF triple store.
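The first two phases of the architecture can be sketched in standard-library Python, in a simplified form.

- **Phase 1** is shown as a minimal translation that keeps each catalog's own field names as predicates.
- **Phase 2** is reduced to a toy lexical matcher built on `difflib.SequenceMatcher`.

The catalog prefix, record fields, target term list, and 0.6 threshold are all hypothetical. Real schema matching would also draw on the statistical analyses of ontology structure mentioned above.

```python
from difflib import SequenceMatcher

def to_triples(catalog, record_id, record):
    """Phase 1 (sketch): minimal translation of one native catalog record.

    Field names are kept verbatim as predicates under a per-catalog prefix,
    so the original arrangement of the record is preserved.
    """
    subject = f"{catalog}:record/{record_id}"
    return [
        (subject, f"{catalog}:{field}", str(value))
        for field, value in record.items()
        if value not in (None, "")          # skip empty metadata fields
    ]

# Hypothetical common vocabulary that native fields should map onto.
TARGET_TERMS = ["title", "description", "modified", "publisher", "license"]

def suggest_mapping(field, threshold=0.6):
    """Phase 2 (toy): propose a common term by string similarity alone."""
    scored = [(SequenceMatcher(None, field.lower(), term).ratio(), term)
              for term in TARGET_TERMS]
    score, term = max(scored)
    return term if score >= threshold else None

# Hypothetical record from a hypothetical "uk" catalog.
record = {"Title": "Road casualties", "licence": "OGL", "notes": ""}
triples = to_triples("uk", "42", record)
# [('uk:record/42', 'uk:Title', 'Road casualties'), ('uk:record/42', 'uk:licence', 'OGL')]
```

A lexical matcher maps `licence` to `license` but finds nothing for a field such as `notes`. This is one reason the second phase also needs structural and statistical evidence, not just name similarity.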
The initial translation was intentionally minimal, reflecting the catalogs’ original contents and preserving the underlying arrangement of data. This reveals the need for data normalization. For example, temporal data such as release or modification dates were not always represented with a universal standard. With thousands of ambiguous dates, classifiers need to be applied to the data before evaluation and comparison of the resources referred to in the catalogs are possible.
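The article does not describe the classifiers it uses. The rule-based sketch below is a hedged stand-in for the first step any such classifier needs. It lists every ISO 8601 reading of a catalog date string, so that genuinely ambiguous values can be separated from values that normalize cleanly. For example, 03/04/2010 means 3 April in UK order but 4 March in US order. The list of accepted formats is an assumption, not the catalogs' actual formats.

```python
from datetime import datetime

# Assumed formats; each can only be read one way.
UNAMBIGUOUS_FORMATS = ["%Y-%m-%d", "%d %B %Y", "%B %d, %Y"]
# Slashed numeric dates: UK day-first versus US month-first convention.
SLASHED_FORMATS = ["%d/%m/%Y", "%m/%d/%Y"]

def readings(raw):
    """Return the set of ISO 8601 dates that raw could denote."""
    raw = raw.strip()
    for fmt in UNAMBIGUOUS_FORMATS:
        try:
            return {datetime.strptime(raw, fmt).date().isoformat()}
        except ValueError:
            pass
    found = set()
    for fmt in SLASHED_FORMATS:
        try:
            found.add(datetime.strptime(raw, fmt).date().isoformat())
        except ValueError:
            pass                      # e.g. month 25 is invalid in US order
    return found

def normalize(raw):
    """ISO date if exactly one reading exists, else None (needs a classifier)."""
    options = readings(raw)
    return options.pop() if len(options) == 1 else None

normalize("25/12/2010")   # '2010-12-25': only day-first order is valid
normalize("03/04/2010")   # None: both conventions give a valid date
```

Dates such as 07/07/2010 come out unambiguous because both readings coincide. The values this function returns as None are the ones a trained classifier (or a per-catalog convention) would then have to resolve.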
Integrating OGD into the Web of Linked Data
Once datasets have been discovered, they must still be integrated into the linked data cloud. An application we developed for EnAKTing provides an example of such integration. It brings together six government datasets:
• the work of individual MPs (members of Parliament),
• the work of Parliament as a whole,
• crime statistics,
• mortality statistics,
• health statistics, and
• geographical data from the Ordnance Survey.
This application lets users investigate a particular geographical region.[2] Of these datasets, only the Ordnance Survey material was already in RDF.
Publication and Consumption of the Datasets
Using well-known ontologies such as Dublin Core, Friend of a Friend (FOAF), and the Statistical Core Vocabulary (Scovo) eased the modeling overhead. Scripts were written to convert data from spreadsheets into RDF, and the Jena Semantic Web Framework was used to convert the HTML and XML, making data linkable without determining the semantics.
For instance, data commonly contains terms that make perfect sense to experts in the field but are opaque to the rest of us. The health datasets used in our application included the codes SHA Code and Org Code, which can only be understood by someone au fait with UK National Health Service (NHS) administration. Such problems multiply across datasets, requiring an ontological alignment stage. In our application, this involved the correct identification of owl:sameAs relations across a dimension linking the datasets. Administrative geography provided the link, via MPs’ constituencies, NHS trusts, and so on.
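The article does not reproduce its spreadsheet-to-RDF scripts. The minimal sketch below shows why the result is linkable but not yet meaningful. It assumes:

- a CSV input with one key column, and
- a hypothetical `http://example.org/nhs/` namespace.

The output is N-Triples. Column headers such as SHA Code become predicates verbatim, so their semantics still has to be supplied by the alignment stage.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/nhs/"   # hypothetical namespace, not a real NHS URI

def literal(text):
    """Serialize a plain string as an N-Triples literal with escapes."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def csv_to_ntriples(text, key_column):
    """Mint one subject per row (from key_column), one predicate per column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{quote(row[key_column])}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue              # the key is the subject; skip empty cells
            # The header is used as-is: linkable, but its meaning is undetermined.
            predicate = f"<{BASE}prop/{quote(column.replace(' ', '_'))}>"
            lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines

# Hypothetical sheet using the opaque NHS column names from the text.
sheet = "Org Code,SHA Code,Name\nX01,Q99,Example Trust\n"
for line in csv_to_ntriples(sheet, "Org Code"):
    print(line)
```

Run as shown, this prints two triples for row X01, one with the predicate `prop/SHA_Code` and one with `prop/Name`.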
The alignment can be complex. For instance, to align the health statistics, we needed to do two things. First, we used the Google Maps API to get the coordinates of NHS administrative units. Then we queried the Ordnance Survey data manually, using string matching, for the corresponding Parliamentary constituencies. The time dimension adds further complication to administrative geography. Parliamentary constituencies are regularly redrawn in response to demographic change, and different data sources deal with this in different ways; the Ordnance Survey administrative geography stores only the latest classification.
However, when issues such as changes of semantics do not occur, our techniques allow incremental on-the-fly updating for data consumption.
Many of the applications discussed in this article visualize a single store, using data harvested and processed into RDF by EnAKTing researchers. This data, along with the associated visualization based on the current contents of the store, can be refreshed at any point. For example, See UK (http://apps.seme4.com/see-uk/) imports the UK’s monthly crime data into its store, so the view is always of the latest figures (see Figure 1). Other applications query Data.gov.uk in real time.
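One way to support the refresh-at-any-point behavior is to compute the difference between the stored snapshot of a graph and a newly harvested one. Only the changed triples then need to be written or re-rendered. This is not necessarily the approach EnAKTing used. The graph name, triples, and counts below are hypothetical.

```python
def refresh(store, graph, new_triples):
    """Replace a graph's snapshot and report what changed.

    store maps graph names to sets of (s, p, o) triples.
    Returns (added, removed) so downstream views can update incrementally.
    """
    new = set(new_triples)
    old = store.get(graph, set())
    added, removed = new - old, old - new
    store[graph] = new
    return added, removed

store = {}
# First (hypothetical) monthly harvest.
refresh(store, "crime", {("ex:ward-a", "ex:burglaries", "12")})
# Next month's harvest: one figure changes, one ward is new.
added, removed = refresh(store, "crime", {
    ("ex:ward-a", "ex:burglaries", "15"),
    ("ex:ward-b", "ex:burglaries", "3"),
})
```

In a real triple store, the two sets could be sent as SPARQL 1.1 Update `DELETE DATA` and `INSERT DATA` operations instead of reloading the whole graph.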
The Value of Place for Linking Datasets
Geography provides an intuitive way to align datasets. This is no surprise: governments generate PSI about the territory over which they have jurisdiction, so the data has an implicitly geographical dimension. The LDW is well stocked with geographical data; the Geonames service manages eight million URIs for geographical resources. Therefore, where there is an authoritative geographical knowledge base available, as in the UK, geography is an irresistible join point for datasets.[3] (See the sidebar, “Related Work on the Linked Data Web.”)
In our application for EnAKTing, the region gives context for the displayed data and is the central point from which we link to the LDW. New views of the data or concepts generate new searches and presentations on the basis of aggregations that make sense in the new contexts. For instance, having moved up the geographical hierarchy from a constituency to a county, the application can present the statistics (such as crimes committed) relevant to the county as a whole.
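Aggregating statistics up the geographical hierarchy can be sketched as a roll-up over a part-of mapping. The sketch assumes that each area has at most one parent and that the hierarchy has no cycles. That is exactly the assumption Parliamentary constituencies break, as the next paragraph explains. Area names and counts are hypothetical.

```python
from collections import Counter

def roll_up(counts, part_of):
    """Add each area's count to the area and to every ancestor.

    counts:  area -> count at the lowest level (e.g. wards)
    part_of: area -> its single parent area (assumed tree, no cycles)
    """
    totals = Counter()
    for area, n in counts.items():
        node = area
        while node is not None:
            totals[node] += n
            node = part_of.get(node)   # None once we pass the top of the tree
    return totals

# Hypothetical hierarchy: two wards in one district in one county.
part_of = {"ward-a": "district-x", "ward-b": "district-x", "district-x": "county-y"}
totals = roll_up({"ward-a": 10, "ward-b": 5}, part_of)
```

Here `totals["county-y"]` is 15, the same as the district total. Once an area can belong to more than one parent, this simple roll-up would double-count, which is why explicit containment reasoning is needed.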
This approach will not work with some types of territory, such as Parliamentary constituencies, which don’t map easily onto the administrative geography of the UK. Yet, if we can establish that one entity is completely contained within another (for example, the Parliamentary constituency of Fareham within the county of Hampshire), we can discover relevant data and present it to the user. Ideally, this exploits existing LDW resources or brings more geographical resources onto the LDW (see Figure 2).

Figure 1. See UK, showing relative crime figures for a ward in Southampton. The pie chart shows comparisons between it and neighboring wards normalized by population, and the user can select figures and comparisons for particular classes of crime from the drop-down menu.
To help with this kind of reasoning, EnAKTing has developed a service (http://geoservice.psi.enakting.org) to support the discovery of geographical resources pertaining to the UK on the LDW by querying containment relations.[3] This service exploits knowledge about instance equivalence that is already available via coreference systems such as SameAs (http://sameas.org). It normalizes the data by translating the os:contains relation into two statements, a has-part and a part-of. This produces a structure such as that in Figure 2, allowing the service to infer containment via resources from different datasets using owl:sameAs.
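The normalize-then-infer procedure can be approximated in plain Python. This sketch is not the geoservice's implementation. It works in three steps:

1. It splits each os:contains statement into has-part and part-of.
2. It merges owl:sameAs resources into equivalence classes with a union-find structure.
3. It follows part-of edges transitively.

Apart from dbpedia:Hampshire and crime:Hampshire, which appear in Figure 2, the identifiers are hypothetical.

```python
from collections import defaultdict

def normalize(triples):
    """Split os:contains into has-part / part-of; keep other triples as-is."""
    out = set()
    for s, p, o in triples:
        if p == "os:contains":
            out.add((s, "has-part", o))
            out.add((o, "part-of", s))
        else:
            out.add((s, p, o))
    return out

def find(parent, x):
    """Union-find lookup with path halving; registers unseen nodes."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def contained_in(triples, inner, outer):
    """True if inner is (transitively) part of outer, modulo owl:sameAs."""
    facts = normalize(triples)
    parent = {}
    # 1. Merge owl:sameAs resources into equivalence classes.
    for s, p, o in facts:
        if p == "owl:sameAs":
            a, b = find(parent, s), find(parent, o)
            if a != b:
                parent[a] = b
    # 2. Lift part-of edges onto the equivalence classes.
    up = defaultdict(set)
    for s, p, o in facts:
        if p == "part-of":
            up[find(parent, s)].add(find(parent, o))
    # 3. Depth-first search along part-of edges from inner towards outer.
    target = find(parent, outer)
    stack, seen = [find(parent, inner)], set()
    while stack:
        node = stack.pop()
        for nxt in up[node]:
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

TRIPLES = [
    ("os:county-hampshire", "os:contains", "os:cons-fareham"),
    ("os:county-hampshire", "os:contains", "os:cons-winchester"),
    ("parliament:cons-fareham", "owl:sameAs", "os:cons-fareham"),
    ("os:county-hampshire", "owl:sameAs", "dbpedia:Hampshire"),
    ("crime:Hampshire", "owl:sameAs", "dbpedia:Hampshire"),
]
```

The containment asserted only in the Ordnance Survey data becomes visible to resources from other datasets. With these triples, `contained_in(TRIPLES, "parliament:cons-fareham", "crime:Hampshire")` holds even though no single dataset states it.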
Sidebar: Related Work on the Linked Data Web

Open government data (OGD) is becoming increasingly important across the globe, although currently most initiatives involve making data in proprietary formats downloadable. Surveys have shown that there are relatively few attempts to combine OGD with the linked-data Web (LDW),[1] and that Data.gov.uk and Data.gov are unusual in their commitment to the LDW vision. Many other important and interesting developments have been more opportunistic, such as the creative use of OpenStreetMap data in the aftermath of the Haiti earthquake in 2010. Meanwhile, initiatives such as the Open Government Partnership (www.opengovpartnership.org) have begun to spread best practices even further.

The work closest to our project aims to migrate Data.gov to the LDW. The Tetherless World Constellation (TWC) Linked OGD (LOGD) portal[2] also recognizes data-publishing stages for OGD on the LDW:
• the catalog stage, where an inventory of datasets is created;
• the retrieval stage, where a snapshot of the dataset’s online data file at a given point in time is input to a Linked OGD converter; and
• the conversion stage, where the data is converted to RDF in a layered manner that allows many of the conversion issues and bottlenecks to be sidestepped.

An initial automatic conversion is done by the portal, and users can add enhancements such as mapping ad hoc database column names to common properties. Many of its linking strategies were anticipated by the Data-gov wiki.[3] The strategy of the LOGD portal has been to foster an LOGD community by actively engaging users with demos.

The Data-gov wiki is a social Semantic Web platform that has produced more than 5 billion triples, covering topics such as government spending, environmental records, and statistics on the cost and usage of public services. It goes through a series of steps similar to those just outlined:
• conversion of data into RDF;
• enhancement and linking by declaratively associating URIs in related contexts, done both automatically and by hand; and
• designing applications and demos to address the important issue of data consumption.
The Data-gov wiki limited its efforts to well-formed comma-separated value (CSV) files, and so was able to sidestep several conversion issues. It also took a lightweight approach, with a minimal and extensible conversion that preserved the structure and content of the raw data and no more. The TWC team did not use properties from existing ontologies, to avoid manual moderation. However, the properties used in converted RDF data were dereferenceable (that is, accessible from their URIs), resolving to terms in well-known ontologies (such as Friend of a Friend and Dublin Core) or to RDF and XML pages generated by Semantic MediaWiki. The Data-gov wiki also focused on provenance, and was able to use it as a join point, linking by derivation- and version-based provenance associations.

Evangelos Kalampokis and his colleagues have also exploited the social Web, using OGD to enrich data mined from social networking and microblogging sites. For example, they linked tweets from high-crime areas in the UK to the crime data from http://police.uk for those regions.[4] The aim of this work is to allow policy makers to assess public opinion and predict public reaction. The team’s linked data architecture integrates OGD with data mined from the social Web, to enable the collection of OGD related to a specific set of criteria that the decision-maker provides. Its integrator component combines the social data with objective data related to the specified target group, and stores the result as RDF. That result also covers the variables relating social data to real-world objective facts coming from government data. The improvement of the OGD comes via augmentation from the social Web, rather than from the integration processes used by EnAKTing and TWC LOGD. Crowdsourcing (obtaining data from a distributed group of citizens) is clearly an important way forward.

References
1. E. Kalampokis, E. Tambouris, and K. Tarabanis, “A Classification Scheme for Open Government Data: Towards Linking Decentralized Data,” Int’l J. Web Engineering and Technology, vol. 6, no. 3, Inderscience, 2011, pp. 266–285.
2. L. Ding et al., “TWC LOGD: A Portal for Linked Open Government Data Ecosystems,” J. Web Semantics, vol. 9, no. 3, Elsevier, 2011, pp. 325–333.
3. L. Ding et al., “Data-gov Wiki: Towards Linking Government Data,” Proc. AAAI Spring Symp. Linked Data Meets Artificial Intelligence, AAAI, 2010, pp. 38–43.
4. E. Kalampokis, M. Hausenblas, and K. Tarabanis, “Combining Social and Government Open Data for Participatory Decision-Making,” Proc. 3rd Int’l Conf. eParticipation (ePart ’11), LNCS 6847, Springer, 2011, pp. 36–47.

Reasoning Services

Geolinking services are only one kind of reasoning needed to enrich linked data. As another example, EnAKTing has developed a backlinking service (http://backlinks.psi.enakting.org/).[4] It is a generic architecture component that supports the discovery of useful links between items across highly connected datasets (directed graphs), links that direct URI resolution cannot find.

The service discovers foreign URIs: URIs X that appear in RDF triples of the form <s, p, X> in an RDF graph G, where domain(X) ≠ domain(G). A foreign URI pattern discovery component crawls the LDW and retrieves all foreign URIs found in the datasets under consideration. It then asserts new statements into a backlinking knowledge base. Each statement is an rdfs:seeAlso triple with the foreign URI in the subject position.
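The foreign-URI test and the rdfs:seeAlso assertion can be sketched with the Python standard library, taking the host name of a URI as its domain. Two points are assumptions:

- The object of each asserted rdfs:seeAlso triple is taken to be the referring subject s.
- The crime-dataset URI path is hypothetical.

The sketch shows the backlinking idea, not the EnAKTing service's code.

```python
from urllib.parse import urlparse

SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def domain(uri):
    """Host name of a URI, used here as its 'domain'."""
    return urlparse(uri).netloc.lower()

def is_uri(term):
    return term.startswith(("http://", "https://"))

def foreign_uris(graph_uri, triples):
    """Yield (s, X) for each <s, p, X> in G where domain(X) != domain(G)."""
    g = domain(graph_uri)
    for s, _p, x in triples:
        if is_uri(x) and domain(x) != g:   # literals are never foreign URIs
            yield s, x

def build_backlinks(graphs):
    """graphs: graph URI -> list of triples. Returns the backlink knowledge base."""
    kb = set()
    for graph_uri, triples in graphs.items():
        for s, x in foreign_uris(graph_uri, triples):
            kb.add((x, SEE_ALSO, s))       # foreign URI in the subject position
    return kb

def backlinks(kb, uri):
    """Resources that point at uri from other domains."""
    return sorted(o for s, p, o in kb if s == uri and p == SEE_ALSO)

DBP = "http://dbpedia.org/resource/Hampshire"
CRIME = "http://crime.psi.enakting.org/id/Hampshire"   # hypothetical path
graphs = {
    "http://crime.psi.enakting.org/": [(CRIME, SAME_AS, DBP), (CRIME, LABEL, "Hampshire")],
    "http://dbpedia.org/": [(DBP, LABEL, "Hampshire")],
}
kb = build_backlinks(graphs)
```

Dereferencing the DBpedia URI alone would never reveal the crime-dataset resource, because DBpedia does not link to it. The backlink knowledge base makes `backlinks(kb, DBP)` return `[CRIME]`.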
When backlinking is integrated with geolinking, the number of URIs discovered increases by orders of magnitude. On its own, backlinking discovers only a handful of URIs linking to dbpedia:Hampshire or its owl:sameAs equivalents. With the geoservice, it retrieves thousands of resources representing such entities as schools in the area, CO₂ emissions, and census details. It also provides hundreds of millions of extra links between datasets such as DBpedia, Geonames, and OpenlyLocal, as well as the specific PSI datasets on which we tested it.
The backlinking service also exploits SameAs, a coreference evaluation service developed within the Resilience for Survivability in Information Society Technologies (Resist) project. SameAs finds URIs that identify identical things within the scope of an application and then stores and publishes them. Note the context-relativity of such judgments; in contrast, the global scope of owl:sameAs implies a globally valid identity. Instead,
Figure 2. Inferring geographical containment with the EnAKTing Geoservice. The service can use has-part and part-of relations and owl:sameAs to infer that parliamentary constituencies Winchester and Fareham are in Hampshire, thereby giving vital context for the linking of datasets.
[Diagram: resources for Hampshire county, Winchester, and Fareham from http://dbpedia.org, http://data.ordnancesurvey.co.uk, http://parliament.psi.enakting.org, and http://crime.psi.enakting.org (for example dbpedia:Hampshire, crime:Hampshire, parliament:cons-228, parliament:cons-637, os:7000000000017765), connected by owl:sameAs, part-of, and has-part edges.]
