
Showing papers by "Nigel Shadbolt" published in 2014


Book
01 Oct 2014
TL;DR: Exposing the invasion of our privacy from CCTVs to blogs, The Spy in the Coffee Machine explores what, if anything, we can do to prevent privacy from disappearing forever in the digital age, and gives readers a much-needed wake-up call about the benefits and dangers of this new technology.
Abstract: We are entering a new state of global hypersurveillance. As we increasingly resort to technology for our work and play, our electronic activity leaves behind digital footprints that can be used to track our movements. In our cars, telephones, even our coffee machines, tiny computers communicating wirelessly via the Internet can serve as miniature witnesses, forming powerful networks whose emergent behaviour can be very complex, intelligent, and invasive. The question is: how much of an infringement on privacy are they? Exposing the invasion of our privacy from CCTVs to blogs, The Spy in the Coffee Machine explores what—if anything—we can do to prevent it from disappearing forever in the digital age, and provides readers with a much needed wake-up call to the benefits and dangers of this new technology.

60 citations


Book ChapterDOI
01 Oct 2014
TL;DR: Within the context of the World Wide Web, a rich range of technologies supporting both collaboration and distributed processing has emerged; applications such as Wikipedia, the world’s largest online encyclopedia, demonstrate the Web’s power to pool geographically dispersed knowledge assets.
Abstract: Within the context of the World Wide Web, we have witnessed the emergence of a rich range of technologies that support both collaboration and distributed processing. Applications such as Wikipedia, for instance, have demonstrated the power and potential of the Web to facilitate the pooling of geographically dispersed knowledge assets. The result has been the creation of the world’s largest online encyclopedia, available for free in more than 200 languages for everyone to access and use.

50 citations


Proceedings Article
16 May 2014
TL;DR: A quantitative analysis of ten citizen science projects hosted on the Zooniverse platform, using a data set of over 50 million activity records and more than 250,000 users, collected between December 2010 and July 2013, identifies project characteristics, most importantly the subject domain and the duration of a project.
Abstract: We conducted a quantitative analysis of ten citizen science projects hosted on the Zooniverse platform, using a data set of over 50 million activity records and more than 250,000 users, collected between December 2010 and July 2013. We examined the level of participation of users in Zooniverse discussion forums in relation to their contributions toward the completion of scientific (micro-)tasks. As Zooniverse is home to a multitude of projects, we were also interested in the emergence of cross-project effects, and identified those project characteristics, most importantly the subject domain and the duration of a project. We also looked into the adoption of expert terminology, showing that this phenomenon is dependent on the scientific domain which a project addresses but also affected by how the communication features are actually used by a community. This is the first study of this kind in this increasingly important class of online community, and its insights will inform the design and further development of the Zooniverse platform, and of citizen science systems as a whole.

48 citations


Journal Article
TL;DR: This paper examines participation in online discussion on a shared citizen science platform, the Zooniverse; specifically, whether participation in online discussion influences task completion within and across 10 distinct projects of the platform.
Abstract: Ramine Tinati, Elena Simperl, Markus Luczak-Roesch, Max Van Kleek, Nigel Shadbolt, University of Southampton. 1. INTRODUCTION. Online citizen science can be seen as a form of collective intelligence Lévy [1997] and Woolley et al. [2010] in which the wisdom of the crowd is applied to the Web to advance scientific knowledge [Prestopnik and Crowston 2012]. Thus far, online citizen science projects [Bonney et al. 2009] have applied millions of volunteers to solving problems in a wide array of scientific domains, ranging from the classification of galaxies [Fortson et al. 2011] to the completion of protein folding networks [Khatib et al. 2011]. Central to many of these projects are online messaging or discussion facilities designed to allow volunteers to ask one another questions and advice. Such facilities have in many cases yielded substantial, dedicated self-sustaining online communities. In this paper, we examine participation in such communities; specifically, whether participation in online discussion influences task completion within and across 10 distinct projects of a shared citizen science platform, the Zooniverse.

17 citations


Proceedings ArticleDOI
07 Apr 2014
TL;DR: This paper proposes that a possible path towards surmounting the inevitable obstacle of personal privacy towards such a goal, is to keep data with individuals, under their own control, while enabling them to participate in Web Observatory-style analyses in situ.
Abstract: Web Observatories aim to develop techniques and methods to allow researchers to interrogate and answer questions about society through the multitudes of digital traces people now create. In this paper, we propose that a possible path towards surmounting the inevitable obstacle of personal privacy towards such a goal, is to keep data with individuals, under their own control, while enabling them to participate in Web Observatory-style analyses in situ. We discuss the kinds of applications such a global, distributed, linked network of Personal Web Observatories might have, a few of the many challenges that must be resolved towards realising such an architecture in practice, and finally, our work towards a fundamental reference building block of such a network.

12 citations



Proceedings ArticleDOI
26 Apr 2014
TL;DR: A direct-manipulation interface permitting the consolidated annotation and revision of activity data from multiple devices is designed and a pilot study of this interface found that users understood readily how to use the features offered, and valued the ability to edit, yet preserve the provenance of their data.
Abstract: The many and varied personal activity trackers on the market have the potential to provide unprecedented detail and insight on our everyday activities. However, effective use and interpretation of data from them can be challenging due to common issues. Such issues include false readings due to sensing approaches taken, or missing data arising from a number of different causes. In order to understand user perceptions on this topic, we performed a preliminary survey, which found that users desired the ability to annotate, retroactively repair, and compare their data. Based on insights from this survey, we designed a direct-manipulation interface permitting the consolidated annotation and revision of activity data from multiple devices. A pilot study of this interface found that users understood readily how to use the features offered, and valued the ability to edit, yet preserve the provenance of their data.

8 citations


Journal ArticleDOI
TL;DR: A technical review of semantic search methods used to support text‐based search over formal Semantic Web knowledge bases and reflective examples from the literature are presented, which should appeal to readers interested in a deeper perspective on the various methods and systems implemented in the outlined literature.
Abstract: This article provides a technical review of semantic search methods used to support text-based search over formal Semantic Web knowledge bases. Our focus is on ranking methods and auxiliary processes explored by existing semantic search systems, outlined within broad areas of classification. We present reflective examples from the literature in some detail, which should appeal to readers interested in a deeper perspective on the various methods and systems implemented in the outlined literature. The presentation covers graph exploration and propagation methods, adaptations of classic probabilistic retrieval models, and query-independent link analysis via flexible extensions to the PageRank algorithm. Future research directions are discussed, including development of more cohesive retrieval models to unlock further potentials and uses, data indexing schemes, integration with user interfaces, and building community consensus for more systematic evaluation and gradual development.

8 citations


Book ChapterDOI
15 Jul 2014
TL;DR: This paper presents two procedures for the assessment of the Open Government Data reliability, one based on a comparison between open and closed data, and the other based on analysis of open data only.
Abstract: Open Government Data often contain information that, in more or less detail, regards private citizens. For this reason, before publishing them, public authorities manipulate data to remove any sensitive information while trying to preserve their reliability. This paper addresses the lack of tools aimed at measuring the reliability of these data. We present two procedures for the assessment of the Open Government Data reliability, one based on a comparison between open and closed data, and the other based on analysis of open data only. We evaluate the procedures over data from the data.police.uk website and from the Hampshire Police Constabulary in the United Kingdom. The procedures effectively allow estimating the reliability of open data and, actually, their reliability is high even though they are aggregated and smoothed.

7 citations


Book ChapterDOI
11 Nov 2014
TL;DR: The results show that popularity works as a proxy for generality in at most 77% of cases, but that this can be improved to 81% using the improved approach proposed, which will translate to higher quality tag hierarchy structures.
Abstract: Building taxonomies for Web content manually is costly and time-consuming. An alternative is to allow users to create folksonomies: collective social classifications. However, folksonomies have inconsistent structures and their use for searching and browsing is limited. Approaches have been proposed for acquiring implicit hierarchical structures from folksonomies, but these approaches suffer from the “generality-popularity” problem, in that they assume that popularity is a proxy for generality (that high level taxonomic terms will occur more often than low level ones). In this paper we test this assumption, and propose an improved approach (based on the Heymann-Benz algorithm) for tackling this problem by direction checking relations against a corpus of text. Our results show that popularity works as a proxy for generality in at most 77% of cases, but that this can be improved to 81% using our approach. This improvement will translate to higher quality tag hierarchy structures.

7 citations
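
To make the "generality-popularity" assumption in the entry above concrete, here is a minimal Python sketch of one way it could be checked. It is not the Heymann-Benz algorithm and not the authors' evaluation: the function name `popularity_agrees`, the toy tag data, and the list of known (general, specific) pairs are all hypothetical. The sketch only counts how often the more general tag of a pair is also the more frequently used one.

```python
from collections import Counter


def popularity_agrees(tag_assignments, known_pairs):
    """Fraction of (general, specific) pairs in which the general tag
    is strictly more popular than the specific tag.

    tag_assignments: iterable of tag collections, one per tagged resource.
    known_pairs: list of (general_tag, specific_tag) tuples taken from
                 some reference taxonomy (an assumption of this sketch).
    """
    if not known_pairs:
        raise ValueError("known_pairs must not be empty")
    # Popularity = number of resources a tag has been applied to.
    counts = Counter(tag for tags in tag_assignments for tag in set(tags))
    agree = sum(
        1 for general, specific in known_pairs
        if counts[general] > counts[specific]
    )
    return agree / len(known_pairs)


# Hypothetical toy folksonomy: each list is the tag set of one resource.
posts = [
    ["animal", "dog"],
    ["animal", "cat"],
    ["animal", "dog", "poodle"],
    ["poodle"],
    ["poodle"],
    ["animal"],
]
# Hypothetical reference relations (general, specific).
pairs = [("animal", "dog"), ("animal", "cat"), ("dog", "poodle")]

share = popularity_agrees(posts, pairs)
print(f"popularity matches generality in {share:.0%} of pairs")
```

In this toy data the pair ("dog", "poodle") breaks the assumption because the narrower tag is used more often, which is the kind of case the abstract describes correcting by checking the direction of a relation against a text corpus.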


Proceedings ArticleDOI
04 Sep 2014
TL;DR: This paper proposes methods to generate a sample of representative microposts by discovering tweets that are likely to refer to new entities; the approach significantly speeds up the semantic analysis process by discarding retweets, tweets without pre-identifiable entities, and similar or redundant tweets, while retaining information content.
Abstract: In this paper, we address the problem of finding Named Entities in very large micropost datasets. We propose methods to generate a sample of representative microposts by discovering tweets that are likely to refer to new entities. Our approach is able to significantly speed up the semantic analysis process by discarding retweets, tweets without pre-identifiable entities, as well as similar and redundant tweets, while retaining information content. We apply the approach on a corpus of 1.4 billion microposts, using the IE services of AlchemyAPI, Calais, and Zemanta to identify more than 700,000 unique entities. For the evaluation we compare runtime and number of entities extracted based on the full and the downscaled version of a micropost set. We are able to demonstrate that for datasets of more than 10 million tweets we can achieve a reduction in size of more than 80% while maintaining up to 60% coverage on unique entities cumulatively discovered by the three IE tools. We publish the resulting Twitter metadata as Linked Data using SIOC and an extension of the NERD core ontology.
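
The three filtering steps named in the abstract above (drop retweets, drop tweets without a pre-identifiable entity, drop similar or redundant tweets) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the capitalised-token heuristic for a "pre-identifiable entity", the normalisation rules, and the helper names `normalise`, `has_candidate_entity`, and `downsample` are all assumptions made here.

```python
import re


def normalise(text):
    """Lowercase and strip URLs, mentions, and punctuation so that
    near-identical tweets collapse to the same key (assumed rule)."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


def has_candidate_entity(text):
    """Assumed heuristic: a capitalised token that is not the first
    token of the tweet hints at a named entity."""
    tokens = text.split()
    return any(tok[0].isupper() for tok in tokens[1:])


def downsample(tweets):
    """Keep the first occurrence of each non-retweet that carries a
    candidate entity and is not a near-duplicate of a kept tweet."""
    seen = set()
    kept = []
    for tweet in tweets:
        if tweet.startswith("RT @"):          # retweet
            continue
        if not has_candidate_entity(tweet):   # nothing to extract
            continue
        key = normalise(tweet)
        if key in seen:                       # redundant content
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


# Hypothetical sample input.
tweets = [
    "RT @bob: Nigel Shadbolt speaks at WWW",
    "just had lunch, so good",
    "Watching the talk by Nigel Shadbolt http://t.co/a",
    "watching the talk by Nigel Shadbolt!! http://t.co/b",
    "New dataset released on data.police.uk by Hampshire Constabulary",
]
kept = downsample(tweets)
print(len(kept), "of", len(tweets), "tweets kept")
```

Only the reduced sample would then be sent to an entity-extraction service, which is where the runtime saving described in the abstract would come from.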

Book ChapterDOI
01 Jan 2014
TL;DR: In this article, the authors explore a set of obfuscation techniques which may help to redress the balance of power when sharing personal data, and return agency and choice to users of online services.
Abstract: The use of personal data has incredible potential to benefit both society and individuals, through increased understanding of behaviour, communication and support for emerging forms of socialisation and connectedness. However, there are risks associated with disclosing personal information, and present systems show a systematic asymmetry between the subjects of the data and those who control and manage the way that data is propagated and used. This leads to a tension between a desire to engage with online society and enjoy its benefits on one hand, and a distrust of those with whom the data is shared on the other. In this chapter, we explore a set of obfuscation techniques which may help to redress the balance of power when sharing personal data, and return agency and choice to users of online services.

Proceedings ArticleDOI
02 Jun 2014
TL;DR: This paper describes how a system using Linked Data principles was built to bring in data from Web 2.0 sites (LinkedIn, Salesforce), and other external business sites such as OpenCorporates, linking these together with pertinent internal British Telecommunications enterprise data into that enterprise data space.
Abstract: The new world of big data, of the LOD cloud, of the app economy, and of social media means that organisations no longer own, much less control, all the data they need to make the best informed business decisions. In this paper, we describe how we built a system using Linked Data principles to bring in data from Web 2.0 sites (LinkedIn, Salesforce), and other external business sites such as OpenCorporates, linking these together with pertinent internal British Telecommunications enterprise data into that enterprise data space. We describe the challenges faced during the implementation, which include sourcing the datasets, finding the appropriate "join points" from the individual datasets, as well as developing the client application used for data publication. We describe our solutions to these challenges and discuss the design decisions made. We conclude by drawing some general principles from this work.

Proceedings ArticleDOI
01 Dec 2014
TL;DR: This paper aims to draw a research roadmap that will help to identify areas of significant concern, where the affordances of linked data align with the requirements for de-anonymisation and re-identification.
Abstract: The objective of this roadmap is to summarise the state-of-the-art and to identify critical challenges for privacy in Linked Data. Our research particularly focuses on examining how the problem of data deanonymisation fits within the context of Linked Data. This draws attention to the fact that publishing data and linking them with other data (to achieve the Data Web vision) is also a significant threat to privacy. Interconnecting data with RDF from heterogeneous resources provides meaningful and valuable information in machine-understandable forms, but it may also offer fewer barriers for deanonymisation attacks to be achieved successfully, and potentially with full automation. Therefore, it is vital to take both points of view into consideration: leveraging the Linked Data in the Web whilst also ensuring privacy when it is desired. In this paper, we aim to draw a research roadmap that will help to identify areas of significant concern, where the affordances of linked data align with the requirements for de-anonymisation and re-identification.

26 May 2014
TL;DR: A Personal Dataspace Support Platform (PDSP) is described as a set of services to provide a unified view over the user’s data, and to enable new and more complex workflows over it.
Abstract: In this paper we argue that the space of personal data is a dataspace as defined by Franklin et al. We define a personal dataspace, as the space of all personal data belonging to a user, and we describe the logical components of the dataspace. We describe a Personal Dataspace Support Platform (PDSP) as a set of services to provide a unified view over the user’s data, and to enable new and more complex workflows over it. We show the differences from a DSSP to a PDSP, and how the latter can be realized using Web protocols and Linked APIs.

01 Nov 2014
TL;DR: This article proposed an improved approach based on the Heymann-Benz algorithm to tackle this problem by direction checking relations against a corpus of text and found that popularity works as a proxy for generality in at most 77% of cases, but this can be improved to 81% using their approach.
Abstract: Building taxonomies for Web content manually is costly and time-consuming. An alternative is to allow users to create folksonomies: collective social classifications. However, folksonomies have inconsistent structures and their use for searching and browsing is limited. Approaches have been proposed for acquiring implicit hierarchical structures from folksonomies, but these approaches suffer from the “generality-popularity” problem, in that they assume that popularity is a proxy for generality (that high level taxonomic terms will occur more often than low level ones). In this paper we test this assumption, and propose an improved approach (based on the Heymann-Benz algorithm) for tackling this problem by direction checking relations against a corpus of text. Our results show that popularity works as a proxy for generality in at most 77% of cases, but that this can be improved to 81% using our approach. This improvement will translate to higher quality tag hierarchy structures.