
Showing papers by "Nigel Shadbolt" published in 2015


Proceedings ArticleDOI
18 Apr 2015
TL;DR: Presents a case study of the Zooniverse, the largest single platform of citizen-driven data analysis projects to date, by eliciting, through structured reflection, the experiences of core members of its design team; the grounded analysis yielded four sets of themes, focusing on Task Specificity, Community Development, Task Design, and Public Relations and Engagement.
Abstract: Designing an effective and sustainable citizen science (CS) project requires consideration of a great number of factors. This makes the overall process unpredictable, even when a sound, user-centred design approach is followed by an experienced team of UX designers. Moreover, when such systems are deployed, the complexity of the resulting interactions challenges any attempt at generalisation from retrospective analysis. In this paper, we present a case study of the largest single platform of citizen-driven data analysis projects to date, the Zooniverse. By eliciting, through structured reflection, experiences of core members of its design team, our grounded analysis yielded four sets of themes, focusing on Task Specificity, Community Development, Task Design, and Public Relations and Engagement, each supported by two to four specific design claims. For each theme, we propose a set of design claims (DCs), drawing comparisons to the literature on crowdsourcing and online communities to contextualise our findings.

85 citations


Proceedings ArticleDOI
18 May 2015
TL;DR: Defines a predictive model for estimating the most appropriate incentives for individual workers based on their previous contributions, allowing for a personalised game experience, and shows that gamification leads to better accuracy and lower costs than conventional approaches that use only monetary incentives.
Abstract: Crowdsourcing via paid microtasks has been successfully applied in a plethora of domains and tasks. Previous efforts for making such crowdsourcing more effective have considered aspects as diverse as task and workflow design, spam detection, quality control, and pricing models. Our work expands upon such efforts by examining the potential of adding gamification to microtask interfaces as a means of improving both worker engagement and effectiveness. We run a series of experiments in image labeling, one of the most common use cases for microtask crowdsourcing, and analyse worker behavior in terms of number of images completed, quality of annotations compared against a gold standard, and response to financial and game-specific rewards. Each experiment studies these parameters in two settings: one based on a state-of-the-art, non-gamified task on CrowdFlower and another one using an alternative interface incorporating several game elements. Our findings show that gamification leads to better accuracy and lower costs than conventional approaches that use only monetary incentives. In addition, it seems to make paid microtask work more rewarding and engaging, especially when sociality features are introduced. Following these initial insights, we define a predictive model for estimating the most appropriate incentives for individual workers, based on their previous contributions. This allows us to build a personalised game experience, with gains seen on the volume and quality of work completed.

70 citations
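The abstract above mentions a predictive model that estimates the most appropriate incentive for each worker from their previous contributions, without giving implementation detail. Below is a minimal sketch of how such a per-worker model could look, assuming hypothetical features (tasks completed, accuracy against the gold standard, points earned) and an off-the-shelf logistic regression; the paper's actual features, data and model are not reproduced here.

```python
# Hypothetical sketch of a per-worker incentive predictor, loosely following the idea
# in the abstract: predict from past contributions whether a worker is more likely to
# respond to game-style rewards than to extra payment.
# Feature names and training data are illustrative, not taken from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [tasks_completed, accuracy_vs_gold, points_earned_per_task]
X = np.array([
    [120, 0.91, 14.0],
    [ 15, 0.70,  2.5],
    [ 60, 0.85,  9.0],
    [  8, 0.62,  1.0],
])
# Label: 1 = responded better to game rewards, 0 = responded better to payment.
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

def best_incentive(worker_features):
    """Return which incentive to offer next, based on the fitted model."""
    p_game = model.predict_proba([worker_features])[0, 1]
    return "game reward" if p_game > 0.5 else "monetary bonus"

print(best_incentive([45, 0.88, 7.5]))
```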


Book Chapter
22 May 2015
TL;DR: This work states that knowledge elicitation is a sub-process of knowledge acquisition (which deals with the acquisition or capture of knowledge from any source), and knowledge acquisition is, in turn, a sub-process of knowledge engineering (which is a discipline that has evolved to support the whole process of specifying, developing and deploying knowledge-based systems).
Abstract: Knowledge elicitation consists of a set of techniques and methods that attempt to elicit the knowledge of a domain expert, typically through some form of direct interaction with the expert. Knowledge elicitation is a sub-process of knowledge acquisition (which deals with the acquisition or capture of knowledge from any source), and knowledge acquisition is, in turn, a sub-process of knowledge engineering (which is a discipline that has evolved to support the whole process of specifying, developing and deploying knowledge-based systems).

52 citations


Book ChapterDOI
20 Jul 2015
TL;DR: Surveys a large number of state-of-the-art deanonymisation techniques, applied through a variety of methods and to different types of data, and proposes a framework to guide a thorough analysis and classification.
Abstract: The problem of disclosing private anonymous data has become increasingly serious, particularly with the possibility of carrying out deanonymisation attacks on published data. The related work available in the literature is inadequate in terms of the number of techniques analysed, and is limited to certain contexts such as Online Social Networks. We survey a large number of state-of-the-art deanonymisation techniques, applied through a variety of methods and to different types of data. Our aim is to build a comprehensive understanding of the problem. For this survey, we propose a framework to guide a thorough analysis and classification. We are interested in classifying deanonymisation approaches based on the type and source of auxiliary information and on the structure of target datasets. Moreover, potential attacks, threats and some suggested assistive techniques are identified. This can inform research towards a better understanding of the deanonymisation problem and assist in the advancement of privacy protection.

23 citations
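The survey's framework classifies deanonymisation approaches by the type and source of auxiliary information and by the structure of the target dataset. The following is a purely hypothetical sketch of how such classification dimensions could be encoded as a small data structure; the enum values and example attacks are illustrative labels, not the taxonomy defined in the paper.

```python
# Hypothetical encoding of the classification dimensions mentioned in the abstract:
# auxiliary-information source and target-dataset structure. The enum members and
# example entries are illustrative, not the categories defined in the survey itself.
from dataclasses import dataclass
from enum import Enum

class AuxSource(Enum):
    SAME_PLATFORM = "same platform"
    CROSS_PLATFORM = "cross platform"
    PUBLIC_RECORDS = "public records"

class TargetStructure(Enum):
    RELATIONAL = "relational table"
    GRAPH = "social graph"
    TRAJECTORY = "location trajectories"

@dataclass
class DeanonymisationAttack:
    name: str
    aux_source: AuxSource
    target_structure: TargetStructure

attacks = [
    DeanonymisationAttack("seed-and-expand graph matching", AuxSource.CROSS_PLATFORM, TargetStructure.GRAPH),
    DeanonymisationAttack("quasi-identifier linkage", AuxSource.PUBLIC_RECORDS, TargetStructure.RELATIONAL),
]

# Group attacks by target structure to compare which settings are most studied.
by_structure = {}
for a in attacks:
    by_structure.setdefault(a.target_structure, []).append(a.name)
print(by_structure)
```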


Proceedings ArticleDOI
28 Jun 2015
TL;DR: This paper uses a survey to examine ways in which people fabricate, omit or alter the truth online, and concludes that lying may be essential to maintaining a humane online society.
Abstract: Portraying matters as other than they truly are is an important part of everyday human communication. In this paper, we use a survey to examine ways in which people fabricate, omit or alter the truth online. Many reasons are found, including creative expression, hiding sensitive information, role-playing, and avoiding harassment or discrimination. The results suggest lying is often used for benign purposes, and we conclude that its use may be essential to maintaining a humane online society.

17 citations


Book ChapterDOI
31 May 2015
TL;DR: The findings show that crowd workers are adept at recognizing people, locations, and implicitly identified entities within shorter microposts; these findings are expected to inform the design of more advanced NER pipelines, including the way in which tweets are chosen to be outsourced or processed by automatic tools.
Abstract: This paper explores the factors that influence the human component in hybrid approaches to named entity recognition (NER) in microblogs, which combine state-of-the-art automatic techniques with human and crowd computing. We identify a set of content and crowdsourcing-related features (number of entities in a post, types of entities, skipped true-positive posts, average time spent to complete the tasks, and interaction with the user interface) and analyse their impact on the accuracy of the results and the timeliness of their delivery. Using CrowdFlower and a simple, custom-built gamified NER tool, we run experiments on three datasets from the related literature and a fourth newly annotated corpus. Our findings show that crowd workers are adept at recognizing people, locations, and implicitly identified entities within shorter microposts. We expect them to lead to the design of more advanced NER pipelines, informing the way in which tweets are chosen to be outsourced or processed by automatic tools. Experimental results are published as JSON-LD for further use by the research community.

15 citations
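The abstract above reports accuracy of crowd annotations measured against a gold standard, broken down by features such as entity type. Below is a minimal sketch, using a hypothetical gold standard and annotation format, of how per-entity-type recall could be computed; the actual dataset fields and the JSON-LD vocabulary used by the authors are not reproduced here.

```python
# Hypothetical sketch: compare crowd entity annotations against a gold standard and
# break accuracy down by entity type, echoing the finding that people and locations
# in shorter microposts are recognised well. Field names and data are illustrative.
from collections import defaultdict

gold = [  # (post_id, surface form, entity type)
    ("t1", "Barack Obama", "PERSON"),
    ("t1", "Washington", "LOCATION"),
    ("t2", "Grand Canyon", "LOCATION"),
    ("t2", "NASA", "ORGANISATION"),
]
crowd = {  # annotations returned by workers, keyed by post
    "t1": {("Barack Obama", "PERSON"), ("Washington", "LOCATION")},
    "t2": {("Grand Canyon", "LOCATION")},
}

hits, totals = defaultdict(int), defaultdict(int)
for post_id, surface, etype in gold:
    totals[etype] += 1
    if (surface, etype) in crowd.get(post_id, set()):
        hits[etype] += 1

for etype in totals:
    print(etype, f"recall = {hits[etype] / totals[etype]:.2f}")
```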


Proceedings ArticleDOI
18 May 2015
TL;DR: This paper presents a generic information cascade model that exploits only the temporal order of information sharing activities, combined with inherent properties of the shared information resources, applied to data from the world's largest online citizen science platform Zooniverse.
Abstract: This paper is an attempt to lay out foundations for a general theory of coincidence in information spaces such as the World Wide Web, expanding on existing work on bursty structures in document streams and information cascades. We elaborate on the hypothesis that every resource that is published in an information space enters a temporary interaction with another resource once a unique explicit or implicit reference between the two is found. This thought is motivated by Erwin Schrödinger's notion of entanglement between quantum systems. We present a generic information cascade model that exploits only the temporal order of information sharing activities, combined with inherent properties of the shared information resources. The approach was applied to data from the world's largest online citizen science platform, the Zooniverse, and we report on the findings of this case study.

14 citations
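The cascade model described above relies only on the temporal order of information-sharing activities plus intrinsic properties of the shared resources. Below is a minimal sketch, over a hypothetical event log, of how cascades could be assembled by linking time-ordered posts that share a content feature; the paper's exact linking rules are not spelled out in the abstract and are not claimed here.

```python
# Hypothetical sketch of building information cascades from a time-ordered event log.
# Two events join the same cascade when they share a content feature (here, a hashtag).
# The event data and the choice of hashtags as the shared feature are illustrative.
events = [  # (timestamp, user, set of content features)
    (1, "alice", {"#planet", "#transit"}),
    (2, "bob",   {"#planet"}),
    (3, "carol", {"#transit", "#gap"}),
    (4, "dave",  {"#gap"}),
]

cascades = {}  # feature -> ordered list of (timestamp, user)
for ts, user, features in sorted(events, key=lambda e: e[0]):
    for f in features:
        cascades.setdefault(f, []).append((ts, user))

for feature, chain in cascades.items():
    print(feature, "->", [u for _, u in chain])
```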


Proceedings ArticleDOI
25 Aug 2015
TL;DR: This paper applies a method for constructing cascades of information co-occurrence, suitable for tracing emergent structures in information in scenarios where rich contextual features are unavailable, to analyse information dissemination patterns across the active online citizen science project Planet Hunters.
Abstract: In this paper, we investigate a method for constructing cascades of information co-occurrence, which is suitable for tracing emergent structures in information in scenarios where rich contextual features are unavailable. Our method relies only on the temporal order of content-sharing activities and intrinsic properties of the shared content itself. We apply this method to analyse information dissemination patterns across the active online citizen science project Planet Hunters, a part of the Zooniverse platform. Our results lend insight into both structural and informational properties of different types of identifiers that can be used and combined to construct cascades. In particular, significant differences are found in the structural properties of information cascades when hashtags are used as cascade identifiers, compared with other content features. We also explain apparent local information losses in cascades in terms of information obsolescence and cascade divergence; e.g., when a cascade branches into multiple, divergent cascades with combined capacity equal to the original.

14 citations
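The abstract above compares the structural properties of cascades built with hashtags as identifiers against cascades built from other content features. The following is a small illustrative sketch, with hypothetical data, of computing two of the simplest structural measures (size and time span) per identifier type; the metrics reported in the paper are not reproduced here.

```python
# Hypothetical sketch of comparing structural properties of cascades grouped by the
# kind of identifier that defined them (hashtag vs. other content feature).
# Size and time span are the simplest structural measures; the measures and the data
# shown are illustrative, not the ones reported in the paper.
cascades = {
    ("hashtag", "#gap"):       [1, 3, 4, 9],          # timestamps of sharing events
    ("hashtag", "#transit"):   [2, 5],
    ("keyword", "lightcurve"): [1, 2, 2, 3, 7, 8],
}

summary = {}
for (id_type, _), times in cascades.items():
    stats = summary.setdefault(id_type, {"count": 0, "total_size": 0, "total_span": 0})
    stats["count"] += 1
    stats["total_size"] += len(times)
    stats["total_span"] += max(times) - min(times)

for id_type, s in summary.items():
    print(id_type,
          "mean size =", s["total_size"] / s["count"],
          "mean span =", s["total_span"] / s["count"])
```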


Proceedings ArticleDOI
18 May 2015
TL;DR: This paper uses a survey to examine ways in which people fabricate, omit or alter the truth online, and concludes that lying may be essential to maintaining a humane online society.
Abstract: Portraying matters as other than they truly are is an important part of everyday human communication. In this paper, we use a survey to examine ways in which people fabricate, omit or alter the truth online. Many reasons are found, including creative expression, hiding sensitive information, role-playing, and avoiding harassment or discrimination. The results suggest lying is often used for benign purposes, and we conclude that its use may be essential to maintaining a humane online society.

11 citations


Proceedings ArticleDOI
28 Jun 2015
TL;DR: Describes a set of behavioural characteristics that identify different types of players within the EyeWire platform, based on an analysis of how features of the system facilitate player interaction and communication alongside completing the gamified scientific task.
Abstract: Citizen science is changing the process of scientific knowledge discovery. Successful projects rely on an active and able collection of volunteers. In order to attract and sustain citizen scientists, designers are faced with the task of transforming complex scientific tasks into something accessible, interesting, and, hopefully, engaging. In this paper, we examine the citizen science game EyeWire. Our analysis draws upon a dataset of over 4,000,000 completed game entries and 885,000 chat entries made by over 90,000 players. The analysis provides a detailed understanding of how features of the system facilitate player interaction and communication alongside completing the gamified scientific task. Based on the analysis, we describe a set of behavioural characteristics which identify different types of players within the EyeWire platform.

10 citations
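The behavioural characteristics above are derived from millions of game entries and chat messages per player. Below is a hedged sketch, using hypothetical per-player activity features and a standard k-means clustering, of how players might be grouped into behavioural types; the features and method actually used by the authors are not specified in the abstract and may differ.

```python
# Hypothetical sketch: cluster players into behavioural types from simple activity
# counts (games completed, chat messages sent, days active). The features, data and
# choice of k-means are illustrative; the paper's own analysis may differ.
import numpy as np
from sklearn.cluster import KMeans

# Rows: one player each -> [games_completed, chat_messages, active_days]
players = np.array([
    [5000, 1200, 300],   # highly active, very social
    [4800,   10, 250],   # highly active, rarely chats
    [  40,  300,  20],   # mostly social
    [  30,    2,   5],   # casual drop-in
])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(players)
for row, label in zip(players, labels):
    print(row, "-> behavioural type", label)
```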


Proceedings ArticleDOI
28 Feb 2015
TL;DR: This paper elaborates a thesis about the computational capability embodied in information-sharing activities that happen on the Web, termed socio-technical computation, reflecting not only explicitly conditional activities but also the organic potential residing in information on the Web.
Abstract: Motivated by the significant amount of successful collaborative problem solving activity on the Web, we ask: Can the accumulated information propagation behavior on the Web be conceived as a giant machine, and reasoned about accordingly? In this paper we elaborate a thesis about the computational capability embodied in information sharing activities that happen on the Web, which we term socio-technical computation, reflecting not only explicitly conditional activities but also the organic potential residing in information on the Web.

Proceedings ArticleDOI
18 May 2015
TL;DR: This short paper presents the technical design of a prototype social machine platform, INDX, which realises both of the key design needs for building decentralised social machines: that of supporting heterogeneous social apps and multiple, separable user identities.
Abstract: Personal Data Stores are among the many efforts that are currently underway to try to re-decentralise the Web, and to bring more data management and storage capability under the control of the user. Few of these architectures, however, have considered the needs of supporting decentralised social software from the user's perspective. In this short paper, we present the results of our design exercise, focusing on two key design needs for building decentralised social machines: those of supporting heterogeneous social apps and multiple, separable user identities. We then present the technical design of a prototype social machine platform, INDX, which realises both of these requirements, and a prototype heterogeneous microblogging application which demonstrates its capabilities.
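The abstract highlights two design needs: heterogeneous social apps over one store, and multiple, separable user identities. The sketch below is purely hypothetical and is intended only to make the second requirement concrete, by keeping stored objects partitioned per identity; it is not INDX's actual API or data model.

```python
# Purely hypothetical sketch of a personal data store that keeps objects partitioned
# by user identity, so different social apps can read/write under separate personas.
# This illustrates the requirement only; it is not the INDX platform's real API.
class PersonalDataStore:
    def __init__(self):
        self._boxes = {}  # identity name -> {object id: object}

    def create_identity(self, name):
        self._boxes.setdefault(name, {})

    def put(self, identity, obj_id, obj):
        self._boxes[identity][obj_id] = obj

    def get(self, identity, obj_id):
        # An app acting under one identity cannot see another identity's objects.
        return self._boxes[identity].get(obj_id)

store = PersonalDataStore()
store.create_identity("work-persona")
store.create_identity("hobby-persona")
store.put("work-persona", "post1", {"text": "Quarterly report is out"})
print(store.get("hobby-persona", "post1"))  # None: identities are separable
```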

01 Jan 2015
TL;DR: A consensus report outlining a menu of privacy “bridges” that can be built to bring the European Union and the United States closer together is prepared, aimed at providing a framework of practical options that advance strong, globally-accepted privacy values in a manner that respects the substantive and procedural differences between the two jurisdictions.
Abstract: The EU and US share a common commitment to privacy protection as a cornerstone of democracy. Following the Treaty of Lisbon, data privacy is a fundamental right that the European Union must proactively guarantee. In the United States, data privacy derives from constitutional protections in the First, Fourth and Fifth Amendments as well as federal and state statute, consumer protection law and common law. The ultimate goal of effective privacy protection is shared. However, current friction between the two legal systems poses challenges to realizing privacy and the free flow of information across the Atlantic. The recent expansion of online surveillance practices underlines these challenges. Over nine months, the group prepared a consensus report outlining a menu of privacy “bridges” that can be built to bring the European Union and the United States closer together. The efforts are aimed at providing a framework of practical options that advance strong, globally-accepted privacy values in a manner that respects the substantive and procedural differences between the two jurisdictions. The report will be presented at the 2015 International Conference of Privacy and Data Protection Commissioners, which the Dutch Data Protection Authority will host in Amsterdam on 28-29 October 2015.

Proceedings ArticleDOI
18 May 2015
TL;DR: This paper examines WikiProjects, an emergent, community-driven feature of Wikipedia; the analysis reveals that, per WikiProject, the numbers of article and talk contributions are increasing, as is the number of new Wikipedians contributing to individual WikiProjects.
Abstract: In this paper we examine WikiProjects, an emergent, community-driven feature of Wikipedia. We analysed 3.2 million Wikipedia articles associated with 618 active WikiProjects. The dataset contained the logs of over 115 million article revisions and 15 million talk entries, together representing the activity of 15 million unique Wikipedians. Our analysis revealed that, per WikiProject, the numbers of article and talk contributions are increasing, as is the number of new Wikipedians contributing to individual WikiProjects. Based on these findings, we consider how studying Wikipedia at the sub-community level may provide a means to measure Wikipedia activity.
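The analysis above counts article revisions, talk-page entries and newly contributing Wikipedians per WikiProject over time. Below is a minimal sketch, over hypothetical log records, of aggregating such activity per project, per year and per edit kind, the sort of roll-up that would expose the growth trend reported in the abstract; the record format is illustrative, not the study's dataset.

```python
# Hypothetical sketch: aggregate revision and talk activity per WikiProject per year
# from a flat log of edits. The records below are illustrative, not the study's data.
from collections import defaultdict

edits = [  # (wikiproject, year, kind) where kind is "article" or "talk"
    ("WikiProject Medicine", 2013, "article"),
    ("WikiProject Medicine", 2013, "talk"),
    ("WikiProject Medicine", 2014, "article"),
    ("WikiProject Medicine", 2014, "article"),
    ("WikiProject Physics",  2014, "talk"),
]

activity = defaultdict(lambda: defaultdict(int))  # project -> (year, kind) -> count
for project, year, kind in edits:
    activity[project][(year, kind)] += 1

for project, counts in activity.items():
    for (year, kind), n in sorted(counts.items()):
        print(project, year, kind, n)
```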

Journal ArticleDOI
TL;DR: The right to be de-indexed should be understood in the context of moves to improve communication with data subjects and support subjects' autonomy (particularly within the notice and consent regime), and of understanding the role of obscurity of information, undermining the current binary assumption that information is either public or not.
Abstract: This paper examines the recent Google Spain ruling establishing a right to de-indexing based on existing rights to data protection. This ruling has had a divisive effect on the relations between the EU and the US, but this article argues that we should understand the right to de-indexing in the context of: (i) moves to improve communication with data subjects and support subjects’ autonomy, particularly within the notice and consent regime; (ii) understanding the role of obscurity of information, and undermining the current binary assumption that information is either public or not; and (iii) moves to improve the quality of search engines’ output. If we do this, then the right to be de-indexed (and possibly other types of ‘right to be forgotten’) could become a point of contact between the EU and US privacy regimes, not a point of conflict.

30 Jun 2015
TL;DR: It is argued that providing false information on occasion is a common strategy online and offline for people to protect their privacy and determine their representation in the world, and some empirical findings to that effect are discussed.
Abstract: In this position paper, we discuss legal and technical aspects of protecting privacy using Personal Data Management Architectures (PDMAs), which include, but are not limited to, Personal Data Stores and Personal Information Management Services. We argue that providing false information on occasion is a common strategy, online and offline, for people to protect their privacy and determine their representation in the world, and we discuss some empirical findings to that effect. We describe a potential, and technically feasible, ecosystem of digital practices and technologies to facilitate this practice, and consider what legal frameworks would be required to support it.