scispace - formally typeset
Search or ask a question
Author

Matthew Lease

Other affiliations: University of Washington, Brown University, Intel  ...read more
Bio: Matthew Lease is an academic researcher from University of Texas at Austin. The author has contributed to research in topics: Crowdsourcing & Relevance (information retrieval). The author has an hindex of 36, co-authored 149 publications receiving 5098 citations. Previous affiliations of Matthew Lease include University of Washington & Brown University.


Papers
More filters
Proceedings ArticleDOI
23 Feb 2013
TL;DR: This paper outlines a framework that will enable crowd work that is complex, collaborative, and sustainable, and lays out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.
Abstract: Paid crowd work offers remarkable opportunities for improving productivity, social mobility, and the global economy by engaging a geographically distributed workforce to complete complex tasks on demand and at scale. But it is also possible that crowd work will fail to achieve its potential, focusing on assembly-line piecework. Can we foresee a future crowd workplace in which we would want our children to participate? This paper frames the major challenges that stand in the way of this goal. Drawing on theory from organizational behavior and distributed computing, as well as direct feedback from workers, we outline a framework that will enable crowd work that is complex, collaborative, and sustainable. The framework lays out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.

836 citations

Posted Content
TL;DR: In this paper, the authors outline a framework that will enable crowd work that is complex, collaborative, and sustainable, and lay out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.
Abstract: Paid crowd work offers remarkable opportunities for improving productivity, social mobility, and the global economy by engaging a geographically distributed workforce to complete complex tasks on demand and at scale. But it is also possible that crowd work will fail to achieve its potential, focusing on assembly-line piecework. Can we foresee a future crowd workplace in which we would want our children to participate? This paper frames the major challenges that stand in the way of this goal. Drawing on theory from organizational behavior and distributed computing, as well as direct feedback from workers, we outline a framework that will enable crowd work that is complex, collaborative, and sustainable. The framework lays out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.

803 citations

Proceedings ArticleDOI
11 Nov 2013
TL;DR: This work provides a thorough grounding of IR-based bug localization research in fundamental IR theoretical and empirical knowledge and practice and presents BLUiR, which embodies this insight, requires only the source code and bug reports, and takes advantage of bug similarity data if available.
Abstract: Locating bugs is important, difficult, and expensive, particularly for large-scale systems To address this, natural language information retrieval techniques are increasingly being used to suggest potential faulty source files given bug reports While these techniques are very scalable, in practice their effectiveness remains low in accurately localizing bugs to a small number of files Our key insight is that structured information retrieval based on code constructs, such as class and method names, enables more accurate bug localization We present BLUiR, which embodies this insight, requires only the source code and bug reports, and takes advantage of bug similarity data if available We build BLUiR on a proven, open source IR toolkit that anyone can use Our work provides a thorough grounding of IR-based bug localization research in fundamental IR theoretical and empirical knowledge and practice We evaluate BLUiR on four open source projects with approximately 3,400 bugs Results show that BLUiR matches or outperforms a current state-of-the-art tool across applications considered, even when BLUiR does not use bug similarity data used by the other tool

356 citations

Proceedings Article
03 Nov 2013
TL;DR: SQUARE, an open source shared task framework including benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods, is presented.
Abstract: While many statistical consensus methods now exist, relatively little comparative benchmarking and integration of techniques has made it increasingly difficult to determine the current state-of-the-art, to evaluate the relative benefit of new methods, to understand where specific problems merit greater attention, and to measure field progress over time. To make such comparative evaluation easier for everyone, we present SQUARE, an open source shared task framework including benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods. In addition to measuring performance on a variety of public, real crowd datasets, the benchmark also varies supervision and noise by manipulating training size and labeling error. We envision SQUARE as dynamic and continually evolving, with new datasets and reference implementations being added according to community needs and interest. We invite community contributions and participation.

200 citations

Proceedings Article
06 Jun 2010
TL;DR: While results are largely inconclusive, they identify important obstacles encountered, lessons learned, related work, and interesting ideas for future investigation.
Abstract: We investigate human factors involved in designing effective Human Intelligence Tasks (HITs) for Amazon's Mechanical Turk. In particular, we assess document relevance to search queries via MTurk in order to evaluate search engine accuracy. Our study varies four human factors and measures resulting experimental outcomes of cost, time, and accuracy of the assessments. While results are largely inconclusive, we identify important obstacles encountered, lessons learned, related work, and interesting ideas for future investigation. Experimental data is also made publicly available for further study by the community.

153 citations


Cited by
More filters
01 Jan 2016
TL;DR: The using multivariate statistics is universally compatible with any devices to read, allowing you to get the most less latency time to download any of the authors' books like this one.
Abstract: Thank you for downloading using multivariate statistics. As you may know, people have look hundreds times for their favorite novels like this using multivariate statistics, but end up in infectious downloads. Rather than reading a good book with a cup of tea in the afternoon, instead they juggled with some harmful bugs inside their laptop. using multivariate statistics is available in our digital library an online access to it is set as public so you can download it instantly. Our books collection saves in multiple locations, allowing you to get the most less latency time to download any of our books like this one. Merely said, the using multivariate statistics is universally compatible with any devices to read.

14,604 citations

Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of highto low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated muddominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debrisflow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations

Proceedings ArticleDOI
22 Jul 2006
TL;DR: This work introduces structural correspondence learning to automatically induce correspondences among features from different domains in order to adapt existing models from a resource-rich source domain to aresource-poor target domain.
Abstract: Discriminative learning methods are widely used in natural language processing. These methods work best when their training and test data are drawn from the same distribution. For many NLP tasks, however, we are confronted with new domains in which labeled data is scarce or non-existent. In such cases, we seek to adapt existing models from a resource-rich source domain to a resource-poor target domain. We introduce structural correspondence learning to automatically induce correspondences among features from different domains. We test our technique on part of speech tagging and show performance gains for varying amounts of source and target training data, as well as improvements in target domain parsing accuracy using our improved tagger.

1,672 citations

Journal ArticleDOI
TL;DR: In this article, existing definitions of crowdsourcing are analysed to extract common elements and to establish the basic characteristics of any crowdsourcing initiative.
Abstract: 'Crowdsourcing' is a relatively recent concept that encompasses many practices. This diversity leads to the blurring of the limits of crowdsourcing that may be identified virtually with any type of internet-based collaborative activity, such as co-creation or user innovation. Varying definitions of crowdsourcing exist, and therefore some authors present certain specific examples of crowdsourcing as paradigmatic, while others present the same examples as the opposite. In this article, existing definitions of crowdsourcing are analysed to extract common elements and to establish the basic characteristics of any crowdsourcing initiative. Based on these existing definitions, an exhaustive and consistent definition for crowdsourcing is presented and contrasted in 11 cases.

1,616 citations