
Showing papers by "Michael S. Bernstein published in 2016"


Posted Content
TL;DR: The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answer pairs.
Abstract: Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked "What vehicle is the person riding?", computers will need to identify the objects in an image as well as the relationships riding(man, carriage) and pulling(horse, carriage) in order to answer correctly that "the person is riding a horse-drawn carriage". In this paper, we present the Visual Genome dataset to enable the modeling of such relationships. We collect dense annotations of objects, attributes, and relationships within each image to learn these models. Specifically, our dataset contains over 100K images where each image has an average of 21 objects, 18 attributes, and 18 pairwise relationships between objects. We canonicalize the objects, attributes, relationships, and noun phrases in region descriptions and questions answer pairs to WordNet synsets. Together, these annotations represent the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answers.

1,663 citations
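The canonicalization to WordNet synsets mentioned in the abstract is easy to illustrate. The sketch below is a hypothetical representation of a single relationship annotation, not the dataset's actual schema; the `Relationship` class, the bounding-box layout, and the first-sense lookup via NLTK are assumptions for illustration only.

```python
# Minimal sketch (not the official Visual Genome schema) of one image's
# relationship annotation, with names canonicalized to WordNet synsets.
# Assumes NLTK and its WordNet corpus are installed.
from dataclasses import dataclass
from nltk.corpus import wordnet as wn

@dataclass
class Relationship:
    subject: str        # e.g. "man"
    predicate: str      # e.g. "riding"
    obj: str            # e.g. "carriage"
    subject_box: tuple  # (x, y, w, h) bounding box, hypothetical layout
    object_box: tuple

def canonicalize(name: str) -> str:
    """Map a raw annotation string to a WordNet synset name (naive first-sense heuristic)."""
    synsets = wn.synsets(name.replace(" ", "_"))
    return synsets[0].name() if synsets else name

rel = Relationship("man", "riding", "carriage", (30, 40, 120, 260), (10, 80, 300, 220))
print(canonicalize(rel.subject), rel.predicate, canonicalize(rel.obj))
# Output depends on the installed WordNet data; the first-sense choice is only a heuristic.
```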


Book ChapterDOI
08 Oct 2016
TL;DR: In this article, the authors propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image and localize the objects in the predicted relationships as bounding boxes in the image.
Abstract: Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. “man” and “bicycle”) and predicates (e.g. “riding” and “pushing”) independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationships from a few examples. Additionally, we localize the objects in the predicted relationships as bounding boxes in the image. We further demonstrate that understanding relationships can improve content based image retrieval.

893 citations
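The core idea of scoring a relationship with independent detectors plus a language prior can be sketched compactly. The snippet below is a toy formulation under assumed inputs: the embeddings are random stand-ins and the scoring function is illustrative, not the paper's trained model.

```python
# Toy sketch: visual evidence from independently trained object and predicate
# detectors, modulated by a word-embedding language prior. Score form and
# numbers are illustrative only.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def relationship_score(p_subj, p_pred, p_obj, emb, subj, pred, obj):
    """Detector probabilities combined with an embedding-based plausibility prior."""
    visual = p_subj * p_pred * p_obj
    # Language prior: predicate vector vs. the subject-to-object offset (an assumption).
    prior = cosine(emb[pred], emb[obj] - emb[subj])
    return visual * max(prior, 1e-3)   # keep the score positive

# Toy 4-d embeddings standing in for real word vectors.
emb = {w: np.random.RandomState(i).randn(4)
       for i, w in enumerate(["man", "riding", "bicycle"])}
print(relationship_score(0.9, 0.6, 0.8, emb, "man", "riding", "bicycle"))
```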


Proceedings ArticleDOI
01 Jun 2016
TL;DR: In this article, an LSTM model with spatial attention was proposed to tackle the 7W QA task, which enables a new type of QA with visual answers, in addition to textual answers used in previous work.
Abstract: We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacities for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. However, many questions and answers, in practice, relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. It enables a new type of QA with visual answers, in addition to textual answers used in previous work. We study the visual QA tasks in a grounded setting with a large collection of 7W multiple-choice QA pairs. Furthermore, we evaluate human performance and several baseline models on the QA tasks. Finally, we propose a novel LSTM model with spatial attention to tackle the 7W QA tasks.

751 citations
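The spatial-attention step admits a small numerical sketch. The shapes, the dot-product scoring, and the toy features below are assumptions; this is not the paper's LSTM architecture, only the attention-weighted pooling it relies on.

```python
# Toy spatial attention over image regions: softmax-normalized relevance
# scores, then a weighted sum of region features.
import numpy as np

def spatial_attention(region_feats, query):
    """region_feats: (R, D) features for R regions; query: (D,) question state."""
    scores = region_feats @ query                # relevance of each region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over regions
    return weights @ region_feats, weights       # attended feature, attention weights

regions = np.random.rand(5, 8)    # 5 regions, 8-d features (toy)
question = np.random.rand(8)      # encoded question state (toy)
attended, w = spatial_attention(regions, question)
print(w.round(2), attended.shape) # attention distribution and pooled feature shape
```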


Posted Content
TL;DR: This work proposes a model that can scale to predict thousands of types of relationships from a few examples and improves on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship.
Abstract: Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. "man riding bicycle" and "man pushing bicycle"). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. "man" and "bicycle") and predicates (e.g. "riding" and "pushing") independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationships from a few examples. Additionally, we localize the objects in the predicted relationships as bounding boxes in the image. We further demonstrate that understanding relationships can improve content based image retrieval.

517 citations


Proceedings ArticleDOI
07 May 2016
TL;DR: Empath is a tool that can generate and validate new lexical categories on demand from a small set of seed terms, which draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction.
Abstract: Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.

235 citations
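Empath's seed-expansion step can be sketched in a few lines. The toy vectors below stand in for the fiction-trained embedding, and the crowd-powered validation filter is omitted; this illustrates the nearest-neighbor expansion idea, not Empath's implementation.

```python
# Toy category expansion: find vocabulary words closest (cosine) to the mean
# vector of a few seed terms. Real Empath learns its embedding from fiction.
import numpy as np

embedding = {                       # hand-made toy vectors
    "bleed":  np.array([0.90, 0.10, 0.00]),
    "punch":  np.array([0.80, 0.20, 0.10]),
    "fight":  np.array([0.85, 0.15, 0.05]),
    "wound":  np.array([0.88, 0.05, 0.02]),
    "dinner": np.array([0.10, 0.90, 0.30]),
}

def expand_category(seeds, vocab, k=3):
    """Return the k vocabulary words closest to the centroid of the seed vectors."""
    centroid = np.mean([vocab[s] for s in seeds], axis=0)
    def cos(v):
        return np.dot(v, centroid) / (np.linalg.norm(v) * np.linalg.norm(centroid))
    candidates = [(w, cos(v)) for w, v in vocab.items() if w not in seeds]
    return sorted(candidates, key=lambda x: -x[1])[:k]

print(expand_category(["bleed", "punch"], embedding))  # "fight" and "wound" rank highest
```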


Proceedings ArticleDOI
TL;DR: Empath as mentioned in this paper is a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence).
Abstract: Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.

227 citations


Proceedings ArticleDOI
07 May 2016
TL;DR: This work presents a technique that produces extremely rapid judgments for binary and categorical labels, and demonstrates that it is possible to rectify errors by randomizing task order and modeling response latency.
Abstract: Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data. To scale and widen the applicability of crowdsourcing, we present a technique that produces extremely rapid judgments for binary and categorical labels. Rather than punishing all errors, which causes workers to proceed slowly and deliberately, our technique speeds up workers' judgments to the point where errors are acceptable and even expected. We demonstrate that it is possible to rectify these errors by randomizing task order and modeling response latency. We evaluate our technique on a breadth of common labeling tasks such as image verification, word similarity, sentiment analysis and topic classification. Where prior work typically achieves a 0.25x to 1x speedup over fixed majority vote, our approach often achieves an order of magnitude (10x) speedup.

95 citations
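The latency-modeling idea can be illustrated with a simplified heuristic: in a rapid stream of items, a keypress that arrives implausibly soon after an item appears is attributed to an earlier item. The shift-back rule and the 250 ms threshold below are assumptions, not the paper's statistical model.

```python
# Simplified attribution of rapid-fire keypresses to streamed items based on
# a minimum plausible response latency.
def attribute_responses(item_onsets_ms, keypress_times_ms, min_latency_ms=250):
    """Assign each keypress to the latest item shown at least min_latency_ms before it."""
    labels = []
    for t in keypress_times_ms:
        candidates = [i for i, onset in enumerate(item_onsets_ms)
                      if t - onset >= min_latency_ms]
        labels.append(max(candidates) if candidates else None)
    return labels

onsets = [0, 400, 800, 1200]               # items flashed every 400 ms
presses = [500, 950, 1300]                 # worker keypresses
print(attribute_responses(onsets, presses))  # -> [0, 1, 2]
```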


Proceedings ArticleDOI
07 May 2016
TL;DR: Atelier, a micro-internship platform that connects crowd interns with crowd mentors, guides mentor-intern pairs to break down expert crowdsourcing tasks into milestones, review intermediate output, and problem-solve together, finding that Atelier helped interns maintain forward progress and absorb best practices.
Abstract: Expert crowdsourcing marketplaces have untapped potential to empower workers' career and skill development. Currently, many workers cannot afford to invest the time and sacrifice the earnings required to learn a new skill, and a lack of experience makes it difficult to get job offers even if they do. In this paper, we seek to lower the threshold to skill development by repurposing existing tasks on the marketplace as mentored, paid, real-world work experiences, which we refer to as micro-internships. We instantiate this idea in Atelier, a micro-internship platform that connects crowd interns with crowd mentors. Atelier guides mentor-intern pairs to break down expert crowdsourcing tasks into milestones, review intermediate output, and problem-solve together. We conducted a field experiment comparing Atelier's mentorship model to a non-mentored alternative on a real-world programming crowdsourcing task, finding that Atelier helped interns maintain forward progress and absorb best practices.

69 citations


Proceedings ArticleDOI
16 Oct 2016
TL;DR: Inspired by a game-theoretic notion of incentive-compatibility, Boomerang opens opportunities for interaction design to incentivize honest reporting over strategic dishonesty.
Abstract: Paid crowdsourcing platforms suffer from low-quality work and unfair rejections, but paradoxically, most workers and requesters have high reputation scores. These inflated scores, which make high-quality work and workers difficult to find, stem from social pressure to avoid giving negative feedback. We introduce Boomerang, a reputation system for crowdsourcing platforms that elicits more accurate feedback by rebounding the consequences of feedback directly back onto the person who gave it. With Boomerang, requesters find that their highly-rated workers gain earliest access to their future tasks, and workers find tasks from their highly-rated requesters at the top of their task feed. Field experiments verify that Boomerang causes both workers and requesters to provide feedback that is more closely aligned with their private opinions. Inspired by a game-theoretic notion of incentive-compatibility, Boomerang opens opportunities for interaction design to incentivize honest reporting over strategic dishonesty.

50 citations


Proceedings ArticleDOI
TL;DR: This work proposes a technique for achieving interdependent complex goals with crowds, and embodies it in Mechanical Novel, a system that crowdsources short fiction stories on Amazon Mechanical Turk.
Abstract: Crowdsourcing systems accomplish large tasks with scale and speed by breaking work down into independent parts. However, many types of complex creative work, such as fiction writing, have remained out of reach for crowds because work is tightly interdependent: changing one part of a story may trigger changes to the overall plot and vice versa. Taking inspiration from how expert authors write, we propose a technique for achieving interdependent complex goals with crowds. With this technique, the crowd loops between reflection, to select a high-level goal, and revision, to decompose that goal into low-level, actionable tasks. We embody this approach in Mechanical Novel, a system that crowdsources short fiction stories on Amazon Mechanical Turk. In a field experiment, Mechanical Novel resulted in higher-quality stories than an iterative crowdsourcing workflow. Our findings suggest that orienting crowd work around high-level goals may enable workers to coordinate their effort to accomplish complex work.

42 citations


Proceedings Article
29 Mar 2016
TL;DR: A technique that combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction finds that male over-representation and traditional gender stereotypes are common throughout nearly every genre in the corpus.
Abstract: Imagine a princess asleep in a castle, waiting for her prince to slay the dragon and rescue her. Tales like the famous Sleeping Beauty clearly divide up gender roles. But what about more modern stories, borne of a generation increasingly aware of social constructs like sexism and racism? Do these stories tend to reinforce gender stereotypes, or counter them? In this paper, we present a technique that combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction. We apply this technique across 1.8 billion words of fiction from the Wattpad online writing community, investigating gender representation in stories, how male and female characters behave and are described, and how authors' use of gender stereotypes is associated with the community's ratings. We find that male over-representation and traditional gender stereotypes (e.g., dominant men and submissive women) are common throughout nearly every genre in our corpus. However, only some of these stereotypes, like sexual or violent men, are associated with highly rated stories. Finally, despite women often being the target of negative stereotypes, female authors are equally likely to write such stereotypes as men.
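The lexicon-matching core of the method can be sketched briefly. The tiny category lexicon and the (gender, verb) pairs below are hypothetical inputs; the paper derives these from parsed fiction and a crowdsourced lexicon of stereotypes.

```python
# Toy stereotype counting: how often verbs attributed to male vs. female
# characters fall into one lexicon category.
from collections import Counter

violence_lexicon = {"punch", "slay", "fight"}          # toy stereotype category

def stereotype_counts(character_verbs):
    """character_verbs: iterable of (gender, verb) pairs extracted from text."""
    counts = Counter()
    for gender, verb in character_verbs:
        if verb in violence_lexicon:
            counts[gender] += 1
    return counts

sample = [("male", "punch"), ("male", "smile"), ("female", "fight"), ("male", "slay")]
print(stereotype_counts(sample))   # Counter({'male': 2, 'female': 1})
```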

Posted Content
TL;DR: Huddler, as discussed by the authors, is a system that enables workers to assemble familiar, effective crowd teams even under unpredictable availability and strict time constraints, using a dynamic programming algorithm to optimize for highly familiar teammates when individual availability is unknown.
Abstract: Distributed, parallel crowd workers can accomplish simple tasks through workflows, but teams of collaborating crowd workers are necessary for complex goals. Unfortunately, a fundamental condition for effective teams - familiarity with other members - stands in contrast to crowd work's flexible, on-demand nature. We enable effective crowd teams with Huddler, a system for workers to assemble familiar teams even under unpredictable availability and strict time constraints. Huddler utilizes a dynamic programming algorithm to optimize for highly familiar teammates when individual availability is unknown. We first present a field experiment that demonstrates the value of familiarity for crowd teams: familiar crowd teams doubled the performance of ad-hoc (unfamiliar) teams on a collaborative task. We then report a two-week field deployment wherein Huddler enabled crowd workers to convene highly familiar teams in 18 minutes on average. This research advances the goal of supporting long-term, team-based collaborations without sacrificing the flexibility of crowd work.
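The team-assembly objective can be illustrated with a deliberately simplified greedy routine. This is not Huddler's dynamic programming algorithm, and the familiarity scores and worker names are hypothetical.

```python
# Greedy simplification: from the workers currently online, grow a team that
# maximizes total pairwise familiarity with the members chosen so far.
def assemble_team(available, familiarity, team_size):
    team = []
    while len(team) < team_size and len(team) < len(available):
        best = max((w for w in available if w not in team),
                   key=lambda w: sum(familiarity.get(frozenset((w, t)), 0) for t in team))
        team.append(best)
    return team

fam = {frozenset(("ana", "bo")): 5,
       frozenset(("ana", "cy")): 2,
       frozenset(("bo", "cy")): 1}
print(assemble_team(["ana", "bo", "cy", "di"], fam, 2))  # -> ['ana', 'bo']
```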

Proceedings ArticleDOI
TL;DR: Through Mosaic, it is argued that communities oriented around sharing creative process can create a collaborative environment that is beneficial for creative growth.
Abstract: Online creative communities allow creators to share their work with a large audience, maximizing opportunities to showcase their work and connect with fans and peers. However, sharing in-progress work can be technically and socially challenging in environments designed for sharing completed pieces. We propose an online creative community where sharing process, rather than showcasing outcomes, is the main method of sharing creative work. Based on this, we present Mosaic---an online community where illustrators share work-in-progress snapshots showing how an artwork was completed from start to finish. In an online deployment and observational study, artists used Mosaic as a vehicle for reflecting on how they can improve their own creative process, developed a social norm of detailed feedback, and became less apprehensive of sharing early versions of artwork. Through Mosaic, we argue that communities oriented around sharing creative process can create a collaborative environment that is beneficial for creative growth.

Proceedings ArticleDOI
07 May 2016
TL;DR: This work asks whether the pay-per-task method is the right one, drawing on behavioral economics to test bulk payment, coupons, and material goods as alternative incentives; paying in bulk after every ten tasks increased task completion the most, while coupons had a small negative effect and material goods were most robust to declining participation.
Abstract: Paid crowdsourcing marketplaces have gained popularity by using piecework, or payment for each microtask, to incentivize workers. This norm has remained relatively unchallenged. In this paper, we ask: is the pay-per-task method the right one? We draw on behavioral economic research to examine whether payment in bulk after every ten tasks, saving money via coupons instead of earning money, or material goods rather than money will increase the number of completed tasks. We perform a twenty-day, between-subjects field experiment (N=300) on a mobile crowdsourcing application and measure how often workers responded to a task notification to fill out a short survey under each incentive condition. Task completion rates increased when paying in bulk after ten tasks: doing so increased the odds of a response by 1.4x, translating into 8% more tasks through that single intervention. Payment with coupons instead of money produced a small negative effect on task completion rates. Material goods were the most robust to decreasing participation over time.
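The abstract reports the bulk-payment effect both as an odds ratio (1.4x) and as roughly 8% more completed tasks; the two are linked through the baseline response rate. The sketch below makes that conversion explicit, with the baseline rate assumed purely for illustration.

```python
# Convert an odds ratio into a new response probability at an assumed baseline.
def apply_odds_ratio(p_baseline, odds_ratio):
    odds = p_baseline / (1 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)

p0 = 0.74                                    # assumed baseline response rate
p1 = apply_odds_ratio(p0, 1.4)
print(round(p1, 3), round(p1 / p0 - 1, 3))   # new rate and relative increase (~8%)
```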

Proceedings ArticleDOI
07 May 2016
TL;DR: Augur as discussed by the authors mines a knowledge base of human behavior by analyzing more than one billion words of modern fiction and trains vector models that can predict many thousands of user activities from surrounding objects in modern contexts, such as eating, meeting with a friend, or taking a selfie.
Abstract: From smart homes that prepare coffee when we wake, to phones that know not to interrupt us during important conversations, our collective visions of HCI imagine a future in which computers understand a broad range of human behaviors. Today our systems fall short of these visions, however, because this range of behaviors is too large for designers or programmers to capture manually. In this paper, we instead demonstrate it is possible to mine a broad knowledge base of human behavior by analyzing more than one billion words of modern fiction. Our resulting knowledge base, Augur, trains vector models that can predict many thousands of user activities from surrounding objects in modern contexts: for example, whether a user may be eating food, meeting with a friend, or taking a selfie. Augur uses these predictions to identify actions that people commonly take on objects in the world and estimate a user's future activities given their current situation. We demonstrate Augur-powered, activity-based systems such as a phone that silences itself when the odds of you answering it are low, and a dynamic music player that adjusts to your present activity. A field deployment of an Augur-powered wearable camera resulted in 96% recall and 71% precision on its unsupervised predictions of common daily activities. A second evaluation where human judges rated the system's predictions over a broad set of input images found that 94% were rated sensible.
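The object-to-activity prediction can be sketched as a similarity lookup. The vectors and vocabulary below are made up; Augur learns its representations from over a billion words of fiction.

```python
# Toy activity prediction: score each activity by its embedding similarity to
# the objects currently in view and return the best match.
import numpy as np

activity_vecs = {"eat food": np.array([0.9, 0.1]),
                 "take selfie": np.array([0.1, 0.9])}
object_vecs = {"fork":  np.array([0.8, 0.0]),
               "plate": np.array([0.9, 0.2]),
               "phone": np.array([0.1, 0.8])}

def predict_activity(objects_in_view):
    context = np.mean([object_vecs[o] for o in objects_in_view], axis=0)
    def cos(v):
        return np.dot(v, context) / (np.linalg.norm(v) * np.linalg.norm(context))
    return max(activity_vecs, key=lambda a: cos(activity_vecs[a]))

print(predict_activity(["fork", "plate"]))   # -> 'eat food'
print(predict_activity(["phone"]))           # -> 'take selfie'
```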

Proceedings ArticleDOI
TL;DR: Augur-powered, activity-based systems such as a phone that silences itself when the odds of you answering it are low, and a dynamic music player that adjusts to your present activity are demonstrated.
Abstract: From smart homes that prepare coffee when we wake, to phones that know not to interrupt us during important conversations, our collective visions of HCI imagine a future in which computers understand a broad range of human behaviors. Today our systems fall short of these visions, however, because this range of behaviors is too large for designers or programmers to capture manually. In this paper, we instead demonstrate it is possible to mine a broad knowledge base of human behavior by analyzing more than one billion words of modern fiction. Our resulting knowledge base, Augur, trains vector models that can predict many thousands of user activities from surrounding objects in modern contexts: for example, whether a user may be eating food, meeting with a friend, or taking a selfie. Augur uses these predictions to identify actions that people commonly take on objects in the world and estimate a user's future activities given their current situation. We demonstrate Augur-powered, activity-based systems such as a phone that silences itself when the odds of you answering it are low, and a dynamic music player that adjusts to your present activity. A field deployment of an Augur-powered wearable camera resulted in 96% recall and 71% precision on its unsupervised predictions of common daily activities. A second evaluation where human judges rated the system's predictions over a broad set of input images found that 94% were rated sensible.

Proceedings ArticleDOI
07 May 2016
TL;DR: This workshop brings together researchers in task decomposition, completion, and sourcing to discuss how intersections of research across these areas can pave the path for future research in this space.
Abstract: It is difficult to accomplish meaningful goals with limited time and attentional resources. However, recent research has shown that concrete plans with actionable steps allow people to complete tasks better and faster. With advances in techniques that can decompose larger tasks into smaller units, we envision that a transformation from larger tasks to smaller microtasks will impact when and how people perform complex information work, enabling efficient and easy completion of tasks that currently seem challenging. In this workshop, we bring together researchers in task decomposition, completion, and sourcing. We will pursue a broad understanding of the challenges in creating, allocating, and scheduling microtasks, as well as how accomplishing these microtasks can contribute towards productivity. The goal is to discuss how intersections of research across these areas can pave the path for future research in this space.

Proceedings ArticleDOI
TL;DR: In this article, the authors present a technique that produces extremely rapid judgments for binary and categorical labels: rather than punishing all errors, which causes workers to proceed slowly and deliberately, it speeds workers up to the point where errors are expected, then rectifies those errors by randomizing task order and modeling response latency.
Abstract: Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data. To scale and widen the applicability of crowdsourcing, we present a technique that produces extremely rapid judgments for binary and categorical labels. Rather than punishing all errors, which causes workers to proceed slowly and deliberately, our technique speeds up workers' judgments to the point where errors are acceptable and even expected. We demonstrate that it is possible to rectify these errors by randomizing task order and modeling response latency. We evaluate our technique on a breadth of common labeling tasks such as image verification, word similarity, sentiment analysis and topic classification. Where prior work typically achieves a 0.25x to 1x speedup over fixed majority vote, our approach often achieves an order of magnitude (10x) speedup.

Proceedings ArticleDOI
TL;DR: This work draws inspiration from historical worker guilds to design and implement crowd guilds: centralized groups of crowd workers who collectively certify each other's quality through double-blind peer assessment.
Abstract: Crowd workers are distributed and decentralized. While decentralization is designed to utilize independent judgment to promote high-quality results, it paradoxically undercuts behaviors and institutions that are critical to high-quality work. Reputation is one central example: crowdsourcing systems depend on reputation scores from decentralized workers and requesters, but these scores are notoriously inflated and uninformative. In this paper, we draw inspiration from historical worker guilds (e.g., in the silk trade) to design and implement crowd guilds: centralized groups of crowd workers who collectively certify each other's quality through double-blind peer assessment. A two-week field experiment compared crowd guilds to a traditional decentralized crowd work model. Crowd guilds produced reputation signals more strongly correlated with ground-truth worker quality than signals available on current crowd working platforms, and more accurate than in the traditional model.

Posted Content
TL;DR: In this article, Atelier, a micro-internship platform that connects crowd interns with crowd mentors, is proposed to lower the threshold to skill development for workers who cannot afford the time and lost earnings required to learn a new skill.
Abstract: Expert crowdsourcing marketplaces have untapped potential to empower workers' career and skill development. Currently, many workers cannot afford to invest the time and sacrifice the earnings required to learn a new skill, and a lack of experience makes it difficult to get job offers even if they do. In this paper, we seek to lower the threshold to skill development by repurposing existing tasks on the marketplace as mentored, paid, real-world work experiences, which we refer to as micro-internships. We instantiate this idea in Atelier, a micro-internship platform that connects crowd interns with crowd mentors. Atelier guides mentor-intern pairs to break down expert crowdsourcing tasks into milestones, review intermediate output, and problem-solve together. We conducted a field experiment comparing Atelier's mentorship model to a non-mentored alternative on a real-world programming crowdsourcing task, finding that Atelier helped interns maintain forward progress and absorb best practices.

Proceedings ArticleDOI
16 Oct 2016
TL;DR: Meta: a language extension for Python that allows programmers to share functions and track how they are used by a crowd of other programmers is introduced, finding that professional programmers are able to use Meta for complex tasks, and that Meta is able to find 44 optimizations and 5 bug fixes across the crowd.
Abstract: Collectively authored programming resources such as Q&A sites and open-source libraries provide a limited window into how programs are constructed, debugged, and run. To address these limitations, we introduce Meta: a language extension for Python that allows programmers to share functions and track how they are used by a crowd of other programmers. Meta functions are shareable via URL and instrumented to record runtime data. Combining thousands of Meta functions with their collective runtime data, we demonstrate tools including an optimizer that replaces your function with a more efficient version written by someone else, an auto-patcher that saves your program from crashing by finding equivalent functions in the community, and a proactive linter that warns you when a function fails elsewhere in the community. We find that professional programmers are able to use Meta for complex tasks (creating new Meta functions that, for example, cross-validate a logistic regression), and that Meta is able to find 44 optimizations (for a 1.45 times average speedup) and 5 bug fixes across the crowd.
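The runtime instrumentation that shared Meta functions carry can be approximated with a plain decorator. The `instrumented` decorator and `RUNTIME_LOG` list below are illustrative stand-ins, not Meta's actual API or data model.

```python
# Stand-in for Meta-style instrumentation: record each call's runtime and
# outcome so that usage data can later be aggregated across many programmers.
import functools
import time

RUNTIME_LOG = []   # in Meta, comparable data is aggregated across the crowd

def instrumented(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            RUNTIME_LOG.append({"fn": fn.__name__, "ok": True,
                                "seconds": time.perf_counter() - start})
            return result
        except Exception as exc:
            RUNTIME_LOG.append({"fn": fn.__name__, "ok": False, "error": repr(exc),
                                "seconds": time.perf_counter() - start})
            raise
    return wrapper

@instrumented
def normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]

normalize([1, 2, 7])
print(RUNTIME_LOG[-1]["ok"], round(RUNTIME_LOG[-1]["seconds"], 6))
```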

Proceedings ArticleDOI
Abstract: Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets. As a result, crowd workers spend weeks or months repeating the exact same tasks, making it necessary to understand their behavior over these long periods of time. We utilize three large, longitudinal datasets of nine million annotations collected from Amazon Mechanical Turk to examine claims that workers fatigue or satisfice over these long periods, producing lower quality work. We find that, contrary to these claims, workers are extremely stable in their quality over the entire period. To understand whether workers set their quality based on the task's requirements for acceptance, we then perform an experiment where we vary the required quality for a large crowdsourcing task. Workers did not adjust their quality based on the acceptance threshold: workers who were above the threshold continued working at their usual quality level, and workers below the threshold self-selected themselves out of the task. Capitalizing on this consistency, we demonstrate that it is possible to predict workers' long-term quality using just a glimpse of their quality on the first five tasks.
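The "glimpse" predictor described in the abstract is simple to sketch: treat a worker's accuracy on their first five gold-labeled tasks as the estimate of their long-term quality. The data and the plain-average predictor below are illustrative, not the paper's analysis.

```python
# Predict a worker's long-term quality from their first five tasks (toy version).
def predict_long_term_quality(first_five_correct):
    """first_five_correct: list of 5 booleans for the worker's first five tasks."""
    assert len(first_five_correct) == 5
    return sum(first_five_correct) / 5.0

history = [True, True, False, True, True]   # hypothetical first five tasks
print(predict_long_term_quality(history))   # -> 0.8, the estimated stable quality
```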

Posted Content
TL;DR: This paper combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction, finding that male over-representation and traditional gender stereotypes are common throughout nearly every genre in the corpus.
Abstract: Imagine a princess asleep in a castle, waiting for her prince to slay the dragon and rescue her. Tales like the famous Sleeping Beauty clearly divide up gender roles. But what about more modern stories, borne of a generation increasingly aware of social constructs like sexism and racism? Do these stories tend to reinforce gender stereotypes, or counter them? In this paper, we present a technique that combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction. We apply this technique across 1.8 billion words of fiction from the Wattpad online writing community, investigating gender representation in stories, how male and female characters behave and are described, and how authors' use of gender stereotypes is associated with the community's ratings. We find that male over-representation and traditional gender stereotypes (e.g., dominant men and submissive women) are common throughout nearly every genre in our corpus. However, only some of these stereotypes, like sexual or violent men, are associated with highly rated stories. Finally, despite women often being the target of negative stereotypes, female authors are equally likely to write such stereotypes as men.

Book ChapterDOI
01 Jan 2016
TL;DR: This research demonstrates how large classes can leverage their scale to encourage mastery through rapid feedback and revision, and suggests secret ingredients to make such peer interactions sustainable at scale.
Abstract: When students work with peers, they learn more actively, build richer knowledge structures, and connect material to their lives. However, not every peer learning experience online sees successful adoption. This chapter first introduces PeerStudio, an assessment platform that leverages the large number of students’ peers in online classes to enable rapid feedback on in-progress work. Students submit their draft, give rubric-based feedback on two peers’ drafts, and then receive peer feedback. Students can integrate the feedback and repeat this process as often as they desire. PeerStudio demonstrates how rapid feedback on in-progress work improves course outcomes. We then articulate and address three adoption and implementation challenges for peer learning platforms such as PeerStudio. First, peer interactions struggle to bootstrap critical mass. However, class incentives can signal importance and spur initial usage. Second, online classes have limited peer visibility and awareness, so students often feel alone even when surrounded by peers. We find that highlighting interdependence and strengthening norms can mitigate this issue. Third, teachers can readily access “big” aggregate data but not “thick” contextual data that helps build intuitions, so software should guide teachers’ scaffolding of peer interactions. We illustrate these challenges through studying 8500 students’ usage of PeerStudio and another peer learning platform: Talkabout. Efficacy is measured through sign-up and participation rates and the structure and duration of student interactions. This research demonstrates how large classes can leverage their scale to encourage mastery through rapid feedback and revision, and suggests secret ingredients to make such peer interactions sustainable at scale.

Proceedings ArticleDOI
07 May 2016
TL;DR: It is found that rare experiences are inflated on the web (by a median of 7x), while common experiences are deflated (by a median of 0.7x).
Abstract: People populate the web with content relevant to their lives, content that millions of others rely on for information and guidance. However, the web is not a perfect representation of lived experience: some topics appear in greater proportion online than their true incidence in our population, while others are deflated. This paper presents a large scale data collection study of this phenomenon. We collect webpages about 21 topics of interest capturing roughly 200,000 webpages, and then compare each topic's popularity to representative national surveys as ground truth. We find that rare experiences are inflated on the web (by a median of 7x), while common experiences are deflated (by a median of 0.7x). We call this phenomenon novelty bias.
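The central measurement can be written out directly: a topic's novelty bias is its share of webpages divided by its share in a representative survey. The counts below are placeholders chosen to echo the reported medians, not the study's data.

```python
# Novelty bias of a topic: web prevalence divided by survey prevalence.
def novelty_bias(web_pages_on_topic, total_web_pages, survey_rate):
    web_rate = web_pages_on_topic / total_web_pages
    return web_rate / survey_rate          # >1 means inflated online, <1 deflated

print(round(novelty_bias(1400, 200_000, 0.001), 1))   # rare experience, inflated ~7x
print(round(novelty_bias(28_000, 200_000, 0.20), 2))  # common experience, deflated ~0.7x
```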