
Showing papers by "Michael S. Bernstein published in 2015"


Journal ArticleDOI
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

30,811 citations
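The challenge's standard classification criterion is top-5 error: an image counts as correct if the ground-truth label appears among the model's five highest-scoring guesses. A minimal sketch of that computation, with invented predictions:

```python
def top5_error(predictions, labels):
    """Fraction of images whose ground-truth label is missing from the
    model's five highest-scoring guesses (the ILSVRC top-5 criterion)."""
    misses = sum(1 for top5, truth in zip(predictions, labels)
                 if truth not in top5)
    return misses / len(labels)

# Invented example: one miss out of two images -> 0.5 error.
preds = [["dog", "cat", "fox", "wolf", "bear"],
         ["car", "bus", "truck", "van", "bike"]]
print(top5_error(preds, ["cat", "boat"]))  # 0.5
```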



Proceedings ArticleDOI
07 Jun 2015
TL;DR: A conditional random field model reasons about possible groundings of scene graphs to test images; the full model improves object localization over baseline methods and outperforms retrieval methods that use only objects or low-level image features.
Abstract: This paper develops a novel framework for semantic image retrieval based on the notion of a scene graph. Our scene graphs represent objects (“man”, “boat”), attributes of objects (“boat is white”) and relationships between objects (“man standing on boat”). We use these scene graphs as queries to retrieve semantically related images. To this end, we design a conditional random field model that reasons about possible groundings of scene graphs to test images. The likelihoods of these groundings are used as ranking scores for retrieval. We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval. In particular, we evaluate retrieval using full scene graphs and small scene subgraphs, and show that our method outperforms retrieval methods that use only objects or low-level image features. In addition, we show that our full model can be used to improve object localization compared to baseline methods.

1,006 citations
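The scene-graph representation and the grounding idea lend themselves to a compact sketch. Below, `unary` and `binary` are hypothetical likelihood functions standing in for the potentials the paper learns inside its conditional random field, and the brute-force search over assignments is for illustration only:

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class SceneGraph:
    objects: list     # e.g. ["man", "boat"]
    attributes: list  # (object index, attribute), e.g. [(1, "white")]
    relations: list   # (subject index, predicate, object index)

def grounding_score(graph, boxes, unary, binary):
    # Try assigning each graph object to a candidate box and keep the
    # best-scoring assignment; the score then ranks the image for
    # retrieval. The paper learns these potentials in a CRF; here
    # `unary(label, box)` and `binary(pred, box_a, box_b)` are stand-ins.
    best = 0.0
    for assign in product(boxes, repeat=len(graph.objects)):
        score = 1.0
        for i, obj in enumerate(graph.objects):
            score *= unary(obj, assign[i])
        for i, attr in graph.attributes:
            score *= unary(attr, assign[i])
        for s, pred, o in graph.relations:
            score *= binary(pred, assign[s], assign[o])
        best = max(best, score)
    return best
```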


Proceedings ArticleDOI
18 Apr 2015
TL;DR: Dynamo, a platform that supports the Mechanical Turk community in forming publics around issues and then mobilizing, reveals that collective action publics tread a precariously narrow path between the twin perils of stalling and friction, balancing with each step between losing momentum and flaring into acrimony.
Abstract: By lowering the costs of communication, the web promises to enable distributed collectives to act around shared issues. However, many collective action efforts never succeed: while the web's affordances make it easy to gather, these same decentralizing characteristics impede any focus towards action. In this paper, we study challenges to collective action efforts through the lens of online labor by engaging with Amazon Mechanical Turk workers. Through a year of ethnographic fieldwork, we sought to understand online workers' unique barriers to collective action. We then created Dynamo, a platform to support the Mechanical Turk community in forming publics around issues and then mobilizing. We found that collective action publics tread a precariously narrow path between the twin perils of stalling and friction, balancing with each step between losing momentum and flaring into acrimony. However, specially structured labor to maintain efforts' forward motion can help such publics take action.

255 citations


Proceedings ArticleDOI
14 Mar 2015
TL;DR: PeerStudio is an assessment platform that leverages the large number of students' peers in online classes to enable rapid feedback on in-progress work, demonstrating how large classes can use their scale to encourage mastery through rapid feedback and revision.
Abstract: Rapid feedback is a core component of mastery learning, but feedback on open-ended work requires days or weeks in most classes today. This paper introduces PeerStudio, an assessment platform that leverages the large number of students' peers in online classes to enable rapid feedback on in-progress work. Students submit their draft, give rubric-based feedback on two peers' drafts, and then receive peer feedback. Students can integrate the feedback and repeat this process as often as they desire. In MOOC deployments, the median student received feedback in just twenty minutes. Rapid feedback on in-progress work improves course outcomes: in a controlled experiment, students' final grades improved when feedback was delivered quickly, but not if delayed by 24 hours. More than 3,600 students have used PeerStudio in eight classes, both massive and in-person. This research demonstrates how large classes can leverage their scale to encourage mastery through rapid feedback and revision.

172 citations
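The workflow the abstract describes (submit a draft, review two peers' drafts, receive feedback, revise) can be approximated with a simple matching queue. This is a hypothetical sketch, not PeerStudio's actual implementation; the `ReviewQueue` class and its oldest-first policy are assumptions:

```python
import collections

class ReviewQueue:
    """Hypothetical rapid peer-review matcher: drafts wait in FIFO order,
    and each arriving reviewer is handed the oldest drafts that are not
    their own, keeping feedback latency low while the class is active."""

    def __init__(self, reviews_per_draft=2):
        self.pending = collections.OrderedDict()  # draft_id -> reviews done
        self.authors = {}                         # draft_id -> author
        self.reviews_per_draft = reviews_per_draft

    def submit(self, draft_id, author):
        self.pending[draft_id] = 0
        self.authors[draft_id] = author

    def assign(self, reviewer, k=2):
        # Pick the k oldest pending drafts this reviewer did not write.
        picks = [d for d in self.pending if self.authors[d] != reviewer][:k]
        for d in picks:
            self.pending[d] += 1
            if self.pending[d] >= self.reviews_per_draft:
                del self.pending[d]  # fully reviewed; leave the queue
        return picks
```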


Proceedings ArticleDOI
28 Feb 2015
TL;DR: Hybrid crowd-machine learning classifiers use the crowd to suggest predictive features and label data, then weight these features using machine learning to produce models that are accurate and use human-understandable features.
Abstract: We present hybrid crowd-machine learning classifiers: classification models that start with a written description of a learning goal, use the crowd to suggest predictive features and label data, and then weigh these features using machine learning to produce models that are accurate and use human-understandable features. These hybrid classifiers enable fast prototyping of machine learning models that can improve on both algorithm performance and human judgment, and accomplish tasks where automated feature extraction is not yet feasible. Flock, an interactive machine learning platform, instantiates this approach. To generate informative features, Flock asks the crowd to compare paired examples, an approach inspired by analogical encoding. The crowd's efforts can be focused on specific subsets of the input space where machine-extracted features are not predictive, or instead used to partition the input space and improve algorithm performance in subregions of the space. An evaluation on six prediction tasks, ranging from detecting deception to differentiating impressionist artists, demonstrated that aggregating crowd features improves upon both asking the crowd for a direct prediction and off-the-shelf machine learning features by over 10%. Further, hybrid systems that use both crowd-nominated and machine-extracted features can outperform those that use either in isolation.

133 citations
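The core loop, crowd-nominated, human-understandable features weighted by a standard learner, can be sketched with scikit-learn's logistic regression. The feature questions, answers, and labels below are invented stand-ins for crowd input:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical crowd-nominated questions for a deception-detection task;
# in Flock, workers propose features like these after comparing paired
# examples.
crowd_features = ["avoids first person?", "story is vague?", "overly formal?"]

# Each row holds one example's crowd answers (0/1) to the questions
# above; y holds crowd-provided labels (1 = deceptive). All invented.
X = np.array([[1, 1, 0],
              [0, 0, 1],
              [1, 0, 1],
              [0, 1, 0]])
y = np.array([1, 0, 1, 0])

# Machine learning then weights the human-understandable features.
model = LogisticRegression().fit(X, y)
for question, weight in zip(crowd_features, model.coef_[0]):
    print(f"{weight:+.2f}  {question}")
```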


Journal ArticleDOI
TL;DR: Soylent is a word processing interface that enables writers to call on Mechanical Turk workers to shorten, proofread, and otherwise edit parts of their documents on demand; the Find-Fix-Verify crowd programming pattern improves worker quality by splitting tasks into a series of generation and review stages.
Abstract: This paper introduces architectural and interaction patterns for integrating crowdsourced human contributions directly into user interfaces. We focus on writing and editing, complex endeavors that span many levels of conceptual and pragmatic activity. Authoring tools offer help with pragmatics, but for higher-level help, writers commonly turn to other people. We thus present Soylent, a word processing interface that enables writers to call on Mechanical Turk workers to shorten, proofread, and otherwise edit parts of their documents on demand. To improve worker quality, we introduce the Find-Fix-Verify crowd programming pattern, which splits tasks into a series of generation and review stages. Evaluation studies demonstrate the feasibility of crowdsourced editing and investigate questions of reliability, cost, wait time, and work time for edits.

133 citations
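A sketch of the Find-Fix-Verify pattern's three stages. The `crowd` object and its `find`/`fix`/`verify` methods are hypothetical stand-ins for posting Mechanical Turk tasks and collecting responses; the agreement threshold is illustrative:

```python
def find_fix_verify(paragraph, crowd, agreement=0.2, votes=3):
    """Sketch of Find-Fix-Verify with a hypothetical `crowd` interface."""
    # Find: independent workers flag problem spans; keep spans that
    # enough workers agree on, to filter out idiosyncratic flags.
    spans = crowd.find("Identify text needing an edit", paragraph)
    flagged = [s for s, support in spans.items() if support >= agreement]

    patches = {}
    for span in flagged:
        # Fix: a separate set of workers proposes rewrites of each span.
        candidates = crowd.fix("Rewrite this span", span)
        # Verify: a third set of workers votes out poor rewrites.
        ratings = crowd.verify("Vote on the best rewrite", candidates, votes)
        patches[span] = max(ratings, key=ratings.get)
    return patches  # span -> best vetted rewrite
```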


Book
13 Nov 2015
TL;DR: This volume reports on the latest research in the study of collective intelligence, laying out a shared set of research challenges from a variety of disciplinary and methodological perspectives, including computer science, biology, economics, and psychology.
Abstract: Intelligence does not arise only in individual brains; it also arises in groups of individuals. This is collective intelligence: groups of individuals acting collectively in ways that seem intelligent. In recent years, a new kind of collective intelligence has emerged: interconnected groups of people and computers, collectively doing intelligent things. Today these groups are engaged in tasks that range from writing software to predicting the results of presidential elections. This volume reports on the latest research in the study of collective intelligence, laying out a shared set of research challenges from a variety of disciplinary and methodological perspectives. Taken together, these essays -- by leading researchers from such fields as computer science, biology, economics, and psychology -- lay the foundation for a new multidisciplinary field. Each essay describes the work on collective intelligence in a particular discipline -- for example, economics and the study of markets; biology and research on emergent behavior in ant colonies; human-computer interaction and artificial intelligence; and cognitive psychology and the "wisdom of crowds" effect. Other areas in social science covered include social psychology, organizational theory, law, and communications. Contributors Eytan Adar, Ishani Aggarwal, Yochai Benkler, Michael S. Bernstein, Jeffrey P. Bigham, Jonathan Bragg, Deborah M. Gordon, Benjamin Mako Hill, Christopher H. Lin, Andrew W. Lo, Thomas W. Malone, Mausam, Brent Miller, Aaron Shaw, Mark Steyvers, Daniel S. Weld, Anita Williams Woolley

125 citations


Proceedings ArticleDOI
18 Apr 2015
TL;DR: Breaking arithmetic, sorting, and transcription tasks into microtasks results in longer overall task completion times but higher quality outcomes and a better experience that may be more resilient to interruptions, suggesting that microtasks can help people complete high-quality work in interruption-driven environments.
Abstract: A large, seemingly overwhelming task can sometimes be transformed into a set of smaller, more manageable microtasks that can each be accomplished independently. For example, it may be hard to subjectively rank a large set of photographs, but easy to sort them in spare moments by making many pairwise comparisons. In crowdsourcing systems, microtasking enables unskilled workers with limited commitment to work together to complete tasks they would not be able to do individually. We explore the costs and benefits of decomposing macrotasks into microtasks for three task categories: arithmetic, sorting, and transcription. We find that breaking these tasks into microtasks results in longer overall task completion times, but higher quality outcomes and a better experience that may be more resilient to interruptions. These results suggest that microtasks can help people complete high quality work in interruption-driven environments.

123 citations
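The abstract's photo-ranking example, a macrotask decomposed into pairwise-comparison microtasks, maps naturally onto merge sort with a human comparator. A sketch, where `prefer(a, b)` stands in for one microtask answered by a worker (or a quorum of workers):

```python
def crowd_sort(items, prefer):
    """Merge sort driven by pairwise-comparison microtasks; `prefer(a, b)`
    returns True when a worker ranks a ahead of b."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = crowd_sort(items[:mid], prefer)
    right = crowd_sort(items[mid:], prefer)
    merged = []
    while left and right:
        merged.append(left.pop(0) if prefer(left[0], right[0])
                      else right.pop(0))
    return merged + left + right

# Usage sketch: ranking photos by appeal, with a console prompt playing
# the role of the worker.
photos = ["sunset.jpg", "beach.jpg", "forest.jpg"]
ranking = crowd_sort(photos,
                     lambda a, b: input(f"{a} over {b}? (y/n) ") == "y")
```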


Proceedings ArticleDOI
28 Feb 2015
TL;DR: This work challenges the view that online classes are useful only when in-person classes are unavailable and demonstrates how diverse online classrooms can create benefits that are largely unavailable in a traditional classroom.
Abstract: Massive online classes are global and diverse. How can we harness this diversity to improve engagement and learning? Currently, though enrollments are high, students' interactions with each other are minimal: most are alone together. This isolation is particularly disappointing given that a global community is a major draw of online classes. This paper illustrates the potential of leveraging geographic diversity in massive online classes. We connect students from around the world through small-group video discussions. Our peer discussion system, Talkabout, has connected over 5,000 students in fourteen online classes. Three studies with 2,670 students from two classes found that globally diverse discussions boost student performance and engagement: the more geographically diverse the discussion group, the better the students performed on later quizzes. Through this work, we challenge the view that online classes are useful only when in-person classes are unavailable. Instead, we demonstrate how diverse online classrooms can create benefits that are largely unavailable in a traditional classroom.

111 citations
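One way to operationalize the finding that more geographically diverse groups perform better is to spread each country's students across groups round-robin. This heuristic is an assumption for illustration, not Talkabout's documented algorithm:

```python
import collections

def diverse_groups(students, group_size=6):
    """Assign students (a dict of student id -> country) to discussion
    groups so that no group fills up with a single nationality."""
    by_country = collections.defaultdict(list)
    for sid, country in students.items():
        by_country[country].append(sid)

    # Round-robin over countries: take one student from each country in
    # turn, then chunk the interleaved order into groups.
    ordered = []
    pools = list(by_country.values())
    while any(pools):
        for pool in pools:
            if pool:
                ordered.append(pool.pop())
    return [ordered[i:i + group_size]
            for i in range(0, len(ordered), group_size)]
```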


Proceedings ArticleDOI
18 Apr 2015
TL;DR: This paper introduces crowdsourcing techniques and tools for prototyping interactive systems in the time it takes to describe the idea; the Apparition system is powered by the first self-coordinated, real-time crowdsourcing infrastructure.
Abstract: Prototyping allows designers to quickly iterate and gather feedback, but the time it takes to create even a Wizard-of-Oz prototype reduces the utility of the process. In this paper, we introduce crowdsourcing techniques and tools for prototyping interactive systems in the time it takes to describe the idea. Our Apparition system uses paid microtask crowds to make even hard-to-automate functions work immediately, allowing more fluid prototyping of interfaces that contain interactive elements and complex behaviors. As users sketch their interface and describe it aloud in natural language, crowd workers and sketch recognition algorithms translate the input into user interface elements, add animations, and provide Wizard-of-Oz functionality. We discuss how design teams can use our approach to reflect on prototypes or begin user studies within seconds, and how, over time, Apparition prototypes can become fully-implemented versions of the systems they simulate. Powering Apparition is the first self-coordinated, real-time crowdsourcing infrastructure. We anchor this infrastructure on a new, lightweight write-locking mechanism that workers can use to signal their intentions to each other.
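The write-locking idea, workers signaling intent to edit an element so others work elsewhere, can be sketched as an expiring claim table. This is a minimal sketch inspired by the mechanism's description, not Apparition's actual code; the TTL-based expiry is an assumption:

```python
import time

class ElementLocks:
    """Lightweight, expiring write locks: a worker who starts editing a
    sketch element announces it, others see the element as claimed and
    work elsewhere, and stale claims expire so nothing blocks forever."""

    def __init__(self, ttl_seconds=10.0):
        self.claims = {}  # element_id -> (worker_id, expires_at)
        self.ttl = ttl_seconds

    def try_claim(self, element_id, worker_id):
        holder = self.claims.get(element_id)
        now = time.monotonic()
        if holder and holder[1] > now and holder[0] != worker_id:
            return False  # another worker signaled intent first
        self.claims[element_id] = (worker_id, now + self.ttl)
        return True

    def release(self, element_id, worker_id):
        if self.claims.get(element_id, (None,))[0] == worker_id:
            del self.claims[element_id]
```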

Proceedings ArticleDOI
18 Apr 2015
TL;DR: This work proposes a data-driven effort metric, ETA (error-time area), that can be used to determine a task's fair price, and validates it on ten common crowdsourcing tasks, finding that ETA closely tracks how workers would rank those tasks by effort.
Abstract: Crowdsourcing systems lack effective measures of the effort required to complete each task. Without knowing how much time workers need to execute a task well, requesters struggle to accurately structure and price their work. Objective measures of effort could better help workers identify tasks that are worth their time. We propose a data-driven effort metric, ETA (error-time area), that can be used to determine a task's fair price. It empirically models the relationship between time and error rate by manipulating the time that workers have to complete a task. ETA reports the area under the error-time curve as a continuous metric of worker effort. The curve's 10th percentile is also interpretable as the minimum time most workers require to complete the task without error, which can be used to price the task. We validate the ETA metric on ten common crowdsourcing tasks, including tagging, transcription, and search, and find that ETA closely tracks how workers would rank these tasks by effort. We also demonstrate how ETA allows requesters to rapidly iterate on task designs and measure whether the changes improve worker efficiency. Our findings can facilitate the process of designing, pricing, and allocating crowdsourcing tasks.
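A sketch of the ETA computation as described: measure error rates at several enforced time limits and report the area under the error-time curve. The sampling and normalization details here are assumptions:

```python
import numpy as np

def eta_score(time_limits, error_rates):
    # Sort measurements by time limit, then integrate the error-time
    # curve with the trapezoid rule; a larger area means more effort.
    order = np.argsort(time_limits)
    t = np.asarray(time_limits, dtype=float)[order]
    e = np.asarray(error_rates, dtype=float)[order]
    return float(np.sum((e[1:] + e[:-1]) / 2 * np.diff(t)))

# Invented measurements: error falls as workers get more seconds.
print(eta_score([2, 4, 8, 16], [0.9, 0.5, 0.2, 0.05]))  # 3.8
```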

Proceedings ArticleDOI
06 Nov 2015
TL;DR: This paper proposes a prototype task to improve work quality and an open-governance model to achieve equitable representation, envisioning that Daemo will enable workers to build sustainable careers and provide requesters with timely, quality labor for their businesses.
Abstract: Crowdsourcing marketplaces provide opportunities for autonomous and collaborative professional work as well as social engagement. However, in these marketplaces, workers feel disrespected due to unreasonable rejections and low payments, whereas requesters do not trust the results they receive. The lack of trust and uneven distribution of power among workers and requesters have raised serious concerns about sustainability of these marketplaces. To address the challenges of trust and power, this paper introduces Daemo, a self-governed crowdsourcing marketplace. We propose a prototype task to improve the work quality and open-governance model to achieve equitable representation. We envisage Daemo will enable workers to build sustainable careers and provide requesters with timely, quality labor for their businesses.

Proceedings ArticleDOI
18 Apr 2015
TL;DR: Motif is a mobile video storytelling application that allows users to construct video stories by combining storytelling patterns extracted from stories created by experts, encouraging them to capture shots with story structure and narrative goals in mind.
Abstract: Creating personal narratives helps people build meaning around their experiences. However, novices lack the knowledge and experience to create stories with strong narrative structure. Current storytelling tools often structure novice work through templates, enforcing a linear creative process that asks novices for materials they may not have. In this paper, we propose scaffolding creative work using storytelling patterns extracted from stories created by experts. Patterns are modular sets of related camera shots that expert videographers commonly use to achieve a specific narrative function. After identifying a set of patterns from high-quality storytelling videos, we created Motif, a mobile video storytelling application that allows users to construct video stories by combining these patterns. By making existing solutions used by experts available to novices, we encourage capturing shots with story structure and narrative goals in mind. In a controlled study where we asked participants to create travel video stories, videos created with patterns conveyed stronger narrative structure and were considered higher quality by expert evaluators than videos created without patterns.
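A pattern, as described, is a modular set of related shots serving one narrative function, which suggests a simple data structure. The field names and the example pattern below are illustrative, not Motif's schema:

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    """A storytelling pattern: a reusable set of related camera shots
    that together achieve one narrative function."""
    name: str
    function: str
    shots: list

arrival = Pattern(
    name="arrival",
    function="establish a new location",
    shots=["wide establishing shot",
           "medium shot of traveler entering",
           "close-up reaction shot"])

# A story is assembled by concatenating the shot lists of chosen patterns.
story = [shot for p in (arrival,) for shot in p.shots]
```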

Posted Content
TL;DR: Object-level grounding establishes a semantic link between textual descriptions and image regions, enabling a new type of QA with visual answers in addition to the textual answers used in previous work; a novel LSTM model with spatial attention tackles the resulting 7W QA tasks.
Abstract: We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacities for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. However, many questions and answers, in practice, relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. It enables a new type of QA with visual answers, in addition to textual answers used in previous work. We study the visual QA tasks in a grounded setting with a large collection of 7W multiple-choice QA pairs. Furthermore, we evaluate human performance and several baseline models on the QA tasks. Finally, we propose a novel LSTM model with spatial attention to tackle the 7W QA tasks.
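A heavily simplified sketch of spatial attention, scoring image regions against a question encoding and softmax-weighting them. The paper's actual model wraps this inside an LSTM and learns the projections; shapes and data here are invented:

```python
import numpy as np

def spatial_attention(region_features, question_state):
    # Score each region against the question encoding, softmax the
    # scores into attention weights, and return the attended feature.
    # Shapes: region_features (R, D), question_state (D,).
    scores = region_features @ question_state   # (R,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over regions
    return weights @ region_features, weights   # attended (D,), weights (R,)

# Invented shapes: 4 candidate regions with 8-dimensional features.
rng = np.random.default_rng(0)
attended, w = spatial_attention(rng.normal(size=(4, 8)),
                                rng.normal(size=8))
```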

Proceedings ArticleDOI
14 Mar 2015
TL;DR: This paper articulates and addresses three adoption challenges for global-scale peer learning, and measures efficacy through sign-up and participation rates and the structure and duration of student interactions.
Abstract: When students work with peers, they learn more actively, build richer knowledge structures, and connect material to their lives. However, not every peer learning experience online sees successful adoption. This paper articulates and addresses three adoption challenges for global-scale peer learning. First, peer interactions struggle to bootstrap critical mass. However, class incentives can signal importance and spur initial usage. Second, online classes have limited peer visibility and awareness, so students often feel alone even when surrounded by peers. We find that highlighting interdependence and strengthening norms can mitigate this issue. Third, teachers can readily access "big" aggregate data but not "thick" contextual data that helps build intuitions, so software should guide teachers' scaffolding of peer interactions. We illustrate these challenges through studying 8,500 students' usage of two peer learning platforms, Talkabout and PeerStudio. This paper measures efficacy through sign-up and participation rates and the structure and duration of student interactions.

Proceedings ArticleDOI
22 Jun 2015
TL;DR: To better support novices' experiences of failure, this work develops a taxonomy of creative activities that people engage in when they aim to succeed, then inverts it, proposing to flip the value of failure in creativity tools from something to avoid into something to pursue actively.
Abstract: Creative tools today strive to amplify our ability to create high-quality work. However, experiencing failure is also an important part of mastering creative skills. While experts have developed strategies for engaging in risky experiments and learning from mistakes, novices lack the experience and mindset needed to use failures as opportunities for growth. Current tools intimidate the unsure novice, as they are designed around showcasing success or critiquing finished work, rather than providing safe spaces for experimentation. To better support experiences of failure for novices, we instead propose flipping the value of failure in creativity tools from something to avoid to something to pursue actively. To do this, we develop a taxonomy of creative activities that people engage in when they aim to succeed. We then invert this taxonomy to derive a new set of creative activities where deliberate failure can provide a path towards creative confidence. Lastly, we envision possible creativity support tools as examples of the potential value of supporting activities where failure is encouraged and showcased.

Proceedings ArticleDOI
14 Mar 2015
TL;DR: Pilot results drawn from a Coursera class suggest that participants prefer to exchange information with their peers through personal stories, and that connecting those stories with the curriculum increases participant engagement.
Abstract: Student discussions over video in massive classes allow students to explore course content, share personal experiences and get feedback on their ideas. However, such discussions frequently turn into casual conversations without focusing on the curriculum and the learning objectives. This short paper explores whether students can achieve multiple learning objectives by solving challenges collaboratively during discussions. We introduce the 'think-pair-share' technique for video discussions. Our pilot results, drawn from a Coursera class, suggest that participants prefer to exchange information with their peers using personal stories, and that connecting stories with the curriculum increases participant engagement.

Proceedings ArticleDOI
18 Apr 2015
TL;DR: This paper bootstraps a knowledge graph of human activities by text mining a large dataset of modern fiction on the web, and demonstrates an Augur-enhanced video game world in which non-player characters follow realistic patterns of behavior, interact with their environment and each other, and respond to the user's behavior.
Abstract: People engage with thousands of situations, activities, and objects on a daily basis. Hand-coding this knowledge into interactive systems is prohibitively labor-intensive, but fiction captures a vast number of human lives in moment to moment detail. In this paper, we bootstrap a knowledge graph of human activities by text mining a large dataset of modern fiction on the web. Our knowledge graph, Augur, describes human actions over time as conditioned by nearby locations, people, and objects. Applications can use this graph to react to human behavior in a data-driven way. We demonstrate an Augur-enhanced video game world in which non-player characters follow realistic patterns of behavior, interact with their environment and each other, and respond to the user's behavior.
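Applications query the graph for likely next actions given nearby objects. A toy sketch with invented counts (Augur's actual schema and mined statistics differ):

```python
import collections

# Toy slice of an Augur-style activity graph: counts of (context object
# -> subsequent human action), as might be mined from fiction. The
# objects, actions, and counts here are invented for illustration.
graph = {
    "coffee cup": collections.Counter({"drink": 40, "refill": 9, "wash": 5}),
    "doorbell":   collections.Counter({"answer door": 31, "ignore": 4}),
}

def likely_actions(nearby_objects, top_k=2):
    # Sum action counts across every object in the current context and
    # return the most probable next human actions.
    totals = collections.Counter()
    for obj in nearby_objects:
        totals.update(graph.get(obj, {}))
    return totals.most_common(top_k)

print(likely_actions(["coffee cup"]))  # [('drink', 40), ('refill', 9)]
```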

Posted Content
TL;DR: SentenceRacer is an online game that gathers and verifies descriptions of images at no cost, generating annotations of higher quality than those collected on Amazon Mechanical Turk (AMT).
Abstract: Recently, datasets that contain sentence descriptions of images have enabled models that can automatically generate image captions. However, collecting these datasets is still very expensive. Here, we present SentenceRacer, an online game that gathers and verifies descriptions of images at no cost. Similar to the game hangman, players compete to uncover words in a sentence that ultimately describes an image. SentenceRacer both generates sentences and verifies that they are accurate descriptions. We show that SentenceRacer generates annotations of higher quality than those generated on Amazon Mechanical Turk (AMT).
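The hangman-style mechanic, players racing to uncover the words of a hidden reference description, reduces to a small masking function. A sketch with simplified rules:

```python
def mask_sentence(sentence, guessed):
    """Render a hidden reference description hangman-style: guessed
    words are shown, unguessed words appear as blanks of equal length.
    (Game mechanics simplified for illustration.)"""
    return " ".join(word if word.lower() in guessed else "_" * len(word)
                    for word in sentence.split())

reference = "a man standing on a white boat"
print(mask_sentence(reference, {"a", "boat"}))
# -> "a ___ ________ __ a _____ boat"
```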