
Showing papers by "Michael S. Bernstein" published in 2014


Posted Content
TL;DR: The creation of this benchmark dataset and the advances in object recognition it has enabled are described, and state-of-the-art computer vision accuracy is compared with human accuracy.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.
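As a concrete reference point for how the challenge scores classification systems (and how they are compared against human accuracy), below is a minimal sketch of the top-5 error metric used in ILSVRC classification; the function name and toy data are illustrative.

```python
# Minimal sketch of top-5 classification error: an image counts as correct if
# the true label appears among the model's five highest-ranked guesses.
def top5_error(ranked_predictions, true_labels):
    """ranked_predictions: per-image lists of class ids, most confident first."""
    misses = sum(truth not in preds[:5]
                 for preds, truth in zip(ranked_predictions, true_labels))
    return misses / len(true_labels)

# Two images: the first is a top-5 hit, the second a miss -> 0.5 error.
print(top5_error([[3, 7, 1, 9, 2], [4, 8, 6, 1, 0]], [7, 5]))
```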

519 citations


Proceedings ArticleDOI
05 Oct 2014
TL;DR: It is demonstrated that Foundry and flash teams enable crowdsourcing of a broad class of goals including design prototyping, course development, and film animation, in half the work time of traditional self-managed teams.
Abstract: We introduce flash teams, a framework for dynamically assembling and managing paid experts from the crowd. Flash teams advance a vision of expert crowd work that accomplishes complex, interdependent goals such as engineering and design. These teams consist of sequences of linked modular tasks and handoffs that can be computationally managed. Interactive systems reason about and manipulate these teams' structures: for example, flash teams can be recombined to form larger organizations and authored automatically in response to a user's request. Flash teams can also hire more people elastically in reaction to task needs, and pipeline intermediate output to accelerate completion times. To enable flash teams, we present Foundry, an end-user authoring platform and runtime manager. Foundry allows users to author modular tasks, then manages teams through handoffs of intermediate work. We demonstrate that Foundry and flash teams enable crowdsourcing of a broad class of goals including design prototyping, course development, and film animation, in half the work time of traditional self-managed teams.
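To make the idea of computationally managed, pipelined task structures concrete, here is a minimal sketch of a flash team as linked modular tasks whose handoffs determine when downstream work can begin; the task names, fields, and scheduling rule are illustrative and are not Foundry's actual data model or API.

```python
# Illustrative sketch: a flash team as modular tasks linked by handoffs, with
# the earliest finish time computed by letting independent tasks overlap.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    hours: float                                      # estimated work time
    depends_on: list = field(default_factory=list)    # upstream tasks handing off output

def finish_time(task, memo=None):
    """Earliest finish assuming a task starts once all its handoffs arrive."""
    memo = {} if memo is None else memo
    if task.name not in memo:
        start = max((finish_time(d, memo) for d in task.depends_on), default=0.0)
        memo[task.name] = start + task.hours
    return memo[task.name]

# Toy team: design and scripting run in parallel before animation begins.
design = Task("UI design", 6)
script = Task("Narration script", 4)
animate = Task("Animation", 8, depends_on=[design, script])
print(finish_time(animate))  # 14.0 hours rather than 18, because the first two overlap
```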

214 citations


Proceedings ArticleDOI
26 Apr 2014
TL;DR: An algorithm is proposed that exploits correlation, hierarchy, and sparsity of the label distribution, yielding up to a 6x reduction in human computation time compared to the naive method of querying a human annotator for the presence of every object in every image.
Abstract: We study strategies for scalable multi-label annotation, or for efficiently acquiring multiple labels from humans for a collection of items. We propose an algorithm that exploits correlation, hierarchy, and sparsity of the label distribution. A case study of labeling 200 objects using 20,000 images demonstrates the effectiveness of our approach. The algorithm results in up to 6x reduction in human computation time compared to the naive method of querying a human annotator for the presence of every object in every image.
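The intuition behind exploiting hierarchy and sparsity can be shown with a toy sketch: ask one crowd question about a parent category and descend to its child labels only when the parent is present. The hierarchy, simulated answers, and question counting below are illustrative and are not the paper's actual algorithm.

```python
# Toy sketch: hierarchical questioning saves crowd work when most labels are
# absent from an image (sparsity). Crowd answers are simulated from a
# ground-truth label set.
HIERARCHY = {
    "animal": ["dog", "cat", "bird"],
    "vehicle": ["car", "bicycle", "airplane"],
    "furniture": ["chair", "table", "sofa"],
}

def annotate(true_labels, hierarchy=HIERARCHY):
    """Returns (labels found, number of crowd questions asked)."""
    found, questions = set(), 0
    for parent, children in hierarchy.items():
        questions += 1                               # "does the image contain any <parent>?"
        if any(c in true_labels for c in children):  # descend only into present branches
            for child in children:
                questions += 1                       # "does the image contain a <child>?"
                if child in true_labels:
                    found.add(child)
    return found, questions

labels, asked = annotate({"dog"})
naive = sum(len(children) for children in HIERARCHY.values())  # one question per label
print(labels, asked, "questions vs", naive, "with the naive per-label method")
```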

162 citations


Proceedings ArticleDOI
07 Apr 2014
TL;DR: The PlanOut tool as discussed by the authors separates experimental design from application code, allowing the experimenter to concisely describe experimental designs, whether common "A/B tests" and factorial designs, or more complex designs involving conditional logic or multiple experimental units.
Abstract: Online experiments are widely used to compare specific design alternatives, but they can also be used to produce generalizable knowledge and inform strategic decision making. Doing so often requires sophisticated experimental designs, iterative refinement, and careful logging and analysis. Few tools exist that support these needs. We thus introduce a language for online field experiments called PlanOut. PlanOut separates experimental design from application code, allowing the experimenter to concisely describe experimental designs, whether common "A/B tests" and factorial designs, or more complex designs involving conditional logic or multiple experimental units. These latter designs are often useful for understanding causal mechanisms involved in user behaviors. We demonstrate how experiments from the literature can be implemented in PlanOut, and describe two large field experiments conducted on Facebook with PlanOut. For common scenarios in which experiments are run iteratively and in parallel, we introduce a namespaced management system that encourages sound experimental practice.
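PlanOut also has an open-source Python reference implementation; as a rough illustration of what separating experimental design from application code looks like, the sketch below defines a simple two-factor experiment in the style of the library's documented SimpleExperiment API. The parameter names and choices are illustrative, and exact import paths or defaults may differ across versions.

```python
# Sketch of a factorial experiment in the style of PlanOut's Python
# reference implementation (https://github.com/facebook/planout).
from planout.experiment import SimpleExperiment
from planout.ops.random import UniformChoice

class SignupButtonExperiment(SimpleExperiment):
    def assign(self, params, userid):
        # Assignment is deterministic per unit (userid), so a given user
        # always sees the same condition.
        params.button_color = UniformChoice(choices=["#3c539a", "#5f9647"], unit=userid)
        params.button_text = UniformChoice(choices=["Sign up", "Join now"], unit=userid)

# Application code only reads parameters; the design lives entirely in assign().
exp = SignupButtonExperiment(userid=42)
print(exp.get("button_color"), exp.get("button_text"))
```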

142 citations


Posted Content
TL;DR: A language for online field experiments called PlanOut separates experimental design from application code, allowing the experimenter to concisely describe experimental designs, whether common "A/B tests" and factorial designs, or more complex designs involving conditional logic or multiple experimental units.
Abstract: Online experiments are widely used to compare specific design alternatives, but they can also be used to produce generalizable knowledge and inform strategic decision making. Doing so often requires sophisticated experimental designs, iterative refinement, and careful logging and analysis. Few tools exist that support these needs. We thus introduce a language for online field experiments called PlanOut. PlanOut separates experimental design from application code, allowing the experimenter to concisely describe experimental designs, whether common "A/B tests" and factorial designs, or more complex designs involving conditional logic or multiple experimental units. These latter designs are often useful for understanding causal mechanisms involved in user behaviors. We demonstrate how experiments from the literature can be implemented in PlanOut, and describe two large field experiments conducted on Facebook with PlanOut. For common scenarios in which experiments are run iteratively and in parallel, we introduce a namespaced management system that encourages sound experimental practice.

138 citations


Journal ArticleDOI
TL;DR: This work introduces perceptual kernels: distance matrices derived from aggregate perceptual judgments, which represent perceptual differences between and within visual variables in a reusable form that is directly applicable to visualization evaluation and automated design.
Abstract: Visualization design can benefit from careful consideration of perception, as different assignments of visual encoding variables such as color, shape and size affect how viewers interpret data. In this work, we introduce perceptual kernels: distance matrices derived from aggregate perceptual judgments. Perceptual kernels represent perceptual differences between and within visual variables in a reusable form that is directly applicable to visualization evaluation and automated design. We report results from crowdsourced experiments to estimate kernels for color, shape, size and combinations thereof. We analyze kernels estimated using five different judgment types—including Likert ratings among pairs, ordinal triplet comparisons, and manual spatial arrangement—and compare them to existing perceptual models. We derive recommendations for collecting perceptual similarities, and then demonstrate how the resulting kernels can be applied to automate visualization design decisions. Visual encoding decisions are central to visualization design. As viewers' interpretation of data may shift across encodings, it is important to understand how choices of visual encoding variables such as color, shape, size—and their combinations—affect graphical perception. One way to evaluate these effects is to measure the perceived similarities (or, conversely, distances) between visual variables. We broadly refer to subjective measures of judged similarity as perceptual distances. In this context, a perceptual kernel is the distance matrix of aggregated pairwise perceptual distances. These measures quantify the effects of alternative encodings and thereby help create visualizations that better reflect structures in data. Figure 1a shows a perceptual kernel for a set of symbols; distances are visualized using grayscale values, with darker cells indicating higher similarity. The prominent clusters suggest that users will perceive similarities among shapes that may or may not mirror encoded data values. Perceptual kernels can also benefit automated visualization design. Typically, automated design methods (27) leverage an effectiveness ranking of visual encoding variables with respect to data types (nominal, ordinal, quantitative). Once a visual variable is chosen, these methods provide little guidance on how to best pair data values with visual elements, instead relying on default palettes for variables such as color and shape. Perceptual kernels provide a means for computing optimized assignments to visual variables whose perceived differences are congruent with underlying distances among data points. In short, perceptual kernels enable the direct application of empirical perception data within visualization tools. In this work, we contribute the results of crowdsourced experiments to estimate perceptual kernels for visual encoding variables of shape, size, color and combinations thereof. There are alternative ways of eliciting judged similarities among visual variables. We compare a variety of judgment types: Likert ratings among pairs, ordinal triplet comparisons, and manual spatial arrangement. We also assess the resulting kernels via comparisons to existing perceptual models. We find that ordinal triplet matching judgments provide the most consistent results, albeit with higher time and money costs than pairwise ratings or spatial arrangement.
We then demonstrate how perceptual kernels can be applied to improve visualization design through automatic palette optimization and by providing distances for visual embedding (8) of data points into visual spaces.
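As a rough illustration of what a perceptual kernel is, the sketch below aggregates pairwise dissimilarity judgments into a symmetric distance matrix; the shapes, ratings, and normalization are illustrative rather than the paper's experimental data or estimation procedure. A matrix like this can then drive encoding decisions, for example by assigning the most perceptually distant shapes to the categories that must be easiest to tell apart.

```python
# Illustrative sketch: average pairwise dissimilarity ratings into a
# symmetric perceptual-distance matrix (a "perceptual kernel").
import numpy as np

shapes = ["circle", "square", "triangle", "cross"]
# ratings[(i, j)] = crowd dissimilarity judgments for shapes i and j on a 1-5 scale
ratings = {
    (0, 1): [3, 4, 3], (0, 2): [4, 5, 4], (0, 3): [5, 4, 5],
    (1, 2): [2, 3, 3], (1, 3): [4, 4, 5], (2, 3): [3, 2, 3],
}

kernel = np.zeros((len(shapes), len(shapes)))
for (i, j), judged in ratings.items():
    distance = np.mean(judged) / 5.0        # normalize to [0, 1]
    kernel[i, j] = kernel[j, i] = distance  # symmetric; the diagonal stays 0

print(np.round(kernel, 2))
```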

122 citations


Proceedings ArticleDOI
26 Apr 2014
TL;DR: This work introduces Twitch, a mobile phone application that asks users to make a micro-contribution each time they unlock their phone, and presents twitch crowdsourcing: crowdsourcing via quick contributions that can be completed in one or two seconds.
Abstract: To lower the threshold to participation in crowdsourcing, we present twitch crowdsourcing: crowdsourcing via quick contributions that can be completed in one or two seconds. We introduce Twitch, a mobile phone application that asks users to make a micro-contribution each time they unlock their phone. Twitch takes advantage of the common habit of turning to the mobile phone in spare moments. Twitch crowdsourcing activities span goals such as authoring a census of local human activity, rating stock photos, and extracting structured data from Wikipedia pages. We report a field deployment of Twitch where 82 users made 11,240 crowdsourcing contributions as they used their phone in the course of everyday life. The median Twitch activity took just 1.6 seconds, incurring no statistically distinguishable costs to unlock speed or cognitive load compared to a standard slide-to-unlock interface.

105 citations


Proceedings ArticleDOI
15 Feb 2014
TL;DR: This work suggests that asymmetric creative contributions may support a broad new class of creative collaborations, where a leader directs the high-level vision for a story and articulates creative constraints for the crowd.
Abstract: In story writing, the diverse perspectives of the crowd could support an author's search for the perfect character, setting, or plot. However, structuring crowd collaboration is challenging. Too little structure leads to unfocused, sprawling narratives, and too much structure stifles creativity. Motivated by the idea that individual creative leaders and the crowd have complementary creative strengths, we present an approach where a leader directs the high-level vision for a story and articulates creative constraints for the crowd. This approach is embodied in Ensemble, a novel collaborative story-writing platform. In a month-long short story competition, over one hundred volunteer users on the web started over fifty short stories using Ensemble. Leaders used the platform to direct collaborator work by establishing creative goals, and collaborators contributed meaningful, high-level ideas to stories through specific suggestions. This work suggests that asymmetric creative contributions may support a broad new class of creative collaborations.

87 citations


Proceedings ArticleDOI
TL;DR: This paper integrates peer and machine grading to preserve the robustness of peer assessment while lowering the grading burden, and provides an example of how peer work and machine learning can combine to improve the learning experience.
Abstract: Peer assessment helps students reflect and exposes them to different ideas. It scales assessment and allows large online classes to use open-ended assignments. However, it requires students to spend significant time grading. How can we lower this grading burden while maintaining quality? This paper integrates peer and machine grading to preserve the robustness of peer assessment and lower grading burden. In the identify-verify pattern, a grading algorithm first predicts a student grade and estimates confidence, which is used to estimate the number of peer raters required. Peers then identify key features of the answer using a rubric. Finally, other peers verify whether these feature labels were accurately applied. This pattern adjusts the number of peers that evaluate an answer based on algorithmic confidence and peer agreement. We evaluated this pattern with 1370 students in a large, online design class. With only 54% of the student grading time, the identify-verify pattern yields 80-90% of the accuracy obtained by taking the median of three peer scores, and provides more detailed feedback. A second experiment found that verification dramatically improves accuracy with more raters, with a 20% gain over the peer-median with four raters. However, verification also leads to lower initial trust in the grading system. The identify-verify pattern provides an example of how peer work and machine learning can combine to improve the learning experience.
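One piece of the identify-verify pattern, using algorithmic confidence to decide how many peer raters to request, can be sketched compactly. The thresholds, the combination rule, and the omission of the rubric identify and verify steps below are all simplifications for illustration, not the paper's exact procedure.

```python
# Highly simplified sketch: the grading algorithm's confidence in its
# predicted grade determines how many peer raters are requested before the
# scores are combined.
import statistics

def raters_needed(confidence):
    """Request more peers when the grading algorithm is unsure."""
    if confidence >= 0.9:
        return 1
    if confidence >= 0.7:
        return 2
    return 3

def combined_grade(machine_grade, confidence, available_peer_grades):
    peers = available_peer_grades[:raters_needed(confidence)]
    return statistics.median(peers + [machine_grade])

print(combined_grade(machine_grade=7, confidence=0.95, available_peer_grades=[9, 6, 8]))  # 8.0
```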

81 citations


01 Jan 2014
TL;DR: HCI has a long history of studying not only the interaction of individuals with technology, but also the interaction of groups with or mediated by technology; there are three main vectors of study for HCI and collective intelligence.
Abstract: The lessons of HCI can therefore be brought to bear on different aspects of collective intelligence. On the one hand, the people in the collective (the crowd) will only contribute if there are proper incentives and if the interface guides them in usable and meaningful ways. On the other, those interested in leveraging the collective need usable ways of coordinating, making sense of, and extracting value from the collective work that is being done, often on their behalf. Ultimately, collective intelligence involves the co-design of technical infrastructure and human-human interaction: a socio-technical system. In crowdsourcing, we might differentiate between two broad classes of users: requesters and crowd members. The requester is the individual or group for whom work is done, or who takes responsibility for aggregating the work done by the collective. The crowd member (or crowd worker) is one of the many people who contribute. While we often use the word “worker,” crowd workers do not need to be (and often aren't) contributing as part of what we might consider standard “work.” They may work for pay or not, work for small periods of time or contribute for days to a project they care about, and they may work in such a way that each individual's contribution may be difficult to discern from the collective final output. HCI has a long history of studying not only the interaction of individuals with technology, but also the interaction of groups with or mediated by technology. For example, computer-supported cooperative work (CSCW) investigates how to allow groups to accomplish tasks together using shared or distributed computer interfaces, either at the same time or asynchronously. Current crowdsourcing research alters some of the standard assumptions about the size, composition, and stability of these groups, but the fundamental approaches remain the same. For instance, workers drawn from the crowd may be less reliable than groups of employees working on a shared task, and group membership in the crowd may change more quickly. There are three main vectors of study for HCI and collective intelligence. The first is directed crowdsourcing, where a single individual attempts to recruit and guide a large set of people to help accomplish a goal. The second is collaborative crowdsourcing, where a group gathers based on shared interest and determines its own organization and work. The third vector is passive crowdsourcing, where the crowd or collective may never meet or coordinate, but it is still possible to mine their collective behavior patterns for information. We cover each vector in turn. We conclude with a list of challenges for researchers in HCI related to crowdsourcing and collective intelligence.

63 citations


Proceedings ArticleDOI
26 Apr 2014
TL;DR: This work built Codex, a knowledge base that records common practice for the Ruby programming language by indexing over three million lines of popular code, and suggests that operationalizing practice-driven knowledge in structured domains such as programming can enable a new class of user interfaces.
Abstract: While emergent behaviors are uncodified across many domains such as programming and writing, interfaces need explicit rules to support users. We hypothesize that by codifying emergent programming behavior, software engineering interfaces can support a far broader set of developer needs. To explore this idea, we built Codex, a knowledge base that records common practice for the Ruby programming language by indexing over three million lines of popular code. Codex enables new data-driven interfaces for programming systems: statistical linting, identifying code that is unlikely to occur in practice and may constitute a bug; pattern annotation, automatically discovering common programming idioms and annotating them with metadata using expert crowdsourcing; and library generation, constructing a utility package that encapsulates and reflects emergent software practice. We evaluate these applications and find that Codex captures a broad swath of programming practice, statistical linting detects problematic code snippets, and pattern annotation discovers nontrivial idioms such as basic HTTP authentication and database migration templates. Our work suggests that operationalizing practice-driven knowledge in structured domains such as programming can enable a new class of user interfaces.
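The flavor of statistical linting can be conveyed with a small sketch: normalize snippets, count how often each normalized form occurs in an indexed corpus, and flag forms that are rare in practice. The normalization, threshold, and line-level granularity below are illustrative; Codex itself indexes Ruby ASTs rather than raw text.

```python
# Illustrative sketch of statistical linting: flag lines whose normalized
# form rarely appears in a corpus of popular code.
from collections import Counter
import re

def normalize(line):
    """Collapse identifiers and literals so structurally similar lines match."""
    line = re.sub(r'"[^"]*"', "STR", line)
    line = re.sub(r"\b\d+\b", "NUM", line)
    return re.sub(r"\b[a-z_][a-zA-Z0-9_]*\b", "ID", line).strip()

def build_index(corpus_lines):
    return Counter(normalize(l) for l in corpus_lines)

def lint(snippet_lines, index, min_count=2):
    """Return lines whose normalized form is unusually rare in practice."""
    return [l for l in snippet_lines if index[normalize(l)] < min_count]

index = build_index(["x = y + 1", "a = b + 2", "items.each { |i| puts i }"])
print(lint(["total = count + 3", "puts(items.each)"], index))  # flags the second line
```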

Proceedings ArticleDOI
15 Feb 2014
TL;DR: In a multi-month field deployment, Catalyst helped users organize events including food bank volunteering, on-demand study groups, and mass participation events like a human chess game, suggesting that activation thresholds can indeed catalyze a large class of new collective efforts.
Abstract: The web is a catalyst for drawing people together around shared goals, but many groups never reach critical mass. It can thus be risky to commit time or effort to a goal: participants show up only to discover that nobody else did, and organizers devote significant effort to causes that never get off the ground. Crowdfunding has lessened some of this risk by only calling in donations when an effort reaches a collective monetary goal. However, it leaves unsolved the harder problem of mobilizing effort, time and participation. We generalize the concept into activation thresholds, commitments that are conditioned on others' participation. With activation thresholds, supporters only need to show up for an event if enough other people commit as well. Catalyst is a platform that introduces activation thresholds for on-demand events. For more complex coordination needs, Catalyst also provides thresholds based on time or role (e.g., a bake sale requiring commitments for bakers, decorators, and sellers). In a multi-month field deployment, Catalyst helped users organize events including food bank volunteering, on-demand study groups, and mass participation events like a human chess game. Our results suggest that activation thresholds can indeed catalyze a large class of new collective efforts.
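The activation-threshold mechanism itself is simple to sketch: commitments are conditional and only called in once every required role reaches its minimum count. The field names and the all-roles rule below are illustrative, not Catalyst's actual data model.

```python
# Illustrative sketch of an activation threshold with per-role minimums.
def activated(commitments, thresholds):
    """commitments: {role: [names]}, thresholds: {role: minimum count needed}."""
    return all(len(commitments.get(role, [])) >= need
               for role, need in thresholds.items())

bake_sale = {"baker": ["Ana", "Ben", "Chi"], "decorator": ["Dee"], "seller": []}
print(activated(bake_sale, {"baker": 3, "decorator": 1, "seller": 2}))  # False: no sellers yet
```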

Proceedings ArticleDOI
TL;DR: Synchronous peer interaction can benefit massive online courses as well: students in more geographically distributed groups scored higher on the final, suggesting that distributed discussions have educational value.
Abstract: In the physical classroom, peer interactions motivate students and expand their perspective. We suggest that synchronous peer interaction can benefit massive online courses as well. Talkabout organizes students into video discussion groups and allows instructors to determine group composition and discussion content. Using Talkabout, students pick a discussion time that suits their schedule. The system groups the students into small video discussions based on instructor preferences such as gender or geographic balance. To date, 2,474 students in five massive online courses have used Talkabout to discuss topics ranging from prejudice to organizational theory. Talkabout discussions are diverse: in one course, the median six-person discussion group had students from four different countries. Students enjoyed discussing in these diverse groups: the average student participated for 66 minutes, twice the course requirement. Students in more geographically distributed groups also scored higher on the final, suggesting that distributed discussions have educational value.
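One simple way to realize geographically balanced groups, shown below as an illustrative sketch rather than Talkabout's actual matching algorithm, is to deal students out round-robin by country before chunking them into discussion groups.

```python
# Illustrative sketch: interleave students by country so each group mixes
# countries, then split into fixed-size groups.
from collections import defaultdict
import itertools

def diverse_groups(students, group_size=6):
    """students: list of (name, country) pairs."""
    by_country = defaultdict(list)
    for name, country in students:
        by_country[country].append(name)
    interleaved = [s for batch in itertools.zip_longest(*by_country.values())
                   for s in batch if s is not None]
    return [interleaved[i:i + group_size] for i in range(0, len(interleaved), group_size)]

students = [("Ana", "BR"), ("Bo", "US"), ("Chen", "CN"), ("Dia", "IN"),
            ("Eve", "US"), ("Femi", "NG")]
print(diverse_groups(students, group_size=3))
# [['Ana', 'Bo', 'Chen'], ['Dia', 'Femi', 'Eve']] -- three countries per group
```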

Proceedings Article
05 Sep 2014
TL;DR: This work introduces context trees, a crowdsourcing workflow for creating global summaries of a large input, and introduces a weighting process that percolates ratings downwards through the tree so that important nodes in unimportant branches are not overweighted.
Abstract: Crowdsourcing struggles when workers must see all of the pieces of input to make an accurate judgment. For example, to find the most important scenes in a novel or movie, each worker must spend hours consuming the entire plot to acquire a global understanding and then apply that understanding to each local scene. To enable the crowdsourcing of large-scale goals with only local views, we introduce context trees, a crowdsourcing workflow for creating global summaries of a large input. Context trees recursively combine elements through written summaries to form a tree. Workers can then ground their local decisions by applying those summaries back down to the leaf nodes. In the case of scale ratings such as scene importance, we introduce a weighting process that percolates ratings downwards through the tree so that important nodes in unimportant branches are not overweighted. When using context trees to rate the importance of scenes in a 4000-word story and a 100-minute movie, workers’ ratings are nearly as accurate as those who saw the entire input, and much improved over the traditional approach of splitting the input into independent segments. To explore whether context trees enable crowdsourcing to undertake new classes of goals, we also crowdsource the solution to a large hierarchical puzzle of 462,000 interlocking pieces.
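The two mechanisms the abstract describes, recursive summarization into a tree and downward percolation of ratings, can be sketched compactly. The summarize and rate functions below stand in for crowd tasks, and the multiplicative weighting is one illustrative choice rather than necessarily the paper's exact formula.

```python
# Illustrative sketch of a context tree: leaves are raw chunks, internal
# nodes summarize their children, and importance ratings percolate downward
# so locally important leaves in unimportant branches are discounted.
def build_tree(chunks, summarize, branching=3):
    nodes = [{"text": c, "children": []} for c in chunks]
    while len(nodes) > 1:
        nodes = [{"text": summarize([n["text"] for n in nodes[i:i + branching]]),
                  "children": nodes[i:i + branching]}
                 for i in range(0, len(nodes), branching)]
    return nodes[0]

def percolate(node, rate, inherited=1.0):
    """Multiply each leaf's rating by the ratings of the branches above it."""
    weight = inherited * rate(node["text"])
    if not node["children"]:
        return {node["text"]: weight}
    scores = {}
    for child in node["children"]:
        scores.update(percolate(child, rate, weight))
    return scores

# Toy run with stand-in crowd functions.
tree = build_tree(["scene A", "scene B", "scene C", "scene D"],
                  summarize=lambda texts: " / ".join(texts), branching=2)
print(percolate(tree, rate=lambda text: 0.5 if text == "scene C / scene D" else 1.0))
# {'scene A': 1.0, 'scene B': 1.0, 'scene C': 0.5, 'scene D': 0.5}
```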

Proceedings ArticleDOI
05 Oct 2014
TL;DR: This paper investigates whether structured handoff methods, from one worker to the next, improve final product quality by helping the workers understand the input of their tasks and reduce overall integration cost, and concludes that structured handoffs result in higher quality work.
Abstract: Expert crowdsourcing allows specialized, remote teams to complete projects, often large and involving multiple stages. Its execution is complicated by communication difficulties between remote workers. This paper investigates whether structured handoff methods, from one worker to the next, improve final product quality by helping workers understand the inputs to their tasks and by reducing overall integration cost. We investigate this question through 1) a "live" handoff method, in which the next worker shadows the former via screen-sharing technology, and 2) a "recorded" handoff, in which workers summarize the work done for the next worker via screen capture and narration. We confirm the need for a handoff process. We conclude that structured handoffs result in higher quality work, improved satisfaction (especially for workers with creative tasks), improved communication of non-obvious instructions, and increased adherence to the original intent of the project.