
Showing papers by "Michael S. Bernstein published in 2016"


Posted Content
TL;DR: The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answer pairs.
Abstract: Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked "What vehicle is the person riding?", computers will need to identify the objects in an image as well as the relationships riding(man, carriage) and pulling(horse, carriage) in order to answer correctly that "the person is riding a horse-drawn carriage". In this paper, we present the Visual Genome dataset to enable the modeling of such relationships. We collect dense annotations of objects, attributes, and relationships within each image to learn these models. Specifically, our dataset contains over 100K images where each image has an average of 21 objects, 18 attributes, and 18 pairwise relationships between objects. We canonicalize the objects, attributes, relationships, and noun phrases in region descriptions and questions answer pairs to WordNet synsets. Together, these annotations represent the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answers.

1,663 citations
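The canonicalization to WordNet synsets mentioned in the abstract is easy to illustrate. The sketch below is a hypothetical representation of a single relationship annotation, not the dataset's actual schema; the `Relationship` class, the bounding-box layout, and the first-sense lookup via NLTK are assumptions for illustration only.

```python
# Minimal sketch (not the official Visual Genome schema) of one image's
# relationship annotation, with names canonicalized to WordNet synsets.
# Assumes NLTK and its WordNet corpus are installed.
from dataclasses import dataclass
from nltk.corpus import wordnet as wn

@dataclass
class Relationship:
    subject: str        # e.g. "man"
    predicate: str      # e.g. "riding"
    obj: str            # e.g. "carriage"
    subject_box: tuple  # (x, y, w, h) bounding box, hypothetical layout
    object_box: tuple

def canonicalize(name: str) -> str:
    """Map a raw annotation string to a WordNet synset name (naive first-sense heuristic)."""
    synsets = wn.synsets(name.replace(" ", "_"))
    return synsets[0].name() if synsets else name

rel = Relationship("man", "riding", "carriage", (30, 40, 120, 260), (10, 80, 300, 220))
print(canonicalize(rel.subject), rel.predicate, canonicalize(rel.obj))
# Output depends on the installed WordNet data; the first-sense choice is only a heuristic.
```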


Book ChapterDOI
08 Oct 2016
TL;DR: In this article, the authors propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image and localize the objects in the predicted relationships as bounding boxes in the image.
Abstract: Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. “man” and “bicycle”) and predicates (e.g. “riding” and “pushing”) independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationships from a few examples. Additionally, we localize the objects in the predicted relationships as bounding boxes in the image. We further demonstrate that understanding relationships can improve content based image retrieval.

893 citations
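The core idea of scoring a relationship with independent detectors plus a language prior can be sketched compactly. The snippet below is a toy formulation under assumed inputs: the embeddings are random stand-ins and the scoring function is illustrative, not the paper's trained model.

```python
# Toy sketch: visual evidence from independently trained object and predicate
# detectors, modulated by a word-embedding language prior. Score form and
# numbers are illustrative only.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def relationship_score(p_subj, p_pred, p_obj, emb, subj, pred, obj):
    """Detector probabilities combined with an embedding-based plausibility prior."""
    visual = p_subj * p_pred * p_obj
    # Language prior: predicate vector vs. the subject-to-object offset (an assumption).
    prior = cosine(emb[pred], emb[obj] - emb[subj])
    return visual * max(prior, 1e-3)   # keep the score positive

# Toy 4-d embeddings standing in for real word vectors.
emb = {w: np.random.RandomState(i).randn(4)
       for i, w in enumerate(["man", "riding", "bicycle"])}
print(relationship_score(0.9, 0.6, 0.8, emb, "man", "riding", "bicycle"))
```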


Proceedings ArticleDOI
01 Jun 2016
TL;DR: In this article, an LSTM model with spatial attention was proposed to tackle the 7W QA task, which enables a new type of QA with visual answers, in addition to textual answers used in previous work.
Abstract: We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacities for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. However, many questions and answers, in practice, relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. It enables a new type of QA with visual answers, in addition to textual answers used in previous work. We study the visual QA tasks in a grounded setting with a large collection of 7W multiple-choice QA pairs. Furthermore, we evaluate human performance and several baseline models on the QA tasks. Finally, we propose a novel LSTM model with spatial attention to tackle the 7W QA tasks.

751 citations
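The spatial-attention step admits a small numerical sketch. The shapes, the dot-product scoring, and the toy features below are assumptions; this is not the paper's LSTM architecture, only the attention-weighted pooling it relies on.

```python
# Toy spatial attention over image regions: softmax-normalized relevance
# scores, then a weighted sum of region features.
import numpy as np

def spatial_attention(region_feats, query):
    """region_feats: (R, D) features for R regions; query: (D,) question state."""
    scores = region_feats @ query                # relevance of each region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over regions
    return weights @ region_feats, weights       # attended feature, attention weights

regions = np.random.rand(5, 8)    # 5 regions, 8-d features (toy)
question = np.random.rand(8)      # encoded question state (toy)
attended, w = spatial_attention(regions, question)
print(w.round(2), attended.shape) # attention distribution and pooled feature shape
```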


Posted Content
TL;DR: This work proposes a model that can scale to predict thousands of types of relationships from a few examples and improves on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship.
Abstract: Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. "man riding bicycle" and "man pushing bicycle"). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. "man" and "bicycle") and predicates (e.g. "riding" and "pushing") independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationships from a few examples. Additionally, we localize the objects in the predicted relationships as bounding boxes in the image. We further demonstrate that understanding relationships can improve content based image retrieval.

517 citations


Proceedings ArticleDOI
07 May 2016
TL;DR: Empath is a tool that can generate and validate new lexical categories on demand from a small set of seed terms, which draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction.
Abstract: Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.

235 citations
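Empath's seed-expansion step can be sketched in a few lines. The toy vectors below stand in for the fiction-trained embedding, and the crowd-powered validation filter is omitted; this illustrates the nearest-neighbor expansion idea, not Empath's implementation.

```python
# Toy category expansion: find vocabulary words closest (cosine) to the mean
# vector of a few seed terms. Real Empath learns its embedding from fiction.
import numpy as np

embedding = {                       # hand-made toy vectors
    "bleed":  np.array([0.90, 0.10, 0.00]),
    "punch":  np.array([0.80, 0.20, 0.10]),
    "fight":  np.array([0.85, 0.15, 0.05]),
    "wound":  np.array([0.88, 0.05, 0.02]),
    "dinner": np.array([0.10, 0.90, 0.30]),
}

def expand_category(seeds, vocab, k=3):
    """Return the k vocabulary words closest to the centroid of the seed vectors."""
    centroid = np.mean([vocab[s] for s in seeds], axis=0)
    def cos(v):
        return np.dot(v, centroid) / (np.linalg.norm(v) * np.linalg.norm(centroid))
    candidates = [(w, cos(v)) for w, v in vocab.items() if w not in seeds]
    return sorted(candidates, key=lambda x: -x[1])[:k]

print(expand_category(["bleed", "punch"], embedding))  # "fight" and "wound" rank highest
```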


Proceedings ArticleDOI
TL;DR: Empath as mentioned in this paper is a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence).
Abstract: Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.

227 citations


Proceedings ArticleDOI
07 May 2016
TL;DR: This work presents a technique that produces extremely rapid judgments for binary and categorical labels, and demonstrates that it is possible to rectify errors by randomizing task order and modeling response latency.
Abstract: Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data. To scale and widen the applicability of crowdsourcing, we present a technique that produces extremely rapid judgments for binary and categorical labels. Rather than punishing all errors, which causes workers to proceed slowly and deliberately, our technique speeds up workers' judgments to the point where errors are acceptable and even expected. We demonstrate that it is possible to rectify these errors by randomizing task order and modeling response latency. We evaluate our technique on a breadth of common labeling tasks such as image verification, word similarity, sentiment analysis and topic classification. Where prior work typically achieves a 0.25x to 1x speedup over fixed majority vote, our approach often achieves an order of magnitude (10x) speedup.

95 citations
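The latency-modeling idea can be illustrated with a simplified heuristic: in a rapid stream of items, a keypress that arrives implausibly soon after an item appears is attributed to an earlier item. The shift-back rule and the 250 ms threshold below are assumptions, not the paper's statistical model.

```python
# Simplified attribution of rapid-fire keypresses to streamed items based on
# a minimum plausible response latency.
def attribute_responses(item_onsets_ms, keypress_times_ms, min_latency_ms=250):
    """Assign each keypress to the latest item shown at least min_latency_ms before it."""
    labels = []
    for t in keypress_times_ms:
        candidates = [i for i, onset in enumerate(item_onsets_ms)
                      if t - onset >= min_latency_ms]
        labels.append(max(candidates) if candidates else None)
    return labels

onsets = [0, 400, 800, 1200]               # items flashed every 400 ms
presses = [500, 950, 1300]                 # worker keypresses
print(attribute_responses(onsets, presses))  # -> [0, 1, 2]
```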


Proceedings ArticleDOI
07 May 2016
TL;DR: Atelier, a micro-internship platform that connects crowd interns with crowd mentors, guides mentor-intern pairs to break down expert crowdsourcing tasks into milestones, review intermediate output, and problem-solve together, finding that Atelier helped interns maintain forward progress and absorb best practices.
Abstract: Expert crowdsourcing marketplaces have untapped potential to empower workers' career and skill development. Currently, many workers cannot afford to invest the time and sacrifice the earnings required to learn a new skill, and a lack of experience makes it difficult to get job offers even if they do. In this paper, we seek to lower the threshold to skill development by repurposing existing tasks on the marketplace as mentored, paid, real-world work experiences, which we refer to as micro-internships. We instantiate this idea in Atelier, a micro-internship platform that connects crowd interns with crowd mentors. Atelier guides mentor-intern pairs to break down expert crowdsourcing tasks into milestones, review intermediate output, and problem-solve together. We conducted a field experiment comparing Atelier's mentorship model to a non-mentored alternative on a real-world programming crowdsourcing task, finding that Atelier helped interns maintain forward progress and absorb best practices.

69 citations


Proceedings ArticleDOI
16 Oct 2016
TL;DR: Inspired by a game-theoretic notion of incentive-compatibility, Boomerang opens opportunities for interaction design to incentivize honest reporting over strategic dishonesty.
Abstract: Paid crowdsourcing platforms suffer from low-quality work and unfair rejections, but paradoxically, most workers and requesters have high reputation scores. These inflated scores, which make high-quality work and workers difficult to find, stem from social pressure to avoid giving negative feedback. We introduce Boomerang, a reputation system for crowdsourcing platforms that elicits more accurate feedback by rebounding the consequences of feedback directly back onto the person who gave it. With Boomerang, requesters find that their highly-rated workers gain earliest access to their future tasks, and workers find tasks from their highly-rated requesters at the top of their task feed. Field experiments verify that Boomerang causes both workers and requesters to provide feedback that is more closely aligned with their private opinions. Inspired by a game-theoretic notion of incentive-compatibility, Boomerang opens opportunities for interaction design to incentivize honest reporting over strategic dishonesty.

50 citations


Proceedings ArticleDOI
TL;DR: This work proposes a technique for achieving interdependent complex goals with crowds, and embodies it in Mechanical Novel, a system that crowdsources short fiction stories on Amazon Mechanical Turk.
Abstract: Crowdsourcing systems accomplish large tasks with scale and speed by breaking work down into independent parts. However, many types of complex creative work, such as fiction writing, have remained out of reach for crowds because work is tightly interdependent: changing one part of a story may trigger changes to the overall plot and vice versa. Taking inspiration from how expert authors write, we propose a technique for achieving interdependent complex goals with crowds. With this technique, the crowd loops between reflection, to select a high-level goal, and revision, to decompose that goal into low-level, actionable tasks. We embody this approach in Mechanical Novel, a system that crowdsources short fiction stories on Amazon Mechanical Turk. In a field experiment, Mechanical Novel resulted in higher-quality stories than an iterative crowdsourcing workflow. Our findings suggest that orienting crowd work around high-level goals may enable workers to coordinate their effort to accomplish complex work.

42 citations


Proceedings Article
29 Mar 2016
TL;DR: A technique that combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction finds that male over-representation and traditional gender stereotypes are common throughout nearly every genre in the corpus.
Abstract: Imagine a princess asleep in a castle, waiting for her prince to slay the dragon and rescue her. Tales like the famous Sleeping Beauty clearly divide up gender roles. But what about more modern stories, borne of a generation increasingly aware of social constructs like sexism and racism? Do these stories tend to reinforce gender stereotypes, or counter them? In this paper, we present a technique that combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction. We apply this technique across 1.8 billion words of fiction from the Wattpad online writing community, investigating gender representation in stories, how male and female characters behave and are described, and how authors' use of gender stereotypes is associated with the community's ratings. We find that male over-representation and traditional gender stereotypes (e.g., dominant men and submissive women) are common throughout nearly every genre in our corpus. However, only some of these stereotypes, like sexual or violent men, are associated with highly rated stories. Finally, despite women often being the target of negative stereotypes, female authors are equally likely to write such stereotypes as men.
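The lexicon-matching core of the method can be sketched briefly. The tiny category lexicon and the (gender, verb) pairs below are hypothetical inputs; the paper derives these from parsed fiction and a crowdsourced lexicon of stereotypes.

```python
# Toy stereotype counting: how often verbs attributed to male vs. female
# characters fall into one lexicon category.
from collections import Counter

violence_lexicon = {"punch", "slay", "fight"}          # toy stereotype category

def stereotype_counts(character_verbs):
    """character_verbs: iterable of (gender, verb) pairs extracted from text."""
    counts = Counter()
    for gender, verb in character_verbs:
        if verb in violence_lexicon:
            counts[gender] += 1
    return counts

sample = [("male", "punch"), ("male", "smile"), ("female", "fight"), ("male", "slay")]
print(stereotype_counts(sample))   # Counter({'male': 2, 'female': 1})
```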

Posted Content
TL;DR: Huddler, as discussed by the authors, is a system that enables workers to assemble familiar, effective crowd teams even under unpredictable availability and strict time constraints, using a dynamic programming algorithm to optimize for highly familiar teammates when individual availability is unknown.
Abstract: Distributed, parallel crowd workers can accomplish simple tasks through workflows, but teams of collaborating crowd workers are necessary for complex goals. Unfortunately, a fundamental condition for effective teams - familiarity with other members - stands in contrast to crowd work's flexible, on-demand nature. We enable effective crowd teams with Huddler, a system for workers to assemble familiar teams even under unpredictable availability and strict time constraints. Huddler utilizes a dynamic programming algorithm to optimize for highly familiar teammates when individual availability is unknown. We first present a field experiment that demonstrates the value of familiarity for crowd teams: familiar crowd teams doubled the performance of ad-hoc (unfamiliar) teams on a collaborative task. We then report a two-week field deployment wherein Huddler enabled crowd workers to convene highly familiar teams in 18 minutes on average. This research advances the goal of supporting long-term, team-based collaborations without sacrificing the flexibility of crowd work.
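The team-assembly objective can be illustrated with a deliberately simplified greedy routine. This is not Huddler's dynamic programming algorithm, and the familiarity scores and worker names are hypothetical.

```python
# Greedy simplification: from the workers currently online, grow a team that
# maximizes total pairwise familiarity with the members chosen so far.
def assemble_team(available, familiarity, team_size):
    team = []
    while len(team) < team_size and len(team) < len(available):
        best = max((w for w in available if w not in team),
                   key=lambda w: sum(familiarity.get(frozenset((w, t)), 0) for t in team))
        team.append(best)
    return team

fam = {frozenset(("ana", "bo")): 5,
       frozenset(("ana", "cy")): 2,
       frozenset(("bo", "cy")): 1}
print(assemble_team(["ana", "bo", "cy", "di"], fam, 2))  # -> ['ana', 'bo']
```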

Proceedings ArticleDOI
TL;DR: Through Mosaic, it is argued that communities oriented around sharing creative process can create a collaborative environment that is beneficial for creative growth.
Abstract: Online creative communities allow creators to share their work with a large audience, maximizing opportunities to showcase their work and connect with fans and peers. However, sharing in-progress work can be technically and socially challenging in environments designed for sharing completed pieces. We propose an online creative community where sharing process, rather than showcasing outcomes, is the main method of sharing creative work. Based on this, we present Mosaic---an online community where illustrators share work-in-progress snapshots showing how an artwork was completed from start to finish. In an online deployment and observational study, artists used Mosaic as a vehicle for reflecting on how they can improve their own creative process, developed a social norm of detailed feedback, and became less apprehensive of sharing early versions of artwork. Through Mosaic, we argue that communities oriented around sharing creative process can create a collaborative environment that is beneficial for creative growth.

Proceedings ArticleDOI
07 May 2016
TL;DR: This work asks whether the pay-per-task method is the right one, drawing on behavioral economics to test bulk payment, coupons, and material goods as alternative incentives; paying in bulk after every ten tasks increased task completion the most, while coupons had a small negative effect and material goods were most robust to declining participation.
Abstract: Paid crowdsourcing marketplaces have gained popularity by using piecework, or payment for each microtask, to incentivize workers. This norm has remained relatively unchallenged. In this paper, we ask: is the pay-per-task method the right one? We draw on behavioral economic research to examine whether payment in bulk after every ten tasks, saving money via coupons instead of earning money, or material goods rather than money will increase the number of completed tasks. We perform a twenty-day, between-subjects field experiment (N=300) on a mobile crowdsourcing application and measure how often workers responded to a task notification to fill out a short survey under each incentive condition. Task completion rates increased when paying in bulk after ten tasks: doing so increased the odds of a response by 1.4x, translating into 8% more tasks through that single intervention. Payment with coupons instead of money produced a small negative effect on task completion rates. Material goods were the most robust to decreasing participation over time.
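The abstract reports the bulk-payment effect both as an odds ratio (1.4x) and as roughly 8% more completed tasks; the two are linked through the baseline response rate. The sketch below makes that conversion explicit, with the baseline rate assumed purely for illustration.

```python
# Convert an odds ratio into a new response probability at an assumed baseline.
def apply_odds_ratio(p_baseline, odds_ratio):
    odds = p_baseline / (1 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)

p0 = 0.74                                    # assumed baseline response rate
p1 = apply_odds_ratio(p0, 1.4)
print(round(p1, 3), round(p1 / p0 - 1, 3))   # new rate and relative increase (~8%)
```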

Proceedings ArticleDOI
07 May 2016
TL;DR: Augur as discussed by the authors mines a knowledge base of human behavior by analyzing more than one billion words of modern fiction and trains vector models that can predict many thousands of user activities from surrounding objects in modern contexts, such as eating, meeting with a friend, or taking a selfie.
Abstract: From smart homes that prepare coffee when we wake, to phones that know not to interrupt us during important conversations, our collective visions of HCI imagine a future in which computers understand a broad range of human behaviors. Today our systems fall short of these visions, however, because this range of behaviors is too large for designers or programmers to capture manually. In this paper, we instead demonstrate it is possible to mine a broad knowledge base of human behavior by analyzing more than one billion words of modern fiction. Our resulting knowledge base, Augur, trains vector models that can predict many thousands of user activities from surrounding objects in modern contexts: for example, whether a user may be eating food, meeting with a friend, or taking a selfie. Augur uses these predictions to identify actions that people commonly take on objects in the world and estimate a user's future activities given their current situation. We demonstrate Augur-powered, activity-based systems such as a phone that silences itself when the odds of you answering it are low, and a dynamic music player that adjusts to your present activity. A field deployment of an Augur-powered wearable camera resulted in 96% recall and 71% precision on its unsupervised predictions of common daily activities. A second evaluation where human judges rated the system's predictions over a broad set of input images found that 94% were rated sensible.
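The object-to-activity prediction can be sketched as a similarity lookup. The vectors and vocabulary below are made up; Augur learns its representations from over a billion words of fiction.

```python
# Toy activity prediction: score each activity by its embedding similarity to
# the objects currently in view and return the best match.
import numpy as np

activity_vecs = {"eat food": np.array([0.9, 0.1]),
                 "take selfie": np.array([0.1, 0.9])}
object_vecs = {"fork":  np.array([0.8, 0.0]),
               "plate": np.array([0.9, 0.2]),
               "phone": np.array([0.1, 0.8])}

def predict_activity(objects_in_view):
    context = np.mean([object_vecs[o] for o in objects_in_view], axis=0)
    def cos(v):
        return np.dot(v, context) / (np.linalg.norm(v) * np.linalg.norm(context))
    return max(activity_vecs, key=lambda a: cos(activity_vecs[a]))

print(predict_activity(["fork", "plate"]))   # -> 'eat food'
print(predict_activity(["phone"]))           # -> 'take selfie'
```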

Proceedings ArticleDOI
TL;DR: Augur-powered, activity-based systems such as a phone that silences itself when the odds of you answering it are low, and a dynamic music player that adjusts to your present activity are demonstrated.
Abstract: From smart homes that prepare coffee when we wake, to phones that know not to interrupt us during important conversations, our collective visions of HCI imagine a future in which computers understand a broad range of human behaviors. Today our systems fall short of these visions, however, because this range of behaviors is too large for designers or programmers to capture manually. In this paper, we instead demonstrate it is possible to mine a broad knowledge base of human behavior by analyzing more than one billion words of modern fiction. Our resulting knowledge base, Augur, trains vector models that can predict many thousands of user activities from surrounding objects in modern contexts: for example, whether a user may be eating food, meeting with a friend, or taking a selfie. Augur uses these predictions to identify actions that people commonly take on objects in the world and estimate a user's future activities given their current situation. We demonstrate Augur-powered, activity-based systems such as a phone that silences itself when the odds of you answering it are low, and a dynamic music player that adjusts to your present activity. A field deployment of an Augur-powered wearable camera resulted in 96% recall and 71% precision on its unsupervised predictions of common daily activities. A second evaluation where human judges rated the system's predictions over a broad set of input images found that 94% were rated sensible.

Proceedings ArticleDOI
07 May 2016
TL;DR: This workshop brings together researchers in task decomposition, completion, and sourcing to discuss how intersections of research across these areas can pave the path for future research in this space.
Abstract: It is difficult to accomplish meaningful goals with limited time and attentional resources. However, recent research has shown that concrete plans with actionable steps allow people to complete tasks better and faster. With advances in techniques that can decompose larger tasks into smaller units, we envision that a transformation from larger tasks to smaller microtasks will impact when and how people perform complex information work, enabling efficient and easy completion of tasks that currently seem challenging. In this workshop, we bring together researchers in task decomposition, completion, and sourcing. We will pursue a broad understanding of the challenges in creating, allocating, and scheduling microtasks, as well as how accomplishing these microtasks can contribute towards productivity. The goal is to discuss how intersections of research across these areas can pave the path for future research in this space.

Proceedings ArticleDOI
TL;DR: In this article, the authors present a technique that produces extremely rapid judgments for binary and categorical labels: rather than punishing all errors, which causes workers to proceed slowly and deliberately, it speeds workers up to the point where errors are expected, then rectifies those errors by randomizing task order and modeling response latency.
Abstract: Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data. To scale and widen the applicability of crowdsourcing, we present a technique that produces extremely rapid judgments for binary and categorical labels. Rather than punishing all errors, which causes workers to proceed slowly and deliberately, our technique speeds up workers' judgments to the point where errors are acceptable and even expected. We demonstrate that it is possible to rectify these errors by randomizing task order and modeling response latency. We evaluate our technique on a breadth of common labeling tasks such as image verification, word similarity, sentiment analysis and topic classification. Where prior work typically achieves a 0.25x to 1x speedup over fixed majority vote, our approach often achieves an order of magnitude (10x) speedup.

Proceedings ArticleDOI
TL;DR: This work draws inspiration from historical worker guilds to design and implement crowd guilds: centralized groups of crowd workers who collectively certify each other's quality through double-blind peer assessment.
Abstract: Crowd workers are distributed and decentralized. While decentralization is designed to utilize independent judgment to promote high-quality results, it paradoxically undercuts behaviors and institutions that are critical to high-quality work. Reputation is one central example: crowdsourcing systems depend on reputation scores from decentralized workers and requesters, but these scores are notoriously inflated and uninformative. In this paper, we draw inspiration from historical worker guilds (e.g., in the silk trade) to design and implement crowd guilds: centralized groups of crowd workers who collectively certify each other's quality through double-blind peer assessment. A two-week field experiment compared crowd guilds to a traditional decentralized crowd work model. Crowd guilds produced reputation signals more strongly correlated with ground-truth worker quality than signals available on current crowd working platforms, and more accurate than in the traditional model.

Posted Content
TL;DR: In this article, Atelier, a micro-internship platform that connects crowd interns with crowd mentors, is proposed to lower the threshold to skill development for workers who cannot afford the time and lost earnings required to learn a new skill.
Abstract: Expert crowdsourcing marketplaces have untapped potential to empower workers' career and skill development. Currently, many workers cannot afford to invest the time and sacrifice the earnings required to learn a new skill, and a lack of experience makes it difficult to get job offers even if they do. In this paper, we seek to lower the threshold to skill development by repurposing existing tasks on the marketplace as mentored, paid, real-world work experiences, which we refer to as micro-internships. We instantiate this idea in Atelier, a micro-internship platform that connects crowd interns with crowd mentors. Atelier guides mentor-intern pairs to break down expert crowdsourcing tasks into milestones, review intermediate output, and problem-solve together. We conducted a field experiment comparing Atelier's mentorship model to a non-mentored alternative on a real-world programming crowdsourcing task, finding that Atelier helped interns maintain forward progress and absorb best practices.

Proceedings ArticleDOI
16 Oct 2016
TL;DR: Meta: a language extension for Python that allows programmers to share functions and track how they are used by a crowd of other programmers is introduced, finding that professional programmers are able to use Meta for complex tasks, and that Meta is able to find 44 optimizations and 5 bug fixes across the crowd.
Abstract: Collectively authored programming resources such as Q&A sites and open-source libraries provide a limited window into how programs are constructed, debugged, and run. To address these limitations, we introduce Meta: a language extension for Python that allows programmers to share functions and track how they are used by a crowd of other programmers. Meta functions are shareable via URL and instrumented to record runtime data. Combining thousands of Meta functions with their collective runtime data, we demonstrate tools including an optimizer that replaces your function with a more efficient version written by someone else, an auto-patcher that saves your program from crashing by finding equivalent functions in the community, and a proactive linter that warns you when a function fails elsewhere in the community. We find that professional programmers are able to use Meta for complex tasks (creating new Meta functions that, for example, cross-validate a logistic regression), and that Meta is able to find 44 optimizations (for a 1.45 times average speedup) and 5 bug fixes across the crowd.
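The runtime instrumentation that shared Meta functions carry can be approximated with a plain decorator. The `instrumented` decorator and `RUNTIME_LOG` list below are illustrative stand-ins, not Meta's actual API or data model.

```python
# Stand-in for Meta-style instrumentation: record each call's runtime and
# outcome so that usage data can later be aggregated across many programmers.
import functools
import time

RUNTIME_LOG = []   # in Meta, comparable data is aggregated across the crowd

def instrumented(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            RUNTIME_LOG.append({"fn": fn.__name__, "ok": True,
                                "seconds": time.perf_counter() - start})
            return result
        except Exception as exc:
            RUNTIME_LOG.append({"fn": fn.__name__, "ok": False, "error": repr(exc),
                                "seconds": time.perf_counter() - start})
            raise
    return wrapper

@instrumented
def normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]

normalize([1, 2, 7])
print(RUNTIME_LOG[-1]["ok"], round(RUNTIME_LOG[-1]["seconds"], 6))
```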

Proceedings ArticleDOI
Abstract: Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets. As a result, crowd workers spend weeks or months repeating the exact same tasks, making it necessary to understand their behavior over these long periods of time. We utilize three large, longitudinal datasets of nine million annotations collected from Amazon Mechanical Turk to examine claims that workers fatigue or satisfice over these long periods, producing lower quality work. We find that, contrary to these claims, workers are extremely stable in their quality over the entire period. To understand whether workers set their quality based on the task's requirements for acceptance, we then perform an experiment where we vary the required quality for a large crowdsourcing task. Workers did not adjust their quality based on the acceptance threshold: workers who were above the threshold continued working at their usual quality level, and workers below the threshold self-selected themselves out of the task. Capitalizing on this consistency, we demonstrate that it is possible to predict workers' long-term quality using just a glimpse of their quality on the first five tasks.
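The "glimpse" predictor described in the abstract is simple to sketch: treat a worker's accuracy on their first five gold-labeled tasks as the estimate of their long-term quality. The data and the plain-average predictor below are illustrative, not the paper's analysis.

```python
# Predict a worker's long-term quality from their first five tasks (toy version).
def predict_long_term_quality(first_five_correct):
    """first_five_correct: list of 5 booleans for the worker's first five tasks."""
    assert len(first_five_correct) == 5
    return sum(first_five_correct) / 5.0

history = [True, True, False, True, True]   # hypothetical first five tasks
print(predict_long_term_quality(history))   # -> 0.8, the estimated stable quality
```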

Posted Content
TL;DR: This paper combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction, finding that male over-representation and traditional gender stereotypes are common throughout nearly every genre in the corpus.
Abstract: Imagine a princess asleep in a castle, waiting for her prince to slay the dragon and rescue her. Tales like the famous Sleeping Beauty clearly divide up gender roles. But what about more modern stories, borne of a generation increasingly aware of social constructs like sexism and racism? Do these stories tend to reinforce gender stereotypes, or counter them? In this paper, we present a technique that combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction. We apply this technique across 1.8 billion words of fiction from the Wattpad online writing community, investigating gender representation in stories, how male and female characters behave and are described, and how authors' use of gender stereotypes is associated with the community's ratings. We find that male over-representation and traditional gender stereotypes (e.g., dominant men and submissive women) are common throughout nearly every genre in our corpus. However, only some of these stereotypes, like sexual or violent men, are associated with highly rated stories. Finally, despite women often being the target of negative stereotypes, female authors are equally likely to write such stereotypes as men.

Book ChapterDOI
01 Jan 2016
TL;DR: This research demonstrates how large classes can leverage their scale to encourage mastery through rapid feedback and revision, and suggests secret ingredients to make such peer interactions sustainable at scale.
Abstract: When students work with peers, they learn more actively, build richer knowledge structures, and connect material to their lives. However, not every peer learning experience online sees successful adoption. This chapter first introduces PeerStudio, an assessment platform that leverages the large number of students’ peers in online classes to enable rapid feedback on in-progress work. Students submit their draft, give rubric-based feedback on two peers’ drafts, and then receive peer feedback. Students can integrate the feedback and repeat this process as often as they desire. PeerStudio demonstrates how rapid feedback on in-progress work improves course outcomes. We then articulate and address three adoption and implementation challenges for peer learning platforms such as PeerStudio. First, peer interactions struggle to bootstrap critical mass. However, class incentives can signal importance and spur initial usage. Second, online classes have limited peer visibility and awareness, so students often feel alone even when surrounded by peers. We find that highlighting interdependence and strengthening norms can mitigate this issue. Third, teachers can readily access “big” aggregate data but not “thick” contextual data that helps build intuitions, so software should guide teachers’ scaffolding of peer interactions. We illustrate these challenges through studying 8500 students’ usage of PeerStudio and another peer learning platform: Talkabout. Efficacy is measured through sign-up and participation rates and the structure and duration of student interactions. This research demonstrates how large classes can leverage their scale to encourage mastery through rapid feedback and revision, and suggests secret ingredients to make such peer interactions sustainable at scale.

Proceedings ArticleDOI
07 May 2016
TL;DR: It is found that rare experiences are inflated on the web (by a median of 7x), while common experiences are deflated (by a median of 0.7x).
Abstract: People populate the web with content relevant to their lives, content that millions of others rely on for information and guidance. However, the web is not a perfect representation of lived experience: some topics appear in greater proportion online than their true incidence in our population, while others are deflated. This paper presents a large scale data collection study of this phenomenon. We collect webpages about 21 topics of interest capturing roughly 200,000 webpages, and then compare each topic's popularity to representative national surveys as ground truth. We find that rare experiences are inflated on the web (by a median of 7x), while common experiences are deflated (by a median of 0.7x). We call this phenomenon novelty bias.
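The central measurement can be written out directly: a topic's novelty bias is its share of webpages divided by its share in a representative survey. The counts below are placeholders chosen to echo the reported medians, not the study's data.

```python
# Novelty bias of a topic: web prevalence divided by survey prevalence.
def novelty_bias(web_pages_on_topic, total_web_pages, survey_rate):
    web_rate = web_pages_on_topic / total_web_pages
    return web_rate / survey_rate          # >1 means inflated online, <1 deflated

print(round(novelty_bias(1400, 200_000, 0.001), 1))   # rare experience, inflated ~7x
print(round(novelty_bias(28_000, 200_000, 0.20), 2))  # common experience, deflated ~0.7x
```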