Open Access Proceedings Article

Identifying Personal Stories in Millions of Weblog Entries

TLDR
This paper describes efforts to develop a standard corpus for researchers in this area by identifying personal stories among the tens of millions of blog posts in the ICWSM 2009 Spinn3r Dataset.
Abstract
Stories of people's everyday experiences have long been the focus of psychology and sociology research, and are increasingly being used in innovative knowledge-based technologies. However, continued research in this area is hindered by the lack of standard corpora of sufficient size and by the costs of creating one from scratch. In this paper, we describe our efforts to develop a standard corpus for researchers in this area by identifying personal stories in the tens of millions of blog posts in the ICWSM 2009 Spinn3r Dataset. Our approach was to employ statistical text classification technology on the content of blog entries, which required the creation of a sufficiently large set of annotated training examples. We describe the development and evaluation of this classification technology and how it was applied to the dataset in order to identify nearly a million personal stories.
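As a rough illustration of the statistical text classification approach described in the abstract, the sketch below trains a simple bag-of-words linear classifier to separate personal stories from other blog posts. It assumes scikit-learn and uses toy, made-up examples; the paper's actual features, training data, and learning algorithm may differ.

```python
# Minimal sketch of a story/non-story text classifier, assuming scikit-learn.
# The example texts and labels are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

texts = [
    "Yesterday I missed my train and ended up walking home in the rain.",
    "I still can't believe my brother forgot my birthday last week.",
    "Top 10 tips for configuring your new wireless router.",
    "Press release: the company announced quarterly earnings today.",
]
labels = [1, 1, 0, 0]  # 1 = personal story, 0 = not a story

# Bag-of-words features feeding an online linear learner (perceptron-style SGD).
clf = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    SGDClassifier(loss="perceptron", max_iter=50, random_state=0),
)
clf.fit(texts, labels)
print(clf.predict(["This morning my dog chased the mail carrier down the street."]))
```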



Citations
Proceedings Article (DOI)

A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories

TL;DR: A new framework for evaluating story understanding and script learning, the 'Story Cloze Test', which requires a system to choose the correct ending to a four-sentence story, and a new corpus of 50k five-sentence commonsense stories, ROCStories, to enable this evaluation.
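To make the evaluation format concrete, here is a small sketch of how a Story Cloze item can be represented and scored. The word-overlap scorer is a trivial stand-in for illustration, not the method from the paper.

```python
# Sketch of the Story Cloze format: four context sentences, two candidate endings,
# and the index of the correct ending. Item contents below are made up.
from dataclasses import dataclass

@dataclass
class ClozeItem:
    context: list      # four story sentences (strings)
    endings: tuple     # two candidate fifth sentences
    correct: int       # index (0 or 1) of the correct ending

def overlap_score(context, ending):
    """Trivial baseline: count word overlap between context and a candidate ending."""
    context_words = set(" ".join(context).lower().split())
    return len(context_words & set(ending.lower().split()))

def cloze_accuracy(items):
    """Fraction of items where the higher-scoring ending is the correct one."""
    right = 0
    for it in items:
        pick = max((0, 1), key=lambda i: overlap_score(it.context, it.endings[i]))
        right += int(pick == it.correct)
    return right / len(items)

item = ClozeItem(
    context=["Anna trained for months.", "She entered the city marathon.",
             "Halfway through, her knee started to hurt.", "She slowed down but kept going."],
    endings=("She crossed the finish line exhausted but proud.",
             "She decided never to buy another bicycle."),
    correct=0,
)
print(cloze_accuracy([item]))
```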
Proceedings Article

Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning

TL;DR: The Choice Of Plausible Alternatives (COPA) evaluation, as discussed by the authors, uses a forced-choice format in which each question gives a premise and two plausible causes or effects; the correct choice is the alternative that is more plausible than the other.
Proceedings Article

SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning

TL;DR: The two systems that competed in this task as part of SemEval-2012 are described, and their results are compared to those achieved in previously published research.
Posted Content

Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning.

TL;DR: This paper introduces Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions, and proposes a new architecture that improves over the competitive baselines.
Proceedings Article (DOI)

Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning

TL;DR: Cosmos QA, as discussed by the authors, is a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions; the questions focus on reading between the lines, which in turn requires interpreting the likely causes and effects of events.
References
Journal Article (DOI)

The perceptron: a probabilistic model for information storage and organization in the brain.

TL;DR: This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory.
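For readers unfamiliar with the model, the classical perceptron learning rule referenced by this entry fits in a few lines. Below is a minimal NumPy sketch with labels in {-1, +1}, not Rosenblatt's original formulation.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Classical perceptron: update weights only on misclassified examples."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy usage: two linearly separable points.
X = np.array([[1.0, 2.0], [-1.0, -2.0]])
y = np.array([1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # -> [ 1. -1.]
```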
Report (DOI)

Building a large annotated corpus of English: the Penn Treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
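A quick sketch of how POS-annotated corpora of this kind are typically consumed, assuming NLTK and its bundled treebank sample (a small excerpt, not the full corpus described above):

```python
# Assumes: pip install nltk, plus the bundled "treebank" sample (a small excerpt).
import nltk
from collections import Counter
from nltk.corpus import treebank

nltk.download("treebank", quiet=True)

# Each sentence is a list of (word, POS-tag) pairs.
first_sentence = treebank.tagged_sents()[0]
print(first_sentence[:5])   # e.g. [('Pierre', 'NNP'), ('Vinken', 'NNP'), ...]

# Count tag frequencies across the sample.
tag_counts = Counter(tag for _, tag in treebank.tagged_words())
print(tag_counts.most_common(5))
```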
Book

The perceptron: a probabilistic model for information storage and organization in the brain

F. Rosenblatt
TL;DR: The second and third questions are still subject to a vast amount of speculation, and the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory, as mentioned in this paper.
Proceedings Article (DOI)

Cheap and Fast -- But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks

TL;DR: This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
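One simple form of such bias correction is to estimate each worker's accuracy on a small gold-labeled subset and weight their votes accordingly. The sketch below illustrates that idea; it is not necessarily the exact recalibration used in the paper, and the data in the usage example is made up.

```python
# Simplified annotator bias correction: log-odds vote weighting by estimated accuracy.
import math
from collections import defaultdict

def worker_accuracies(gold, annotations):
    """gold: {item: label}; annotations: list of (worker, item, label) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for worker, item, label in annotations:
        if item in gold:
            total[worker] += 1
            correct[worker] += int(label == gold[item])
    # Add-one smoothing keeps estimated accuracies away from 0 and 1.
    return {w: (correct[w] + 1) / (total[w] + 2) for w in total}

def weighted_vote(item, annotations, acc):
    """Binary labels {0, 1}: sum each worker's log-odds weight toward label 1."""
    score = 0.0
    for worker, it, label in annotations:
        if it != item or worker not in acc:
            continue
        weight = math.log(acc[worker] / (1 - acc[worker]))
        score += weight if label == 1 else -weight
    return int(score > 0)

# Toy usage: one gold item calibrates two workers, then they vote on a new item.
gold = {"item1": 1}
ann = [("w1", "item1", 1), ("w2", "item1", 0),
       ("w1", "item2", 1), ("w2", "item2", 0)]
acc = worker_accuracies(gold, ann)
print(weighted_vote("item2", ann, acc))
```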
Proceedings Article (DOI)

Confidence-weighted linear classification

TL;DR: Empirical evaluation on a range of NLP tasks shows that the confidence-weighted linear classifiers introduced here improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lend themselves to better classifier combination after parallel training.
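Confidence-weighted learning maintains a per-feature notion of confidence alongside the weights. The sketch below shows a closely related but simpler adaptive update (an AROW-style rule with diagonal covariance), not the exact closed-form confidence-weighted rule from the paper.

```python
# Adaptive online linear learner in the spirit of confidence-weighted classification.
# This is the simpler AROW-style update, shown here only as an illustration.
import numpy as np

def arow_train(X, y, r=1.0, epochs=5):
    """X: (n, d) array; y: labels in {-1, +1}; r: regularization strength."""
    d = X.shape[1]
    mu = np.zeros(d)       # mean weight vector
    sigma = np.ones(d)     # diagonal covariance: per-feature uncertainty
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * np.dot(mu, xi)
            if margin < 1.0:                       # hinge-loss violation
                v = np.dot(sigma * xi, xi)         # confidence on this example
                beta = 1.0 / (v + r)
                alpha = (1.0 - margin) * beta
                mu += alpha * yi * sigma * xi      # larger steps on uncertain features
                sigma -= beta * (sigma * xi) ** 2  # shrink variance where we updated
    return mu, sigma

# Toy usage on separable points; predictions should match the labels.
X = np.array([[1.0, 0.5], [-1.0, -0.5], [0.8, 1.0], [-0.7, -1.2]])
y = np.array([1, -1, 1, -1])
mu, sigma = arow_train(X, y)
print(np.sign(X @ mu))
```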