Open Access Proceedings Article

Identifying Personal Stories in Millions of Weblog Entries

TLDR
This paper describes efforts to develop a standard corpus for researchers in this area by identifying personal stories among the tens of millions of blog posts in the ICWSM 2009 Spinn3r Dataset.
Abstract
Stories of people's everyday experiences have long been the focus of psychology and sociology research, and are increasingly being used in innovative knowledge-based technologies. However, continued research in this area is hindered by the lack of standard corpora of sufficient size and by the costs of creating one from scratch. In this paper, we describe our efforts to develop a standard corpus for researchers in this area by identifying personal stories in the tens of millions of blog posts in the ICWSM 2009 Spinn3r Dataset. Our approach was to employ statistical text classification technology on the content of blog entries, which required the creation of a sufficiently large set of annotated training examples. We describe the development and evaluation of this classification technology and how it was applied to the dataset in order to identify nearly a million personal stories.
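As a rough illustration of the statistical text classification approach described in the abstract, the sketch below trains a simple bag-of-words linear classifier to separate personal stories from other blog posts. It assumes scikit-learn and uses toy, made-up examples; the paper's actual features, training data, and learning algorithm may differ.

```python
# Minimal sketch of a story/non-story text classifier, assuming scikit-learn.
# The example texts and labels are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

texts = [
    "Yesterday I missed my train and ended up walking home in the rain.",
    "I still can't believe my brother forgot my birthday last week.",
    "Top 10 tips for configuring your new wireless router.",
    "Press release: the company announced quarterly earnings today.",
]
labels = [1, 1, 0, 0]  # 1 = personal story, 0 = not a story

# Bag-of-words features feeding an online linear learner (perceptron-style SGD).
clf = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    SGDClassifier(loss="perceptron", max_iter=50, random_state=0),
)
clf.fit(texts, labels)
print(clf.predict(["This morning my dog chased the mail carrier down the street."]))
```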



Citations
Proceedings Article (DOI)

A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories

TL;DR: A new framework for evaluating story understanding and script learning, the 'Story Cloze Test', which requires a system to choose the correct ending to a four-sentence story, and a new corpus of 50k five-sentence commonsense stories, ROCStories, to enable this evaluation.
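To make the evaluation format concrete, here is a small sketch of how a Story Cloze item can be represented and scored. The word-overlap scorer is a trivial stand-in for illustration, not the method from the paper.

```python
# Sketch of the Story Cloze format: four context sentences, two candidate endings,
# and the index of the correct ending. Item contents below are made up.
from dataclasses import dataclass

@dataclass
class ClozeItem:
    context: list      # four story sentences (strings)
    endings: tuple     # two candidate fifth sentences
    correct: int       # index (0 or 1) of the correct ending

def overlap_score(context, ending):
    """Trivial baseline: count word overlap between context and a candidate ending."""
    context_words = set(" ".join(context).lower().split())
    return len(context_words & set(ending.lower().split()))

def cloze_accuracy(items):
    """Fraction of items where the higher-scoring ending is the correct one."""
    right = 0
    for it in items:
        pick = max((0, 1), key=lambda i: overlap_score(it.context, it.endings[i]))
        right += int(pick == it.correct)
    return right / len(items)

item = ClozeItem(
    context=["Anna trained for months.", "She entered the city marathon.",
             "Halfway through, her knee started to hurt.", "She slowed down but kept going."],
    endings=("She crossed the finish line exhausted but proud.",
             "She decided never to buy another bicycle."),
    correct=0,
)
print(cloze_accuracy([item]))
```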
Proceedings Article

Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning

TL;DR: The Choice Of Plausible Alternatives (COPA) evaluation, as discussed by the authors, uses a forced-choice format in which each question gives a premise and two plausible causes or effects; the correct choice is the alternative that is more plausible than the other.
Proceedings Article

SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning

TL;DR: The two systems that competed in this task as part of SemEval-2012 are described, and their results are compared to those achieved in previously published research.
Posted Content

Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning.

TL;DR: This paper introduces Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions, and proposes a new architecture that improves over the competitive baselines.
Proceedings Article (DOI)

Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning

TL;DR: Cosmos QA, as discussed by the authors, is a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions; the questions focus on reading between the lines, which in turn requires interpreting the likely causes and effects of events.
References
Journal Article (DOI)

The perceptron: a probabilistic model for information storage and organization in the brain.

TL;DR: This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory.
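For readers unfamiliar with the model, the classical perceptron learning rule referenced by this entry fits in a few lines. Below is a minimal NumPy sketch with labels in {-1, +1}, not Rosenblatt's original formulation.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Classical perceptron: update weights only on misclassified examples."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy usage: two linearly separable points.
X = np.array([[1.0, 2.0], [-1.0, -2.0]])
y = np.array([1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # -> [ 1. -1.]
```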
Report (DOI)

Building a large annotated corpus of English: the Penn Treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
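A quick sketch of how POS-annotated corpora of this kind are typically consumed, assuming NLTK and its bundled treebank sample (a small excerpt, not the full corpus described above):

```python
# Assumes: pip install nltk, plus the bundled "treebank" sample (a small excerpt).
import nltk
from collections import Counter
from nltk.corpus import treebank

nltk.download("treebank", quiet=True)

# Each sentence is a list of (word, POS-tag) pairs.
first_sentence = treebank.tagged_sents()[0]
print(first_sentence[:5])   # e.g. [('Pierre', 'NNP'), ('Vinken', 'NNP'), ...]

# Count tag frequencies across the sample.
tag_counts = Counter(tag for _, tag in treebank.tagged_words())
print(tag_counts.most_common(5))
```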
Book

The perceptron: a probabilistic model for information storage and organization in the brain

F. Rosenblatt
TL;DR: The second and third questions are still subject to a vast amount of speculation, and the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory, as mentioned in this paper.
Proceedings Article (DOI)

Cheap and Fast -- But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks

TL;DR: This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
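One simple form of such bias correction is to estimate each worker's accuracy on a small gold-labeled subset and weight their votes accordingly. The sketch below illustrates that idea; it is not necessarily the exact recalibration used in the paper, and the data in the usage example is made up.

```python
# Simplified annotator bias correction: log-odds vote weighting by estimated accuracy.
import math
from collections import defaultdict

def worker_accuracies(gold, annotations):
    """gold: {item: label}; annotations: list of (worker, item, label) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for worker, item, label in annotations:
        if item in gold:
            total[worker] += 1
            correct[worker] += int(label == gold[item])
    # Add-one smoothing keeps estimated accuracies away from 0 and 1.
    return {w: (correct[w] + 1) / (total[w] + 2) for w in total}

def weighted_vote(item, annotations, acc):
    """Binary labels {0, 1}: sum each worker's log-odds weight toward label 1."""
    score = 0.0
    for worker, it, label in annotations:
        if it != item or worker not in acc:
            continue
        weight = math.log(acc[worker] / (1 - acc[worker]))
        score += weight if label == 1 else -weight
    return int(score > 0)

# Toy usage: one gold item calibrates two workers, then they vote on a new item.
gold = {"item1": 1}
ann = [("w1", "item1", 1), ("w2", "item1", 0),
       ("w1", "item2", 1), ("w2", "item2", 0)]
acc = worker_accuracies(gold, ann)
print(weighted_vote("item2", ann, acc))
```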
Proceedings Article (DOI)

Confidence-weighted linear classification

TL;DR: Empirical evaluation on a range of NLP tasks shows that the confidence-weighted linear classifiers introduced here improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lend themselves to better classifier combination after parallel training.
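Confidence-weighted learning maintains a per-feature notion of confidence alongside the weights. The sketch below shows a closely related but simpler adaptive update (an AROW-style rule with diagonal covariance), not the exact closed-form confidence-weighted rule from the paper.

```python
# Adaptive online linear learner in the spirit of confidence-weighted classification.
# This is the simpler AROW-style update, shown here only as an illustration.
import numpy as np

def arow_train(X, y, r=1.0, epochs=5):
    """X: (n, d) array; y: labels in {-1, +1}; r: regularization strength."""
    d = X.shape[1]
    mu = np.zeros(d)       # mean weight vector
    sigma = np.ones(d)     # diagonal covariance: per-feature uncertainty
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * np.dot(mu, xi)
            if margin < 1.0:                       # hinge-loss violation
                v = np.dot(sigma * xi, xi)         # confidence on this example
                beta = 1.0 / (v + r)
                alpha = (1.0 - margin) * beta
                mu += alpha * yi * sigma * xi      # larger steps on uncertain features
                sigma -= beta * (sigma * xi) ** 2  # shrink variance where we updated
    return mu, sigma

# Toy usage on separable points; predictions should match the labels.
X = np.array([[1.0, 0.5], [-1.0, -0.5], [0.8, 1.0], [-0.7, -1.2]])
y = np.array([1, -1, 1, -1])
mu, sigma = arow_train(X, y)
print(np.sign(X @ mu))
```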