Open Access Proceedings Article

Anaphoric Annotation of Wikipedia and Blogs in the Live Memories Corpus

TLDR
The Live Memories corpus is an Italian corpus annotated for anaphoric relations that contains texts from the Italian Wikipedia about the region Trentino/Sud Tirol and from blog sites with users' comments.
Abstract
The Live Memories corpus is an Italian corpus annotated for anaphoric relations. This annotation effort aims to address two significant issues in CL research: the lack of annotated anaphoric resources for Italian and the increasing interest in the social Web. The Live Memories Corpus contains texts from the Italian Wikipedia about the region Trentino/Sud Tirol and from blog sites with users' comments. A set of articles from local newspapers is planned as a later addition. The corpus includes manually annotated information about morphosyntactic agreement, anaphoricity, and the semantic class of the NPs. The anaphoric annotation includes discourse deixis and bridging relations, and marks cases of ambiguity by annotating alternative interpretations. For the annotation of anaphoric links, the corpus takes into account phenomena specific to Italian, such as incorporated clitics and phonetically unrealized pronouns. Reliability studies for the annotation of these phenomena, and for the annotation of anaphoric links in general, yield satisfactory results. The Wikipedia and blogs dataset will be distributed under a Creative Commons Attribution licence.


Citations
Proceedings Article

SemEval-2010 Task 1: Coreference Resolution in Multiple Languages

TL;DR: The SemEval-2010 task on coreference resolution in multiple languages evaluated and compared automatic resolution systems for six languages (Catalan, Dutch, English, German, Italian and Spanish) in four evaluation settings, using four different metrics.
Proceedings Article

WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles

TL;DR: The WikiCoref corpus is an English corpus annotated for anaphoric relations, in which all documents come from the English version of Wikipedia and each markable is annotated with coreference type, mention type and the equivalent Freebase topic.
Proceedings Article

A Cross-Lingual ILP Solution to Zero Anaphora Resolution

TL;DR: An ILP-based model of zero anaphora detection and resolution that builds on the joint determination of anaphoricity and coreference model proposed by Denis and Baldridge (2007), but revises it and extends it into a three-way ILP problem also incorporating subject detection.
Journal Article

Annotating a broad range of anaphoric phenomena, in a variety of genres: the ARRAU Corpus

TL;DR: All the distinguishing features of the corpus, so far only partially presented in a number of conference and workshop papers, are discussed, along with the development between the first release of ARRAU in 2008 and this second one.
Proceedings Article

Introducing the Prague Discourse Treebank 1.0

TL;DR: The theoretical background is presented; the annotation was performed directly on top of syntactic trees (from the earlier Prague Dependency Treebank 2.5 project), thus benefiting from the linguistic information already present in the same data.
References
Book

Opinion Mining and Sentiment Analysis

TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Journal Article

Assessing agreement on classification tasks: the kappa statistic

TL;DR: The authors discuss what is wrong with reliability measures as they are currently used for discourse and dialogue work in computational linguistics and cognitive science, and argue that we would be better off as a field adopting techniques from content analysis.
Journal Article

MaltParser: A language-independent system for data-driven dependency parsing

TL;DR: Experimental evaluation confirms that MaltParser can achieve robust, efficient and accurate parsing for a wide range of languages without language-specific enhancements and with rather limited amounts of training data.
Journal Article

A Method of Automated Nonparametric Content Analysis for Social Science

TL;DR: This work develops a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly, and illustrates with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency.
Proceedings Article

Using Wikipedia for Automatic Word Sense Disambiguation

TL;DR: A method for generating sense-tagged data using Wikipedia as a source of sense annotations and showing that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers is described.