Open Access Proceedings Article

On the Development of the RST Spanish Treebank

TLDR
The RST Spanish Treebank, the first Spanish corpus annotated with rhetorical relations, is presented, together with the interface developed to search the corpus' annotated texts.
Abstract
In this article we present the RST Spanish Treebank, the first corpus annotated with rhetorical relations for this language. We describe the characteristics of the corpus, the annotation criteria, the annotation procedure, the inter-annotator agreement, and other related aspects. Moreover, we show the interface that we have developed to carry out searches over the corpus' annotated texts.


Citations

CSTNews - A Discourse-Annotated Corpus for Single and Multi-Document Summarization of News Texts in Brazilian Portuguese

TL;DR: CSTNews, a discourse-annotated corpus for fostering research on single and multi-document summarization, is introduced within the context of the SUCINTO Project, which aims at investigating summarization strategies and developing tools and resources for that purpose.
Proceedings Article

Cross-lingual RST Discourse Parsing

TL;DR: A new discourse parser that is simpler than, yet competitive with, the state of the art for English (significantly better on 2/3 metrics) is presented, along with a harmonization of discourse treebanks across languages that enables the first experiments on cross-lingual discourse parsing.
Journal Article

A qualitative comparison method for rhetorical structures: identifying different discourse structures in multilingual corpora

TL;DR: A new type of comparison is shown that has important advantages with regard to the quantitative method usually employed: it provides an accurate measurement of inter-annotator agreement, and it pinpoints sources of disagreement among annotators.
Proceedings Article

Discourse Structure and Computation: Past, Present and Future

TL;DR: The challenges faced by the current understanding of discourse are recounted, together with the technology and applications involved and the applications that meeting these challenges will promote.
Proceedings Article

Cross-lingual and cross-domain discourse segmentation of entire documents

TL;DR: This article proposes statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations; they achieve 89.5% F1 for English newswire, with slight drops in performance on other domains.
References
Journal Article

A Coefficient of Agreement for Nominal Scales

TL;DR: The authors present a procedure for having two or more judges independently categorize a sample of units and for determining the degree, significance, and sampling stability of their agreement, in order to assess the extent to which these judgments are reproducible, i.e., reliable.
Report

Building a Large Annotated Corpus of English: The Penn Treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Journal Article

Rhetorical Structure Theory: Toward a Functional Theory of Text Organization

TL;DR: Rhetorical Structure Theory (RST) is a descriptive theory of a major aspect of the organization of natural text; it is a linguistically useful method for describing natural texts, characterizing their structure primarily in terms of relations that hold between parts of the text.
Journal Article

Inter-coder agreement for computational linguistics

TL;DR: It is argued that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks—but that their use makes the interpretation of the value of the coefficient even harder.