Open Access Proceedings Article
On the Development of the RST Spanish Treebank
Iria da Cunha, Juan-Manuel Torres-Moreno, Gerardo Sierra +2 more
- pp 1-10
TLDR
The RST Spanish Treebank is presented, the first corpus annotated with rhetorical relations for this language, and the interface that is developed to carry out searches over the corpus' annotated texts is shown.
Abstract:
In this article we present the RST Spanish Treebank, the first corpus annotated with rhetorical relations for this language. We describe the characteristics of the corpus, the annotation criteria, the annotation procedure, the inter-annotator agreement, and other related aspects. Moreover, we show the interface that we have developed to carry out searches over the corpus' annotated texts.
Citations
CSTNews - A Discourse-Annotated Corpus for Single and Multi-Document Summarization of News Texts in Brazilian Portuguese
Paula Christina Figueira Cardoso, Erick Galani Maziero, Maria Lucía R. Castro Jorge, Ariani Di Felippo, Lucia Helena Machado Rino, Maria das Graças Volpe Nunes, Thiago Alexandre Salgueiro Pardo +9 more
TL;DR: CSTNews, a discourse-annotated corpus for fostering research on single and multi-document summarization, is introduced within the context of the SUCINTO Project, which aims at investigating summarization strategies and developing tools and resources for that purpose.
Proceedings Article
Cross-lingual RST Discourse Parsing
TL;DR: A new discourse parser is presented that is simpler than, yet competitive with, the state of the art for English (significantly better on 2 of 3 metrics), together with a harmonization of discourse treebanks across languages that enables the first experiments on cross-lingual discourse parsing.
Journal Article
A qualitative comparison method for rhetorical structures: identifying different discourse structures in multilingual corpora
TL;DR: A new type of comparison is shown that has important advantages with regard to the quantitative method usually employed: it provides an accurate measurement of inter-annotator agreement, and it pinpoints sources of disagreement among annotators.
Proceedings Article
Discourse Structure and Computation: Past, Present and Future
Bonnie Webber, Aravind K. Joshi +1 more
TL;DR: Developments in the understanding of discourse are recounted, along with the technology they employ and the applications they support, followed by the challenges facing the current understanding of discourse and the applications that meeting these challenges will promote.
Proceedings Article
Cross-lingual and cross-domain discourse segmentation of entire documents
TL;DR: This article proposed statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations, and achieved 89.5% F1 for English newswire, with slight drops in performance on other domains.
References
Journal Article
A Coefficient of Agreement for Nominal Scales
TL;DR: The author presents a procedure for having two or more judges independently categorize a sample of units and for determining the degree and significance of their agreement, that is, the extent to which these judgments are reproducible, i.e., reliable.
Report
Building a large annotated corpus of English: the Penn Treebank
TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Journal Article
Rhetorical Structure Theory: Toward a Functional Theory of Text Organization
TL;DR: Rhetorical Structure Theory (RST) is a descriptive theory of a major aspect of the organization of natural text. It is a linguistically useful method for describing natural texts, characterizing their structure primarily in terms of relations that hold between parts of the text.
Journal Article
Inter-coder agreement for computational linguistics
TL;DR: It is argued that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks—but that their use makes the interpretation of the value of the coefficient even harder.