Journal Article
ANNIS3: A new architecture for generic corpus query and visualization
Thomas Krause, Amir Zeldes +1 more
TL;DR: This article proposes a generic solution for specialized corpus visualizations in a Web interface using annotation-triggered style sheets, which leverage the power of modern browsers and CSS for multiple and highly customizable views of primary data.
Abstract:
This article is concerned with the data structures, properties of query languages, and visualization facilities required for the generic representation of richly annotated, heterogeneous linguistic corpora. We propose that above and beyond a general graph-based data model, which is becoming increasingly popular in many complex annotation formats, a well-defined concept of multiple, potentially conflicting segmentation layers must be introduced to deal with different sources and applications of corpus data flexibly. We also propose a generic solution for specialized corpus visualizations in a Web interface using annotation-triggered style sheets, which leverage the power of modern browsers and CSS for multiple and highly customizable views of primary data. We offer an implementation and evaluation of our architecture in ANNIS3, an open-source browser-based architecture for corpus search and visualization. We present three case studies to test the coverage of the system, encompassing core linguistic and digital humanities use-cases including richly annotated newspaper treebanks, multilingual diplomatic and normalized manuscript materials edited in TEI, and analysis of multimodal recordings of spoken language.
Citations
Journal Article
The GUM corpus: creating multilayer resources in the classroom
TL;DR: The results of this project show that high quality, richly annotated resources can be created effectively as part of a linguistics curriculum, opening new possibilities not just for research, but also for corpora in linguistics pedagogy.
Proceedings Article
On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges
TL;DR: A taxonomy of applied methods for close and distant reading, and approaches that combine both reading techniques to provide a multifaceted view of the data are provided.
Journal Article
Visual Text Analysis in Digital Humanities
TL;DR: An overview of the research conducted since 2005 on supporting text analysis tasks with close and distant reading visualizations in the digital humanities is presented and approaches that combine both reading techniques in order to provide a multi‐faceted view of the textual data are illustrated.
Proceedings Article
ArchiMob - A Corpus of Spoken Swiss German
TL;DR: A bootstrapping approach to automatic normalisation using different machine-translation-inspired methods is presented and the performance of part-of-speech taggers on the authors' data is evaluated to show how the same bootstrapped approach improves part-of-speech tagging by 10% over four rounds.
Proceedings Article
TreeAnnotator: Versatile Visual Annotation of Hierarchical Text Relations.
TL;DR: TREEANNOTATOR’s interoperability exceeds similar tools, providing a wider range of formats, while annotation work can be completed more quickly due to a revised input method for RST dependency relations.