Open Access Journal Article

A formal framework for linguistic annotation

Steven Bird, +1 more
- 01 Jan 2001 - 
- Vol. 33, Iss: 1, pp 23-60
TLDR
A wide variety of existing annotation formats are surveyed and a common conceptual core, the annotation graph, is demonstrated, which provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.
About
This article was published in Speech Communication on 2001-01-01 and is currently open access. It has received 398 citations to date. The article focuses on the topics: Annotation & File format.


Citations
Journal Article

GATE, a General Architecture for Text Engineering

TL;DR: GATE lies at the intersection of human language computation and software engineering, and constitutes an infrastructural system supporting research and development of language processing software.
Proceedings Article

ANVIL: A Generic Annotation Tool for Multimodal Dialogue

Michael Kipp
TL;DR: Anvil is a tool for the annotation of audiovisual material containing multimodal dialogue by inserting time-anchored elements that hold a number of typed attribute-value pairs.
Journal Article

What is Corpus Linguistics?

TL;DR: This article answers a few questions that corpus linguists regularly face from linguists who have not used corpus-based methods so far and discusses some of the central assumptions, notions, and methods of corpus linguistics.
Proceedings Article

Annotating Multi-media/Multi-modal Resources with ELAN

TL;DR: The current state of development of the manual annotation tool ELAN is shown, and usage requirements from three different groups of users are presented. One annotation model and a number of generic design principles guided the choices made during the development of ELAN.
Journal Article

Transcriber: Development and use of a tool for assisting speech corpora production

TL;DR: Transcriber was designed for the manual segmentation and transcription of long-duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It has been tested on various Unix systems and Windows.
References
Report

Building a large annotated corpus of English: the Penn Treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Journal Article

Maintaining knowledge about temporal intervals

TL;DR: In this paper, an interval-based temporal logic is introduced, together with a computationally effective reasoning algorithm based on constraint propagation, which is notable in offering a delicate balance between time and space.
Book

Foundations of databases

TL;DR: This book discusses Languages, Computability, and Complexity, and the Relational Model, which aims to clarify the role of Semantic Data Models in the development of Query Language Design.
Journal Article

The CHILDES Project: Tools for Analyzing Talk

Clifton Pye, +1 more
- 01 Mar 1994 - 
TL;DR: This book describes three basic tools for language analysis of transcript data by computer that have been developed in the context of the "Child Language Data Exchange System (CHILDES)" project, and focuses on their use in the child language field, believing that researchers from other areas can make the necessary analogies to their own topics.
Book

The CHILDES Project: Tools for Analyzing Talk

TL;DR: The CHILDES corpus has been used for a wide variety of purposes, including editing non-Roman orthographies, systematically adding codes to transcripts, checking the files for correct use of "CHAT", and linking the files to digitized audio and videotape.
Frequently Asked Questions (9)
Q1. What are the contributions in "A formal framework for linguistic annotation"?

This paper focuses on the logical structure of linguistic annotations. The authors survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats. This technical report is available at ScholarlyCommons: https: //repository.
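To make the idea of an annotation graph concrete, here is a minimal Python sketch. It is not the authors' formal definition. It assumes only what the excerpts on this page state: arcs carry labels, and nodes may carry time references that need only be ordered. The class names, field names (`kind`, `label`) and the example words are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Arc:
    """A labeled arc between two nodes (field names are illustrative)."""
    src: str
    dst: str
    kind: str   # hypothetical annotation type, e.g. "word" or "phone"
    label: str  # the annotation content


@dataclass
class AnnotationGraph:
    # Node id -> time reference, or None when the node is unanchored.
    times: Dict[str, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, time: Optional[float] = None) -> None:
        self.times[node_id] = time

    def add_arc(self, src: str, dst: str, kind: str, label: str) -> None:
        if src not in self.times or dst not in self.times:
            raise KeyError("both endpoints must be added as nodes first")
        self.arcs.append(Arc(src, dst, kind, label))

    def arcs_of_kind(self, kind: str) -> List[Arc]:
        return [a for a in self.arcs if a.kind == kind]

    def is_time_consistent(self) -> bool:
        """Check every arc whose two endpoints are both anchored:
        its start time must not be later than its end time."""
        for a in self.arcs:
            t0, t1 = self.times[a.src], self.times[a.dst]
            if t0 is not None and t1 is not None and t0 > t1:
                return False
        return True


# Two word arcs, with the second word also segmented into two phone arcs.
g = AnnotationGraph()
g.add_node("n0", 0.00)
g.add_node("n1", 0.35)
g.add_node("n2")            # unanchored: no time reference known
g.add_node("n3", 0.90)
g.add_arc("n0", "n1", "word", "and")
g.add_arc("n1", "n3", "word", "he")
g.add_arc("n1", "n2", "phone", "hh")
g.add_arc("n2", "n3", "phone", "iy")
```

Storing the two tiers as arcs over shared nodes is what lets word-level and phone-level annotations coexist without either one dictating the file layout.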

More complex modes of interaction are also possible, as are connections to other sorts of databases; the authors regard this as a fruitful area for further research.

Gaps might correspond to periods of silence, or to periods in between the salient events, or to periods which have yet to be annotated. 
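As a rough illustration of what a gap is in practice, the Python sketch below lists the stretches of a recording that no annotation covers. The function name `find_gaps` and the `(start, end)` tuple representation are assumptions made for this example, not something the paper prescribes.

```python
def find_gaps(intervals, start, end):
    """Return the sub-intervals of [start, end] not covered by any annotation.

    `intervals` is an iterable of (lo, hi) pairs with lo <= hi.
    """
    gaps = []
    cursor = start  # everything before `cursor` is already accounted for
    for lo, hi in sorted(intervals):
        if lo > cursor:
            gaps.append((cursor, min(lo, end)))
        cursor = max(cursor, hi)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


annotated = [(0.5, 1.0), (1.0, 2.0), (3.0, 3.5)]
gaps = find_gaps(annotated, 0.0, 4.0)
```

The computation cannot tell which of the three readings applies to a given gap (silence, non-salient material, or unannotated material); that distinction has to come from the annotator or from project conventions.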

The Tipster Architecture for linguistic annotation of text [19] is based on the concept of a fundamental, immutable textual foundation, with all annotations expressed in terms of byte offsets into this text. 

Based on the formal precedent of SGML, the model of how chart-like data structures are actually used in parsing, and the practical precedents of databases like TIMIT, it is tempting to consider adding a sort of grammar over arc labels as part of the formal definition of annotation graphs. 

The authors will usually need to break an annotation graph into chunks which can be presented line-by-line (much like interlinear text) in order to fit on a screen or a page. 
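One simple way to picture this chunking is sketched below in Python. It assumes the graph has already been flattened into aligned columns, one tuple of tier labels per position (here a word and a part-of-speech tag); the function names `chunk_tiers` and `render` and the greedy wrapping strategy are hypothetical choices for this example, not the paper's display algorithm.

```python
def chunk_tiers(columns, width):
    """Greedily split aligned columns into chunks that fit within `width`.

    `columns` is a list of tuples, one per aligned position, holding the
    label of each tier at that position, e.g. (word, tag).
    """
    chunks, current, used = [], [], 0
    for col in columns:
        col_width = max(len(cell) for cell in col) + 1  # +1 for the separator
        if current and used + col_width > width:
            chunks.append(current)
            current, used = [], 0
        current.append(col)
        used += col_width
    if current:
        chunks.append(current)
    return chunks


def render(chunk):
    """Render one chunk as one line per tier, padding columns so they align."""
    widths = [max(len(cell) for cell in col) for col in chunk]
    n_tiers = len(chunk[0])
    lines = []
    for tier in range(n_tiers):
        cells = [col[tier].ljust(w) for col, w in zip(chunk, widths)]
        lines.append(" ".join(cells).rstrip())
    return lines


columns = [("the", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ")]
chunks = chunk_tiers(columns, width=12)
for chunk in chunks:
    print("\n".join(render(chunk)))
    print()
```

A column wider than `width` still gets a chunk of its own, so nothing is dropped; a real display tool would also have to decide how to treat arcs that span a chunk boundary.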

For the sake of a clean algebraic semantics for the query language, the authors will permit queries and the results of queries to be (sets of) arbitrary annotation graphs. 
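The closure property described here (queries take and return sets of annotation graphs) can be illustrated with a small Python sketch. This is not the paper's query language. The `select` function, the tuple layout of an arc, and the representation of a graph as a frozen set of arcs are all assumptions made for this example.

```python
from typing import Callable, FrozenSet, Set, Tuple

Arc = Tuple[str, str, str, str]   # (source node, target node, type, label)
Graph = FrozenSet[Arc]            # a graph is represented here by its arc set


def select(graphs: Set[Graph], pred: Callable[[Arc], bool]) -> Set[Graph]:
    """For each input graph, keep the subgraph of arcs satisfying `pred`.

    Both the input and the output are sets of graphs, so calls can be
    chained. Subgraphs that end up empty are dropped.
    """
    out: Set[Graph] = set()
    for g in graphs:
        sub = frozenset(a for a in g if pred(a))
        if sub:
            out.add(sub)
    return out


corpus: Set[Graph] = {
    frozenset({
        ("n0", "n1", "word", "and"),
        ("n1", "n3", "word", "he"),
        ("n1", "n2", "phone", "hh"),
        ("n2", "n3", "phone", "iy"),
    })
}

# The result of one query is a valid input to the next.
words = select(corpus, lambda a: a[2] == "word")
h_words = select(words, lambda a: a[3].startswith("h"))
```

Because every result is again a set of graphs, the second call needs no special result type, which is the practical benefit of the clean algebraic semantics.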

The most direct way would be to treat Tipster byte offsets exactly as analogous to time references – since the only formal requirement on their time references is that they can be ordered. 
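The mapping suggested here can be sketched in a few lines of Python: each distinct byte offset becomes a node, and the offset itself plays the role of the node's time reference. The span tuple layout `(start, end, kind, label)`, the function name `spans_to_graph` and the sample text are hypothetical; the actual Tipster record format is not shown on this page.

```python
def spans_to_graph(spans):
    """Convert (start, end, kind, label) byte-offset spans to nodes and arcs.

    Each distinct offset becomes one node whose "time" is the offset itself,
    relying only on the fact that offsets can be ordered.
    """
    offsets = sorted({off for start, end, _, _ in spans for off in (start, end)})
    node_of = {off: f"n{i}" for i, off in enumerate(offsets)}
    nodes = {node_of[off]: off for off in offsets}
    arcs = [(node_of[s], node_of[e], kind, label) for s, e, kind, label in spans]
    return nodes, arcs


text = b"The cat sat"  # the immutable text that the offsets point into
spans = [
    (0, 3, "token", "The"),
    (4, 7, "token", "cat"),
    (8, 11, "token", "sat"),
    (0, 7, "phrase", "NP"),
]
nodes, arcs = spans_to_graph(spans)
```

The token and phrase spans that begin at offset 0 share a single node, so structural relations between annotations fall out of the shared offsets rather than needing to be stored separately.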

While the utility of existing tools, formats and databases is unquestionable, their sheer variety – and the lack of standards able to mediate among them – is becoming a critical problem. 
