TIGER: Linguistic Interpretation of a German Corpus

doi:10.1007/S11168-004-7431-3

Journal Article

Sabine Brants, +8 more

- 01 Dec 2004 -

Research on Language and Computation

- Vol. 2, Iss: 4, pp 597-620

TLDR

The TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences, is described, together with the query language of the TIGERSearch tool, which was designed to facilitate a simple formulation of complex queries, and a graphical user interface for query input.

Abstract:

This paper reports on the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation, using the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER Treebank. This scheme is an extended and improved version of the NEGRA annotation scheme, and we illustrate in detail the linguistic extensions that were made to the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives, and proper nouns. In addition, the paper presents the query tool TIGERSearch, which was developed in the project to exploit the treebank in an adequate way. We describe the query language, which was designed to facilitate a simple formulation of complex queries; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.

Citations

dlexDB : eine lexikalische Datenbank für die psychologische und linguistische Forschung

childLex: A lexical database of German read by children

Parallel Processing and Sentence Comprehension Difficulty.

Morphological Word-Embeddings

XML-based Stand-off Representation and Exploitation of Multi-Level Linguistic Annotation.

References

Categorical Data Analysis

The Mental representation of grammatical relations

TnT - A Statistical Part-of-Speech Tagger

The Penn Treebank: annotating predicate argument structure
