Journal Article

TIGER: Linguistic Interpretation of a German Corpus

TLDR
The paper describes the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences. It also describes the query language, which was designed to facilitate a simple formulation of complex queries, and TIGERin, a graphical user interface for query input.
Abstract
This paper reports on the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation using the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER treebank. This scheme is an extended and improved version of the NEGRA annotation scheme, and we illustrate in detail the linguistic extensions made to the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives, and proper nouns. In addition, the paper presents the query tool TIGERSearch, which was developed in the project to exploit the treebank adequately. We describe the query language, which was designed to facilitate a simple formulation of complex queries; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.
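To make the idea of a syntax graph plus a query over it concrete, here is a minimal sketch, assuming a TIGER-XML-style layout (`s`/`graph`, `terminals`/`t`, `nonterminals`/`nt`/`edge` with `idref`). It loads one hand-made sentence and answers a direct-dominance query of the kind TIGERSearch expresses as `[cat="NP"] >NK [pos="ART"]`. The sample sentence, the function names `load_graph`, `matches` and `direct_dominance`, and the exact element layout are illustrative assumptions, not the TIGERSearch implementation; the sketch also ignores features such as secondary edges.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence in a TIGER-XML-like layout (not taken from the treebank):
# terminals carry word/pos, nonterminals carry cat and labelled edges to children.
SAMPLE = """
<s id="s1">
  <graph root="s1_502">
    <terminals>
      <t id="s1_1" word="Der" pos="ART"/>
      <t id="s1_2" word="Hund" pos="NN"/>
      <t id="s1_3" word="bellt" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="NP">
        <edge label="NK" idref="s1_1"/>
        <edge label="NK" idref="s1_2"/>
      </nt>
      <nt id="s1_502" cat="S">
        <edge label="SB" idref="s1_500"/>
        <edge label="HD" idref="s1_3"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def load_graph(xml_text):
    """Return (nodes, edges): nodes maps id -> attribute dict,
    edges is a list of (parent_id, label, child_id) triples."""
    root = ET.fromstring(xml_text.strip())
    nodes, edges = {}, []
    for t in root.iter("t"):
        nodes[t.get("id")] = dict(t.attrib)
    for nt in root.iter("nt"):
        nodes[nt.get("id")] = dict(nt.attrib)
        for e in nt.findall("edge"):
            edges.append((nt.get("id"), e.get("label"), e.get("idref")))
    return nodes, edges


def matches(attrs, constraint):
    """True if every feature=value pair in the constraint holds for the node."""
    return all(attrs.get(k) == v for k, v in constraint.items())


def direct_dominance(nodes, edges, parent_c, child_c, label=None):
    """Pairs (p, c) where p matches parent_c and immediately dominates c,
    which matches child_c; optionally the edge must carry the given label."""
    return [
        (p, c)
        for p, lab, c in edges
        if matches(nodes[p], parent_c)
        and matches(nodes[c], child_c)
        and (label is None or lab == label)
    ]


if __name__ == "__main__":
    nodes, edges = load_graph(SAMPLE)
    # Roughly [cat="NP"] > [pos="ART"]
    print(direct_dominance(nodes, edges, {"cat": "NP"}, {"pos": "ART"}))
    # prints [('s1_500', 's1_1')]
    # Roughly [cat="S"] >SB [] (any node attached to S by a subject edge)
    print(direct_dominance(nodes, edges, {"cat": "S"}, {}, label="SB"))
    # prints [('s1_502', 's1_500')]
```

Keeping nodes and labelled edges as separate structures mirrors the graph view of the annotation: a real query engine would add further operators (general dominance, precedence, variables shared across node descriptions), but each reduces to constraints over the same node attributes and edge triples.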

