Journal Article
TIGER: Linguistic Interpretation of a German Corpus
Sabine Brants, Stefanie Dipper, Peter Eisenberg, Silvia Hansen-Schirra, Esther König, Wolfgang Lezius, Christian Rohrer, George Smith, Hans Uszkoreit
TL;DR: The paper describes the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences, together with the query language designed to make complex queries simple to formulate and a graphical user interface for query input.
Abstract:
This paper reports on the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation, using the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER treebank. This scheme is an extended and improved version of the NEGRA annotation scheme, and we illustrate in detail the linguistic extensions made to the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives, and proper nouns. In addition, the paper presents the query tool TIGERSearch, which was developed in the project to exploit the treebank adequately. We describe the query language, which was designed to make complex queries simple to formulate; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.
Citations
Journal Article
dlexDB : eine lexikalische Datenbank für die psychologische und linguistische Forschung
Julian Heister, Kay-Michael Würzner, Johannes Bubenzer, Edmund Pohl, Thomas Hanneforth, Alexander Geyken, Reinhold Kliegl
TL;DR: The lexical database dlexDB makes statistical measures for a wide range of processing-relevant features of words available online to psychological and linguistic research.
Journal Article
childLex: A lexical database of German read by children
TL;DR: This article introduces childLex, an online database of German read by children that is based on a corpus of children’s books and comprises 10 million words that were syntactically annotated and lemmatized.
Journal Article
Parallel Processing and Sentence Comprehension Difficulty.
TL;DR: This study finds support in German readers' eye fixations for two distinct difficulty metrics: surprisal, which reflects the change in probabilities across syntactic analyses as new words are integrated; and retrieval, which quantifies comprehension difficulty in terms of working memory constraints.
Proceedings Article
Morphological Word-Embeddings
Ryan Cotterell, Hinrich Schütze
TL;DR: The authors consider guiding word embeddings with morphologically annotated data, a form of semisupervised learning, encouraging the vectors to encode a word's morphology, i.e., words close in the embedded space share morphological features.
Proceedings Article
XML-based Stand-off Representation and Exploitation of Multi-Level Linguistic Annotation.
TL;DR: The paper proposes a generic, XML-based stand-off architecture for multi-level linguistic annotations, presents an example instantiation of it, and sketches application scenarios that profit from it.
References
Journal Article
Categorical Data Analysis
Journal Article
The Mental representation of grammatical relations
TL;DR: The volume groups twelve articles into three sections. The first, "Syntactic Representation", includes "Lexical-Functional Grammar: A Formal System for Grammatical Representation" (R. Kaplan and J. Bresnan) and "Control and Complementation" (J. Bresnan).
Posted Content
TnT - A Statistical Part-of-Speech Tagger
TL;DR: Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger based on Markov models; it is shown to perform at least as well as other current approaches, including the Maximum Entropy framework.
Proceedings Article
TnT -- A Statistical Part-of-Speech Tagger
TL;DR: Contrary to claims found elsewhere in the literature, it is argued that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework.
Proceedings Article
The Penn Treebank: annotating predicate argument structure
Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, Britta Schasberger
TL;DR: The Penn Treebank has recently implemented a new syntactic annotation scheme designed to highlight aspects of predicate-argument structure. The scheme incorporates a more consistent treatment of a wide range of grammatical phenomena, provides a set of coindexed null elements in what can be thought of as "underlying" position for phenomena such as wh-movement, passive, and the subjects of infinitival constructions, and allows for a clear, concise tagging system for some semantic roles.