Topic
Shallow parsing
About: Shallow parsing is a research topic. Over its lifetime, 397 publications have been published within this topic, receiving 10,211 citations.
Papers published on a yearly basis
Papers
TL;DR: Four new k-best extensions of max-margin structured algorithms are introduced, their properties and connections are discussed, and all algorithms are evaluated, showing how the proposed algorithms are affected by changes of k in terms of F-measure and computational time.
Abstract: Structured learning algorithms usually require inference during the training procedure. Due to the exponential size of the output space, the parameter update is performed only on a relatively small collection built from the “best” structures. The k-best MIRA is an example of an online algorithm which seeks optimal parameters by making updates on the k highest-scoring structures at a time. Following the idea of using k-best structures during the learning process, in this paper we introduce four new k-best extensions of max-margin structured algorithms. We discuss their properties and connections, and evaluate all algorithms on two sequence labeling problems, shallow parsing and named entity recognition. The experiments show how the proposed algorithms are affected by changes of k in terms of the F-measure and computational time, and that the proposed algorithms can improve results in comparison to the single-best case. Moreover, restricting to the single-best case yields a comparison of the existing algorithms.
01 Jan 2010
TL;DR: The approach is an effective way to parse a tennis game from a stream of events with minimal human intervention, and makes use of some extra contextual information, namely the time gap between two adjacent match events, which is in itself a reasonable indicator of segmentation.
Abstract: This paper proposes a method to infer the syntactical units of a sports game (tennis) from a stream of game events. We assume that we are given a sequence of events within the game (examples of events are “serve”, “rally”, “score announcement” etc.), with their durations, and our goal is to segment them into “units” that are meaningful for the game, such as a “point”. Such a segmentation is essential for understanding the way that the events relate to each other, and hence for inferring automatically the structure of the game. We use a multigram-based technique to segment the event stream into variable-length sequences by estimating the optimal (maximum-likelihood) segmentation using the Viterbi algorithm. We then make use of some extra contextual information, namely the time gap between two adjacent match events, which is in itself a reasonable indicator of segmentation. By integrating this feature into the multigram segmentation, we considerably enhance segmentation performance. The results show that our approach is an effective way to parse a tennis game from a stream of events with minimal human intervention. Keywords: shallow parsing; variable-length unit; segmentation; game learning.
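The maximum-likelihood segmentation step described above can be sketched as Viterbi-style dynamic programming over segmentation boundaries. This is a minimal illustration under assumed inputs: the event names and the `unit_logprob` table of multigram log-probabilities are invented for the example, and the time-gap feature from the paper is not modeled.

```python
import math

def ml_segmentation(events, unit_logprob, max_len=4):
    # Viterbi over segmentations: best[i] holds the log-probability of the
    # best segmentation of events[:i]; back[i] is the start of its last unit.
    n = len(events)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            unit = tuple(events[j:i])
            lp = unit_logprob.get(unit, -math.inf)
            if best[j] + lp > best[i]:
                best[i], back[i] = best[j] + lp, j
    # Recover the variable-length units by walking the backpointers.
    segs, i = [], n
    while i > 0:
        segs.append(tuple(events[back[i]:i]))
        i = back[i]
    return segs[::-1]
```

Each candidate unit contributes its multigram log-probability, so the recovered boundary sequence maximizes the product of unit probabilities, which matches the maximum-likelihood criterion the abstract describes.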
17 Dec 2004
TL;DR: This project takes a corpus of aviation safety reports parsed by Cass, an existing partial parser, with a particular given grammar, and looks for instances of linguistic constructs whose treatment by the parser could be improved by modifications to the grammar.
Abstract: With the growth of the World Wide Web in the nineties, alongside the increase in storage and processing capabilities of computer hardware, the problem of information overload resulted in an increased interest in finite-state techniques for natural language analysis as an alternative to fragile, slower algorithms that would attempt to find complete parses for sentences based on general theories of language. As it turns out, shallow parsing, a set of robust parsing techniques based on finite-state machines, provides incomplete yet very useful parses for unconstrained running text. The technique, however, will never provide 100% accuracy and requires that grammars be geared to the needs of particular data samples. In this project, we take a corpus of aviation safety reports parsed by Cass, an existing partial parser, with a particular given grammar, and look for instances of linguistic constructs whose treatment by the parser could be improved by modifications to the grammar. A few such constructs are discussed, and the grammar is edited to reflect the desired improvements. A parser accuracy measure is implemented and evaluated before and after the grammar modifications.
01 Jan 2003
TL;DR: A shallow parsing module – SuSAna – that performs efficient analysis over unrestricted text, recognizing the boundaries, internal structure, and syntactic category of the syntactic constituents.
Abstract: This paper presents a shallow parsing module – SuSAna – that performs efficient analysis over unrestricted text. The module recognizes the boundaries, internal structure, and syntactic category of the syntactic constituents. In addition to the definition of syntactic structures, its grammar supports a hierarchy of symbols and a set of restrictions known as preferences. During the analysis, a directed graph is used for representing all the operations, preventing redundant computation. The algorithm has O(n²) complexity, where n is the number of lexical units in the segment. SuSAna can be used as a standalone application, fully integrated in a larger system for natural language processing, or in a client/server platform.
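The O(n²) span enumeration with shared, non-redundant computation can be sketched as a simple chart-based chunker. This is a generic illustration, not SuSAna's actual grammar formalism: the flat tag-sequence rules and the greedy longest-match chunk selection are assumptions made for the example.

```python
def chunk_constituents(tags, rules):
    # Chart over the O(n^2) spans [i, j): each span's matching categories are
    # computed once and stored, so overlapping analyses share work (a table
    # stand-in for the directed-graph representation the abstract describes).
    n = len(tags)
    chart = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            span = tuple(tags[i:j])
            for cat, patterns in rules.items():
                if span in patterns:
                    chart.setdefault((i, j), []).append(cat)
    # Greedy left-to-right, longest-match selection of non-overlapping chunks,
    # yielding constituent boundaries and their syntactic categories.
    chunks, i = [], 0
    while i < n:
        for j in range(n, i, -1):
            if (i, j) in chart:
                chunks.append((chart[(i, j)][0], i, j))
                i = j
                break
        else:
            i += 1
    return chunks
```

The returned triples give each constituent's category together with its boundaries, the kind of output (boundaries plus syntactic category) that the abstract attributes to the module.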
09 Oct 2015
TL;DR: A bilingually-constrained recursive neural network (BC-RNN) model is proposed to combine the merits of source parsing and shallow parsing into a hierarchical phrase-based translation model, and it outperforms other state-of-the-art statistical machine translation methods.
Abstract: Hierarchical phrase-based translation models have advanced statistical machine translation (SMT). Because such models can better leverage syntactic information, two types of methods, leveraging source parsing and leveraging shallow parsing, have been applied to introduce syntactic constraints into translation models. In this paper, we propose a bilingually-constrained recursive neural network (BC-RNN) model to combine the merits of these two types of methods. First we perform supervised learning on a manually parsed corpus using the standard recursive neural network (RNN) model. Then we employ unsupervised bilingually-constrained tuning to improve the accuracy of the standard RNN model. Leveraging the BC-RNN model, we introduce both source parsing and shallow parsing information into a hierarchical phrase-based translation model. The evaluation demonstrates that our proposed method outperforms other state-of-the-art statistical machine translation methods on the National Institute of Standards and Technology (NIST) 2008 Chinese-English machine translation test data.