Topic

Shallow parsing

About: Shallow parsing is a research topic. Over its lifetime, 397 publications have been published within this topic, receiving 10,211 citations.


Papers
Journal Article
TL;DR: Four new k-best extensions of max-margin structured algorithms are introduced, their properties and connections are discussed, and all algorithms are evaluated, showing how the proposed algorithms are affected by changes in k in terms of F-measure and computational time.
Abstract: Structured learning algorithms usually require inference during the training procedure. Due to the exponential size of the output space, the parameter update is performed only on a relatively small collection built from the “best” structures. The k-best MIRA is an example of an online algorithm which seeks optimal parameters by making updates on the k structures with the highest score at a time. Following the idea of using k-best structures during the learning process, in this paper we introduce four new k-best extensions of max-margin structured algorithms. We discuss their properties and connections, and evaluate all algorithms on two sequence labeling problems, shallow parsing and named entity recognition. The experiments show how the proposed algorithms are affected by changes in k in terms of F-measure and computational time, and that the proposed algorithms can improve results in comparison to the single-best case. Moreover, restricting them to the single-best case yields a comparison of the existing algorithms.
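To make the k-best update idea concrete, here is a minimal sketch of an online, MIRA-style update for sequence labeling. It assumes a k-best decoder, a joint feature function phi, and a Hamming loss are supplied by the surrounding system; all names are illustrative, and the per-structure passive-aggressive step is a simplification of the joint quadratic program used in k-best MIRA, not the paper's exact algorithms.

# Hypothetical sketch of an online k-best max-margin style update for
# sequence labeling. The decoder, feature map, and loss are assumed to be
# provided elsewhere; names are illustrative only.
import numpy as np

def kbest_update(w, x, y_gold, k, decode_kbest, phi, hamming_loss, C=1.0):
    """One MIRA-like update using the k highest-scoring structures for input x.

    w            -- current weight vector (numpy array)
    x, y_gold    -- input sequence and its gold label sequence
    decode_kbest -- function (w, x, k) -> list of k candidate label sequences
    phi          -- joint feature function phi(x, y) -> numpy array
    hamming_loss -- structured loss between two label sequences
    C            -- cap on the step size (aggressiveness)
    """
    gold_feats = phi(x, y_gold)
    for y_hat in decode_kbest(w, x, k):
        if y_hat == y_gold:
            continue
        delta = gold_feats - phi(x, y_hat)           # feature difference
        margin = np.dot(w, delta)                    # current score gap
        loss = hamming_loss(y_gold, y_hat)
        if margin < loss:                            # margin violation
            tau = min(C, (loss - margin) / (np.dot(delta, delta) + 1e-12))
            w = w + tau * delta                      # move toward the gold structure
    return w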
01 Jan 2010
TL;DR: The approach is an effective way to parse a tennis game from a stream of events with minimal human intervention, and makes use of some extra contextual information, namely the time gap between two adjacent match events, which is in itself a reasonable indicator of segmentation.
Abstract: This paper proposes a method to infer the syntactical units of a sports game (tennis) from a stream of game events. We assume that we are given a sequence of events within the game (examples of events are “serve”, “rally”, “score announcement”, etc.), with their durations, and our goal is to segment them into “units” that are meaningful for the game, such as a “point”. Such a segmentation is essential for understanding the way that the events relate to each other, and hence for inferring the structure of the game automatically. We use a multigram-based technique to segment the event stream into variable-length sequences by estimating the optimal (maximum-likelihood) segmentation using the Viterbi algorithm. We then make use of extra contextual information, namely the time gap between two adjacent match events, which is in itself a reasonable indicator of segmentation. By integrating this feature into the multigram segmentation, we considerably enhance segmentation performance. The results show that our approach is an effective way to parse a tennis game from a stream of events with minimal human intervention. Keywords: shallow parsing; variable-length unit; segmentation; game learning.
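As a rough illustration of the maximum-likelihood segmentation step, the sketch below runs a Viterbi-style dynamic program over an event stream, assuming a unit_logprob scorer for candidate variable-length units; the scorer and the max_len bound are assumptions for illustration, not taken from the paper, and the time-gap feature is not modeled here.

# Hypothetical sketch of maximum-likelihood multigram segmentation with a
# Viterbi-style dynamic program over variable-length units of events.
import math

def viterbi_segment(events, unit_logprob, max_len=4):
    """Return the most likely segmentation of `events` into variable-length units.

    events       -- list of event symbols, e.g. ["serve", "rally", "score"]
    unit_logprob -- function(tuple_of_events) -> log-probability of that unit
    max_len      -- longest unit considered
    """
    n = len(events)
    best = [-math.inf] * (n + 1)   # best[i] = best log-prob of events[:i]
    best[0] = 0.0
    back = [0] * (n + 1)           # back[i] = start index of the last unit
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j] + unit_logprob(tuple(events[j:i]))
            if score > best[i]:
                best[i], back[i] = score, j
    # Recover the segmentation by following back-pointers.
    units, i = [], n
    while i > 0:
        units.append(events[back[i]:i])
        i = back[i]
    return list(reversed(units))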
Dissertation
17 Dec 2004
TL;DR: This project takes a corpus of aviation safety reports parsed by Cass, an existing partial parser, with a given grammar, and looks for instances of linguistic constructs whose treatment by the parser could be improved by modifications to the grammar.
Abstract: With the growth of the World Wide Web in the nineties, alongside the increase in the storage and processing capabilities of computer hardware, the problem of information overload resulted in an increased interest in finite-state techniques for natural language analysis as an alternative to fragile, slower algorithms that would attempt to find complete parses for sentences based on general theories of language. As it turns out, shallow parsing, a set of robust parsing techniques based on finite-state machines, provides incomplete yet very useful parses for unconstrained running text. The technique, however, will never provide 100% accuracy and requires that grammars be geared to the needs of particular data samples. In this project, we take a corpus of aviation safety reports parsed by Cass, an existing partial parser, with a given grammar, and look for instances of linguistic constructs whose treatment by the parser could be improved by modifications to the grammar. A few such constructs are discussed, and the grammar is edited to reflect the desired improvements. A parser-accuracy measure is implemented and evaluated before and after the grammar modifications.
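As a toy illustration of the kind of finite-state shallow parsing a cascaded partial parser performs, the sketch below marks baseline noun-phrase chunks of the shape DT? JJ* NN+ with a small state machine over POS tags; the tag set and the pattern are illustrative assumptions and are not Cass's grammar.

# Hypothetical sketch of a finite-state noun-phrase chunker over POS tags.
def chunk_nps(tagged):
    """tagged: list of (word, pos) pairs; returns NP chunks as lists of words.

    Recognizes the baseline shape DT? JJ* NN+ with a small state machine:
    state 0 = outside, 1 = collecting the DT/JJ prefix, 2 = collecting nouns.
    """
    chunks = []
    current, state = [], 0

    def flush():
        nonlocal current, state
        if state == 2:                      # emit only if at least one noun was seen
            chunks.append(current)
        current, state = [], 0

    for word, pos in tagged:
        if pos.startswith("NN"):            # noun: extend (or start) the chunk head
            current.append(word)
            state = 2
        elif pos == "DT" or pos.startswith("JJ"):
            if state == 2:                  # a new prefix closes the previous chunk
                flush()
            current.append(word)
            state = 1
        else:                               # any other tag ends the current chunk
            flush()
    flush()
    return chunks

print(chunk_nps([("the", "DT"), ("aviation", "NN"), ("safety", "NN"),
                 ("report", "NN"), ("was", "VBD"), ("filed", "VBN")]))
# -> [['the', 'aviation', 'safety', 'report']]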
01 Jan 2003
TL;DR: A shallow parsing module, SuSAna, performs efficient analysis over unrestricted text, recognizing the boundaries, internal structure, and syntactic category of syntactic constituents.
Abstract: This paper presents a shallow parsing module, SuSAna, that performs efficient analysis over unrestricted text. The module recognizes the boundaries, internal structure, and syntactic category of the syntactic constituents. In addition to the definition of syntactic structures, its grammar supports a hierarchy of symbols and a set of restrictions known as preferences. During the analysis, a directed graph is used to represent all the operations, preventing redundant computation. The algorithm has O(n²) complexity, where n is the number of lexical units in the segment. SuSAna can be used as a standalone application, fully integrated into a larger system for natural language processing, or in a client/server platform.
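A rough sketch of the shared-graph idea follows: candidate constituents are treated as edges over token positions, each of the O(n²) spans is scored once and cached, and a dynamic program picks a best non-overlapping cover. The score_span scorer is a stand-in for a grammar with preference restrictions and is not SuSAna's actual interface.

# Hypothetical sketch: share span analyses in a graph and pick a best cover.
from functools import lru_cache

def best_cover(tokens, score_span):
    """score_span(tuple_of_tokens) -> (category, score); returns labeled chunks."""
    n = len(tokens)

    @lru_cache(maxsize=None)                 # each span is analyzed exactly once
    def edge(i, j):
        return score_span(tuple(tokens[i:j]))

    best = [float("-inf")] * (n + 1)
    best[0], back = 0.0, [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):                   # consider every edge (i, j)
            cat, s = edge(i, j)
            if best[i] + s > best[j]:
                best[j], back[j] = best[i] + s, (i, cat)
    chunks, j = [], n                        # follow back-pointers to recover chunks
    while j > 0:
        i, cat = back[j]
        chunks.append((cat, tokens[i:j]))
        j = i
    return list(reversed(chunks))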
Book Chapter
09 Oct 2015
TL;DR: A bilingually-constrained recursive neural network (BC-RNN) model is proposed to combine the merits of source parsing and shallow parsing in a hierarchical phrase-based translation model, and it outperforms other state-of-the-art statistical machine translation methods.
Abstract: Hierarchical phrase-based translation models have advanced statistical machine translation (SMT). Because such models can be improved by leveraging syntactic information, two types of methods, those leveraging source parsing and those leveraging shallow parsing, have been applied to introduce syntactic constraints into translation models. In this paper, we propose a bilingually-constrained recursive neural network (BC-RNN) model to combine the merits of these two types of methods. First, we perform supervised learning on a manually parsed corpus using the standard recursive neural network (RNN) model. Then we employ unsupervised bilingually-constrained tuning to improve the accuracy of the standard RNN model. Leveraging the BC-RNN model, we introduce both source parsing and shallow parsing information into a hierarchical phrase-based translation model. The evaluation demonstrates that our proposed method outperforms other state-of-the-art statistical machine translation methods on the National Institute of Standards and Technology 2008 (NIST 2008) Chinese-English machine translation test data.
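For orientation, the sketch below shows the forward pass of a plain recursive neural network over a binarized parse tree, where each node vector is a nonlinear composition of its children's vectors; the dimensions, parameters, and composition function are illustrative assumptions, and the bilingually-constrained tuning described in the paper is not shown.

# Hypothetical sketch of a recursive neural network forward pass over a
# binarized parse tree: parent = tanh(W [left; right] + b).
import numpy as np

def compose(tree, embed, W, b):
    """tree: a word (leaf) or a (left, right) pair; embed: word -> d-dim vector.
    W has shape (d, 2d), b has shape (d,). Returns the node's d-dim vector."""
    if isinstance(tree, str):                      # leaf: look up the word embedding
        return embed[tree]
    left, right = tree
    child = np.concatenate([compose(left, embed, W, b),
                            compose(right, embed, W, b)])
    return np.tanh(W @ child + b)                  # nonlinear composition of children

# Toy usage with random parameters over a tiny binarized tree.
d = 4
rng = np.random.default_rng(0)
embed = {w: rng.normal(size=d) for w in ["the", "shallow", "parser"]}
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)
vec = compose(("the", ("shallow", "parser")), embed, W, b)
print(vec.shape)   # (4,)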

Network Information
Related Topics (5)
Machine translation - 22.1K papers, 574.4K citations, 81% related
Natural language - 31.1K papers, 806.8K citations, 79% related
Language model - 17.5K papers, 545K citations, 79% related
Parsing - 21.5K papers, 545.4K citations, 79% related
Query language - 17.2K papers, 496.2K citations, 74% related
Performance Metrics
No. of papers in the topic in previous years:
Year  Papers
2021  7
2020  12
2019  6
2018  5
2017  11
2016  11