
Topic

Shallow parsing

About: Shallow parsing is a research topic. Over its lifetime, 397 publications have been published within this topic, receiving 10,211 citations.


Papers
Proceedings ArticleDOI
27 May 2003
TL;DR: This work shows how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model.
Abstract: Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model. Improved training methods based on modern optimization algorithms were critical in achieving these results. We present extensive comparisons between models and training methods that confirm and strengthen previous results on shallow parsing and training methods for maximum-entropy models.
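
A minimal sketch of the setup the abstract describes, assuming the third-party sklearn-crfsuite package and a deliberately tiny feature set (the paper's own feature templates, training data and optimization settings are much richer):

```python
# CRF-based base noun-phrase chunking sketch: each token gets simple lexical and
# POS-context features, and the CRF predicts BIO chunk labels per sequence.
import sklearn_crfsuite

def token_features(sent, i):
    """Features for token i: lowercased word, its POS tag, and neighbour POS tags."""
    word, pos = sent[i]
    return {
        "word.lower": word.lower(),
        "pos": pos,
        "prev.pos": sent[i - 1][1] if i > 0 else "BOS",
        "next.pos": sent[i + 1][1] if i < len(sent) - 1 else "EOS",
    }

# Toy training data: (word, POS) sequences paired with BIO noun-phrase labels.
train_sents = [[("He", "PRP"), ("reckons", "VBZ"), ("the", "DT"), ("deficit", "NN")]]
train_labels = [["B-NP", "O", "B-NP", "I-NP"]]

X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, train_labels)
print(crf.predict(X))  # e.g. [['B-NP', 'O', 'B-NP', 'I-NP']]
```

In practice the CoNLL-2000 chunking data and a larger feature template would replace the toy sentence above.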

1,452 citations

Book ChapterDOI
01 Jan 1999
TL;DR: This work has shown that the transformation-based learning approach can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive “baseNP” chunks.
Abstract: Transformation-based learning, a technique introduced by Eric Brill (1993b), has been shown to do part-of-speech tagging with fairly high accuracy. This same method can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive “baseNP” chunks. For this purpose, it is convenient to view chunking as a tagging problem by encoding the chunk structure in new tags attached to each word. In automatic tests using Treebank-derived data, this technique achieved recall and precision rates of roughly 93% for baseNP chunks (trained on 950K words) and 88% for somewhat more complex chunks that partition the sentence (trained on 200K words). Working in this new application and with larger template and training sets has also required some interesting adaptations to the transformation-based learning approach.
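
The chapter's central move, viewing chunking as tagging by encoding chunk structure in per-word tags, can be sketched with the common BIO scheme (the chapter's exact tag inventory may differ); NLTK is used here only for the tree-to-tags conversion:

```python
# Encode a bracketed chunk structure as one tag per word (B-NP / I-NP / O),
# which turns chunk recognition into a sequence-tagging problem.
from nltk import Tree
from nltk.chunk import tree2conlltags

# A sentence with two non-recursive baseNP chunks marked as subtrees.
sent = Tree("S", [
    Tree("NP", [("The", "DT"), ("dog", "NN")]),
    ("chased", "VBD"),
    Tree("NP", [("a", "DT"), ("cat", "NN")]),
])

for word, pos, chunk_tag in tree2conlltags(sent):
    print(f"{word}\t{pos}\t{chunk_tag}")
# The     DT   B-NP
# dog     NN   I-NP
# chased  VBD  O
# a       DT   B-NP
# cat     NN   I-NP
```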

1,026 citations

Book ChapterDOI
01 Jan 1991
TL;DR: The typical chunk consists of a single content word surrounded by a constellation of function words, matching a fixed template, and the relationships between chunks are mediated more by lexical selection than by rigid templates.
Abstract: I begin with an intuition: when I read a sentence, I read it a chunk at a time. For example, the previous sentence breaks up something like this: (1) [I begin] [with an intuition]: [when I read] [a sentence], [I read it] [a chunk] [at a time] These chunks correspond in some way to prosodic patterns. It appears, for instance, that the strongest stresses in the sentence fall one to a chunk, and pauses are most likely to fall between chunks. Chunks also represent a grammatical watershed of sorts. The typical chunk consists of a single content word surrounded by a constellation of function words, matching a fixed template. A simple context-free grammar is quite adequate to describe the structure of chunks. By contrast, the relationships between chunks are mediated more by lexical selection than by rigid templates. Co-occurrence of chunks is determined not just by their syntactic categories, but is sensitive to the precise words that head them; and the order in which chunks occur is much more flexible than the order of words within chunks.
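
Abney's point that chunk-internal structure is simple enough for a small grammar can be illustrated with a few patterns over POS tags; this sketch uses NLTK's regular-expression chunker rather than Abney's original chunk grammar:

```python
# A tiny two-stage chunk grammar: NP chunks are a fixed template of function and
# content words; a PP chunk is then built from a preposition plus an NP chunk.
import nltk

grammar = r"""
  NP: {<DT>?<JJ.*>*<NN.*>+}   # optional determiner, adjectives, one or more nouns
  PP: {<IN><NP>}              # preposition followed by an NP chunk
"""
chunker = nltk.RegexpParser(grammar)

tagged = [("I", "PRP"), ("begin", "VBP"), ("with", "IN"),
          ("an", "DT"), ("intuition", "NN")]
print(chunker.parse(tagged))
# (S I/PRP begin/VBP (PP with/IN (NP an/DT intuition/NN)))
```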

944 citations

Journal ArticleDOI
TL;DR: This work argues that with a systematic incremental methodology one can go beyond shallow parsing to deeper language analysis, while preserving robustness, and describes a generic system based on such a methodology and designed for building robust analyzers that tackle deeper linguistic phenomena than those traditionally handled by the now widespread shallow parsers.
Abstract: Robustness is a key issue for natural language processing in general and parsing in particular, and many approaches have been explored in the last decade for the design of robust parsing systems. Among those approaches is shallow or partial parsing, which produces minimal and incomplete syntactic structures, often in an incremental way. We argue that with a systematic incremental methodology one can go beyond shallow parsing to deeper language analysis, while preserving robustness. We describe a generic system based on such a methodology and designed for building robust analyzers that tackle deeper linguistic phenomena than those traditionally handled by the now widespread shallow parsers. The rule formalism allows the recognition of n-ary linguistic relations between words or constituents on the basis of global or local structural, topological and/or lexical conditions. It offers the advantage of accepting various types of inputs, ranging from raw to chunked or constituent-marked texts, so for instance it can be used to process existing annotated corpora, or to perform a deeper analysis on the output of an existing shallow parser. It has been successfully used to build a deep functional dependency parser, as well as for the task of co-reference resolution, in a modular way.
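
As a rough illustration of the incremental idea (not the paper's actual rule formalism), a shallow chunking pass can feed a second, hand-written rule that extracts a relation between chunks; every rule and name below is invented for the sketch:

```python
# Pass 1: shallow NP chunking over POS-tagged input.
# Pass 2: a local rule "NP  VERB  NP" that emits subject/object relations.
import nltk

chunker = nltk.RegexpParser(r"NP: {<DT>?<JJ.*>*<NN.*>+}")
tagged = [("the", "DT"), ("parser", "NN"), ("handles", "VBZ"),
          ("deeper", "JJR"), ("phenomena", "NNS")]
tree = chunker.parse(tagged)

# Flatten the chunked tree into (label, text) nodes for the rule pass.
nodes = []
for child in tree:
    if isinstance(child, nltk.Tree):
        nodes.append((child.label(), " ".join(w for w, _ in child.leaves())))
    else:
        word, tag = child
        nodes.append((tag, word))

for i in range(len(nodes) - 2):
    left, mid, right = nodes[i], nodes[i + 1], nodes[i + 2]
    if left[0] == "NP" and mid[0].startswith("VB") and right[0] == "NP":
        print(f"SUBJ({mid[1]}, {left[1]})  OBJ({mid[1]}, {right[1]})")
# SUBJ(handles, the parser)  OBJ(handles, deeper phenomena)
```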

317 citations

Proceedings ArticleDOI
31 May 2003
TL;DR: The results of the evaluation suggest that the new procedure is very effective, saving considerable time and labour, and that the test items produced with the help of the program are not inferior in quality to those produced manually.
Abstract: This paper describes a novel computer-aided procedure for generating multiple-choice tests from electronic instructional documents. In addition to employing various NLP techniques including term extraction and shallow parsing, the program makes use of language resources such as a corpus and WordNet. The system generates test questions and distractors, offering the user the option to post-edit the test items.
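
One ingredient the abstract mentions is WordNet as a language resource; a hedged sketch of one possible distractor-selection step (sister terms of a key noun) follows, with the heuristic chosen for illustration rather than taken from the paper:

```python
# Propose distractors for a key term by collecting co-hyponyms (sister terms)
# of its first noun sense in WordNet. Requires the WordNet data
# (nltk.download("wordnet")) to be installed.
from nltk.corpus import wordnet as wn

def wordnet_distractors(term, n=3):
    """Return up to n sister terms of the term's first noun sense."""
    synsets = wn.synsets(term, pos=wn.NOUN)
    if not synsets:
        return []
    candidates = []
    for hypernym in synsets[0].hypernyms():
        for sister in hypernym.hyponyms():
            for lemma in sister.lemmas():
                name = lemma.name().replace("_", " ")
                if name.lower() != term.lower() and name not in candidates:
                    candidates.append(name)
    return candidates[:n]

print(wordnet_distractors("parser"))  # output depends on the installed WordNet version
```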

211 citations

Network Information
Related Topics (5)

Machine translation: 22.1K papers, 574.4K citations, 81% related
Natural language: 31.1K papers, 806.8K citations, 79% related
Language model: 17.5K papers, 545K citations, 79% related
Parsing: 21.5K papers, 545.4K citations, 79% related
Query language: 17.2K papers, 496.2K citations, 74% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2021    7
2020    12
2019    6
2018    5
2017    11
2016    11