
Showing papers on "Shallow parsing published in 1999"


Book ChapterDOI
01 Jan 1999
TL;DR: This work has shown that the transformation-based learning approach can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive “baseNP” chunks.
Abstract: Transformation-based learning, a technique introduced by Eric Brill (1993b), has been shown to do part-of-speech tagging with fairly high accuracy. This same method can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive “baseNP” chunks. For this purpose, it is convenient to view chunking as a tagging problem by encoding the chunk structure in new tags attached to each word. In automatic tests using Treebank-derived data, this technique achieved recall and precision rates of roughly 93% for baseNP chunks (trained on 950K words) and 88% for somewhat more complex chunks that partition the sentence (trained on 200K words). Working in this new application and with larger template and training sets has also required some interesting adaptations to the transformation-based learning approach.
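The key move above is to recast chunking as a tagging problem by encoding chunk structure in per-word tags. A minimal sketch of that encoding (illustrative only, not the paper's code) using the I/O/B scheme, where I marks a word inside a chunk, O a word outside any chunk, and B a word that begins a chunk immediately after another chunk:

```python
def chunks_to_tags(words, chunk_spans):
    """Encode non-recursive baseNP chunk spans as per-word I/O/B tags.

    chunk_spans: list of (start, end) word-index pairs, end exclusive.
    """
    tags = ["O"] * len(words)
    for start, end in sorted(chunk_spans):
        # B is only needed when a chunk starts directly after another chunk.
        tags[start] = "B" if start > 0 and tags[start - 1] in ("B", "I") else "I"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags

# Two adjacent baseNPs followed by a verb:
# "the dog" and "the cat" are chunks; "ran" is outside.
print(chunks_to_tags(["the", "dog", "the", "cat", "ran"], [(0, 2), (2, 4)]))
```

Once chunks are tags, any sequence tagger, including transformation-based learning, can be trained on them directly.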

1,236 citations


Proceedings Article
01 Jan 1999
TL;DR: This article presents a memory-based learning approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules; the reported results are competitive, with an F-value of 93.8% for NP chunking on the WSJ treebank.
Abstract: We present a memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules. The experiments reported in this paper show competitive results; the F-values for the Wall Street Journal (WSJ) treebank are 93.8% for NP chunking, 94.7% for VP chunking, 77.1% for subject detection, and 79.0% for object detection.
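The core of memory-based learning is to store training instances verbatim and classify new instances by their nearest stored neighbour. A toy illustration of that principle (assumed for exposition, not the authors' implementation), using a simple feature-overlap similarity in the style of IB1-type learners:

```python
def overlap(a, b):
    """Feature-overlap similarity: count of positions where two instances agree."""
    return sum(1 for x, y in zip(a, b) if x == y)

def mbl_classify(memory, instance):
    """memory: list of (feature_tuple, label) pairs stored at training time.

    Returns the label of the most similar stored instance (1-nearest-neighbour).
    """
    return max(memory, key=lambda m: overlap(m[0], instance))[1]

# Hypothetical chunking instances: features are a (previous-POS, current-POS) window.
memory = [(("DT", "NN"), "I-NP"), (("VBD", "IN"), "O")]
print(mbl_classify(memory, ("DT", "NNS")))
```

Real memory-based taggers add informed feature weighting and larger context windows, but the store-and-compare structure is the same.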

130 citations


Proceedings Article
01 Jan 1999
TL;DR: This paper introduces the technique of Predictive Annotation, a methodology for indexing texts for retrieval aimed at answering fact-seeking questions by establishing about 20 classes of objects that can be identified in text by shallow parsing, and by annotating and indexing the text with labels, which are called QA-Tokens.
Abstract: This paper introduces the technique of Predictive Annotation, a methodology for indexing texts for retrieval aimed at answering fact-seeking questions. The essence of the approach can be stated simply: index the answers. This is done by establishing about 20 classes of objects that can be identified in text by shallow parsing, and by annotating and indexing the text with these labels, which we call QA-Tokens. Given a question, its class is identified and the question is modified accordingly to include the appropriate token(s). The search engine is modified to rank and return short passages of text rather than documents. The QA-Tokens are used in later stages of analysis to extract the supposed answers from these returned passages. Finally, all potential answers are ranked using a novel formula, which determines which ones among them are most likely to be correct.
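The "index the answers" idea can be sketched in a few lines: annotate spans that shallow patterns can identify with a class token, and map question words to the token class they should retrieve. The token name, pattern, and question mapping below are illustrative assumptions, not the paper's actual QA-Token inventory:

```python
import re

# Hypothetical annotation inventory: one QA-Token class ("TIME$") recognized
# by a shallow pattern for four-digit years. The real system used ~20 classes.
QA_PATTERNS = {"TIME$": re.compile(r"\b(1[89]\d\d|20\d\d)\b")}

# Map a question's leading wh-word to the QA-Token class it calls for.
QUESTION_CLASS = {"when": "TIME$"}

def annotate(text):
    """Return (token, span) pairs found by the shallow patterns."""
    return [(token, m.group())
            for token, pat in QA_PATTERNS.items()
            for m in pat.finditer(text)]

def answer(question, text):
    """Classify the question, then return annotated spans of the matching class."""
    token = QUESTION_CLASS[question.split()[0].lower()]
    return [value for t, value in annotate(text) if t == token]

print(answer("When was the treebank released?", "The treebank was released in 1993."))
```

In the full system the annotations are built at indexing time, so the search engine retrieves passages already labeled with candidate answers.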

105 citations


Posted Content
TL;DR: A memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules is presented.
Abstract: We present a memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules. The experiments reported in this paper show competitive results; the F-values for the Wall Street Journal (WSJ) treebank are 93.8% for NP chunking, 94.7% for VP chunking, 77.1% for subject detection, and 79.0% for object detection.

102 citations


Proceedings Article
01 Aug 1999
TL;DR: A SNoW-based learning approach to shallow parsing tasks is presented and studied experimentally, with results for Noun-Phrase (NP) and Subject-Verb (SV) patterns that compare favorably with the best published results.
Abstract: A SNoW-based learning approach to shallow parsing tasks is presented and studied experimentally. The approach learns to identify syntactic patterns by combining simple predictors to produce a coherent inference. Two instantiations of this approach are studied, and experimental results for Noun-Phrase (NP) and Subject-Verb (SV) patterns that compare favorably with the best published results are presented. In doing so, we compare two ways of modeling the problem of learning to recognize patterns and suggest that shallow parsing patterns are better learned using open/close predictors than using inside/outside predictors, thus contributing to the understanding of how to model shallow parsing tasks as learning problems.
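The open/close modeling contrasted with inside/outside above uses two separate predictors: one fires at positions that open a phrase, the other at positions that close one, and the bracket decisions are then paired into phrases. An illustrative sketch of that pairing step (assumed for exposition, not the SNoW system itself):

```python
def pair_brackets(opens, closes):
    """Combine open/close predictor decisions into non-overlapping phrases.

    opens/closes: word positions where the open/close predictor fired.
    Returns (start, end) pairs, end inclusive, greedily matching each open
    with the nearest available close.
    """
    phrases, pos = [], 0
    for start in sorted(opens):
        if start < pos:
            continue  # this open falls inside an already-built phrase
        ends = [e for e in sorted(closes) if e >= start]
        if ends:
            end = ends[0]
            phrases.append((start, end))
            pos = end + 1
    return phrases

# Opens fired at words 0 and 3, closes at words 1 and 4: two short phrases.
print(pair_brackets([0, 3], [1, 4]))
```

Inside/outside instead makes one tagging decision per word; the paper's finding is that the paired open/close decisions are easier to learn for these patterns.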

89 citations


Proceedings ArticleDOI
20 Jun 1999
TL;DR: It is argued that the approach used in Humor 99 is general enough to be well suited to a wide range of languages and can serve as a basis for higher-level linguistic operations such as shallow parsing.
Abstract: This paper introduces a new approach to morpho-syntactic analysis through Humor 99 (High-speed Unification Morphology), a reversible and unification-based morphological analyzer which has already been integrated into a variety of industrial applications. Humor 99 copes very effectively with the problems of agglutinative languages (e.g. Hungarian, Turkish, Estonian) and other (highly) inflectional languages (e.g. Polish, Czech, German). The authors conclude the paper by arguing that the approach used in Humor 99 is general enough to be well suited to a wide range of languages, and can serve as a basis for higher-level linguistic operations such as shallow parsing.

53 citations


Proceedings Article
01 Jan 1999
TL;DR: The question answering system was built to experiment with natural language processing technologies such as shallow parsing, named entity tagging, and coreference chaining, because the authors felt that the small number of terms in the questions coupled with the short length of the answers would make NLP technologies clearly beneficial.
Abstract: Our question answering system was built with a number of priorities in mind. First, we wanted to experiment with natural language processing (NLP) technologies such as shallow parsing, named entity tagging, and coreference chaining. We felt that the small number of terms in the questions coupled with the short length of the answers would make NLP technologies clearly beneficial, unlike previous experiments with NLP technologies on traditional IR tasks. At a more practical level, we were familiar with and interested in such technologies and thus their use would be relatively straightforward and enjoyable. Second, we wanted to use information retrieval (IR) techniques in hopes of achieving robustness and efficiency. It seemed obvious that many answers would appear in documents and passages laden with terms from the question. Finally, we wanted to experiment with different modules from different sites with differing input and output representation and implementational details. Thus, we needed a multi-process system with a flexible data format.

43 citations



01 Jan 1999
TL;DR: The aim of the present work is to investigate the effects of lexical information in a shallow parsing environment, and to study the limits of a bootstrapping architecture that guarantees the reliability and portability of the parser to different domains.
Abstract: Current NL parsers are expected to run at throughput rates suitable to satisfy "time constraints" in real applications. The aim of the present work is, on the one hand, to investigate the effects of lexical information in a shallow parsing environment and, on the other hand, to study the limits of a bootstrapping architecture that, by automatically learning the lexical information in an unsupervised fashion, guarantees the reliability and portability of the parser to different domains. The investigated parser is Chaos (Chunk analysis oriented system), a robust parser based on stratification and lexicalization. A large-scale evaluation over a standard treebank is discussed.

12 citations


Journal Article
TL;DR: A Spanish Shallow Parser built using the Incremental Finite-State Parsing Architecture (IFSP) permits a constructivist syntactic analysis and produces a bracketed and annotated text where main segments and syntactic functions are identified.
Abstract: This paper describes a Spanish Shallow Parser built using the Incremental Finite-State Parsing Architecture (IFSP). This approach to Shallow Parsing permits a constructivist syntactic analysis: each transducer takes as input the analysis produced by the previous transducer, yielding a progressively more accurate syntactic analysis. The output is a bracketed and annotated text where main segments and syntactic functions are identified. The different transducers which make up the shallow parser are built using regular expressions which describe the syntactic characteristics of the language to parse, in this case Spanish.
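The incremental cascade described above can be sketched as a pipeline of regular-expression rewrites, each stage consuming the previous stage's bracketed output. The toy tagset and patterns below are simplified assumptions (and in English, for readability), not the parser's actual Spanish grammar:

```python
import re

# Each stage is a regex "transducer": a pattern over the previous stage's
# output plus a replacement that adds brackets or function labels.
STAGES = [
    # Stage 1: bracket a simple NP over a POS-tagged string like "the/DT dog/NN".
    (re.compile(r"((?:\S+/DT )?(?:\S+/JJ )*\S+/NNS?)"), r"[NP \1]"),
    # Stage 2: label a bracketed NP immediately before a finite verb as subject.
    (re.compile(r"(\[NP [^\]]+\]) (\S+/VB[DZ]?)"), r"\1/SUBJ \2"),
]

def shallow_parse(tagged):
    """Run the cascade: each stage rewrites the output of the stage before it."""
    for pat, repl in STAGES:
        tagged = pat.sub(repl, tagged)
    return tagged

print(shallow_parse("the/DT dog/NN barked/VBD"))
```

Because each stage only sees the previous stage's output, later stages can match on structure (the NP brackets) that no single regular expression over raw words could express.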

12 citations


01 Jan 1999
TL;DR: An implemented spoken-language dialogue system for a travel planning domain which supports a mixed initiative dialogue strategy and a preliminary investigation using data from a Wizard of Oz experiment lends limited support to the hypothesis that deep linguistic processing will prove useful at points where the user takes the initiative in driving the dialogue forward.
Abstract: With maturing speech technology, spoken dialogue systems are increasingly moving from research prototypes to fielded systems. The fielded systems however generally employ much simpler linguistic and dialogue processing strategies than the research prototypes. We describe an implemented spoken-language dialogue system for a travel planning domain which supports a mixed initiative dialogue strategy. The system accesses a commercially available travel information web-server. The system architecture combines both shallow and deep linguistic processors, partly so that a robust if shallow analysis is always available to the dialogue manager, and partly so that we can begin to examine where significant gains can be made by employing more advanced linguistic processing. We present the results of a preliminary investigation using data from a Wizard of Oz experiment. The results lend limited support to our original hypothesis that deep linguistic processing will prove useful at points where the user takes the initiative in driving the dialogue forward.

Book ChapterDOI
Jean-Pierre Chanod1
TL;DR: Language components based mostly on finite-state technology that improve the authors' capabilities for exploring, enriching and interacting in various ways with documents are presented.
Abstract: As one envisions a document model where language, physical location and medium - electronic, paper or other - impose no barrier to effective use, natural language processing will play an increasing role, especially in the context of digital libraries. This paper presents language components based mostly on finite-state technology that improve our capabilities for exploring, enriching and interacting in various ways with documents. This ranges from morphology to part-of-speech tagging, NP extraction and shallow parsing. We then focus on a series of on-going projects which illustrate how this technology is already impacting the building and sharing of knowledge through digital libraries.