Topic

Shallow parsing

About: Shallow parsing is a research topic. Over its lifetime, 397 publications have been published within this topic, receiving 10,211 citations.


Papers
01 Jan 2002
TL;DR: A text processing system that uses shallow parsing techniques to extract information from sentences in text documents and store frames of information in a knowledge base, approaching more complete text understanding in a practical way that avoids expensive processing such as full parsing of the documents.
Abstract: The system described in this paper automatically extracts and stores information from documents. We have implemented a text processing system that uses shallow parsing techniques to extract information from sentences in text documents and stores frames of information in a knowledge base. We intend to use this system in two main application areas: open domain Question & Answering (Q&A) and specific domain information extraction. Extraction from Documents The system described in this paper uses a Natural Language Processing system developed at the Center for Natural Language Processing to extract information from documents and store it in a knowledge base. In the past, applications were aimed at MUC-style information extraction that filled in templates of specific types of information. Our current goal is to produce a system that can extract generic frames of information about all entities and events in the sentences of the text and represent relationships between them. This type of system is approaching more complete text understanding in a practical way that does not require expensive processing such as full parsing of the documents. The heart of the generic extraction system is a set of rules written for a finite-state system that recognizes the patterns of text. These rules are applied in several phases including part-of-speech tagging, bracketing of noun phrases, and categorization of proper noun phrases. Later phases recognize the surface structure of phrases in each sentence and map the phrases to the case frame of the verbs, recognizing the phrases taking the roles of agent, object, point-in-time, etc., and creating a frame representing an “event”. The case roles are similar to those in case grammars (Fillmore 1968). Consider the example sentence: In addition to these most recent incidents, the Abu Sayyaf have bought Russian uranium on Basilan Island.
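The phased pipeline described above (tagging, noun-phrase bracketing, then mapping chunks onto case roles) can be illustrated with a small sketch. This is not the paper's finite-state rule set: it uses NLTK's off-the-shelf tagger and a regular-expression chunker, and the role-assignment heuristic (pre-verbal NP as agent, first post-verbal NP as object, PPs as modifiers) is an assumption for illustration only.

```python
import nltk

# One-time model downloads (assumed available):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

GRAMMAR = r"""
  NP: {<DT|JJ|NNP|NN.*>+}    # bracket simple noun phrases
  PP: {<IN><NP>}             # prepositional phrases over chunked NPs
  VP: {<VB.*>+<NP|PP>*}      # verb group plus its complements
"""
chunker = nltk.RegexpParser(GRAMMAR)

def extract_event_frame(sentence):
    """Tag, chunk, and map chunks to rough case roles around the main verb."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    tree = chunker.parse(tagged)
    frame = {"agent": None, "event": None, "object": None, "modifiers": []}
    for subtree in tree.subtrees(lambda t: t.label() in ("NP", "VP", "PP")):
        phrase = " ".join(tok for tok, _ in subtree.leaves())
        if subtree.label() == "VP" and frame["event"] is None:
            # Keep only the verb group as the event trigger.
            frame["event"] = " ".join(tok for tok, tag in subtree.leaves()
                                      if tag.startswith("VB"))
        elif subtree.label() == "NP" and frame["event"] is None and frame["agent"] is None:
            frame["agent"] = phrase            # NP before the verb -> agent
        elif subtree.label() == "NP" and frame["event"] is not None and frame["object"] is None:
            frame["object"] = phrase           # first NP after the verb -> object
        elif subtree.label() == "PP":
            frame["modifiers"].append(phrase)  # location / time adjuncts
    return frame

print(extract_event_frame(
    "The Abu Sayyaf have bought Russian uranium on Basilan Island."))
```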

4 citations

Proceedings Article
01 Jan 2006
TL;DR: A probability model is proposed to score the confidence of protein-protein interactions based on both text mining results and gene expression profiles, and experimental results are presented to show the feasibility of this framework.
Abstract: Protein-protein interactions, the associations of protein molecules, are crucial for many biological functions. Since most knowledge about them remains hidden in biological publications, there is an increasing focus on mining information from the vast biological literature such as MedLine. Many approaches, such as pattern matching, shallow parsing and deep parsing, have been proposed to automatically extract protein-protein interaction information from text sources, though with limited success. Moreover, to the best of our knowledge, none of the existing approaches performs automatic validation of the mining results. In this paper, we describe a novel framework in which text mining results are automatically validated using knowledge mined from gene expression profiles. A probability model is proposed to score the confidence of protein-protein interactions based on both text mining results and gene expression profiles. Experimental results are presented to show the feasibility of this framework.
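As a rough illustration of combining the two evidence sources, the sketch below blends a normalized text-mining score with the co-expression of the two proteins. The logistic blend, its weights, and the function names are assumptions for illustration; the paper's actual probability model is not reproduced here.

```python
import numpy as np

def interaction_confidence(text_score, expr_a, expr_b,
                           w_text=2.0, w_expr=2.0, bias=-2.0):
    """Score a candidate protein-protein interaction (illustrative only).

    text_score: normalized literature-mining evidence in [0, 1]
                (e.g. fraction of extracted sentences asserting the interaction).
    expr_a, expr_b: expression profiles of the two proteins across conditions.
    """
    # Co-expression evidence: absolute Pearson correlation of the two profiles.
    expr_score = abs(np.corrcoef(expr_a, expr_b)[0, 1])
    # Simple logistic combination of the two evidence sources (assumed weights).
    z = w_text * text_score + w_expr * expr_score + bias
    return 1.0 / (1.0 + np.exp(-z))

# Example: strong textual evidence, moderately correlated expression profiles.
a = np.array([0.1, 0.4, 0.9, 1.2, 0.8])
b = np.array([0.2, 0.5, 0.7, 1.1, 0.9])
print(round(interaction_confidence(0.8, a, b), 3))
```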

4 citations

Proceedings ArticleDOI
20 Jun 2008
TL;DR: This paper proposes a new method to detect and resolve zero pronouns in Chinese text that integrates automatic main verb identification, verbal logic valence and a machine learning approach, and demonstrates that this zero pronoun identification and resolution method works effectively.
Abstract: This paper proposes a new method to detect and resolve zero pronouns in Chinese text that integrates automatic main verb identification, verbal logic valence and a machine learning approach. Zero pronoun recognition is treated as the problem of finding missing logical arguments of verbs. First, based on automatic main verb identification, syntax hierarchies are analysed. Second, combining the syntax hierarchy with verbal logic valence theory, zero pronouns are identified. Then, using a machine learning approach, zero pronouns are resolved. Experimental results on 150 news articles indicate that the precision and recall of zero pronoun detection are 72.9% and 92.7% respectively, and the accuracy of antecedent estimation is 64.3%. These results demonstrate that the zero pronoun identification and resolution method works effectively.
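The detection step, finding verbs whose required logical arguments are missing, can be sketched roughly as below. The toy valence lexicon and the clause representation are hypothetical; the paper's syntax-hierarchy analysis and machine-learned resolution step are not shown.

```python
# Hypothetical valence lexicon: how many logical arguments each verb requires.
VERB_VALENCE = {
    "买": 2,    # "buy": requires agent + object
    "下雨": 0,  # "rain": weather verb, no logical subject
}

def detect_zero_pronoun(clause):
    """clause: dict with the main 'verb' and the overt argument roles found by
    shallow parsing, e.g. {'verb': '买', 'overt_roles': ['object']}."""
    required = VERB_VALENCE.get(clause["verb"], 1)
    has_subject = "subject" in clause["overt_roles"]
    # A zero pronoun is posited when the verb needs a subject but none is overt.
    return required >= 1 and not has_subject

print(detect_zero_pronoun({"verb": "买", "overt_roles": ["object"]}))  # True
print(detect_zero_pronoun({"verb": "下雨", "overt_roles": []}))         # False
```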

4 citations

01 Jan 2003
TL;DR: A corpus-based shallow parsing approach to syntactic analysis of natural language from the perspective of practicality and economy is adopted, and a new memory-based algorithm for learning shallow syntax, called rote sequence learning, is contributed.
Abstract: For natural language applications to become widespread, they must be both practical and economical. Practicality demands that systems are robust and efficient enough to handle realistic input. Economy demands that systems are inexpensive to construct and maintain. This dissertation explores syntactic analysis of natural language from the perspective of practicality and economy. We adopt a corpus-based shallow parsing approach to syntactic analysis. Shallow parsing addresses practicality by avoiding difficult attachment decisions and by employing simple, efficient algorithms. Corpus-based language learning addresses economy by applying machine learning techniques to develop language processing components. In particular, we contributed a new memory-based algorithm for learning shallow syntax, called rote sequence learning. Our experiments demonstrate that rote sequence learning achieves comparable performance to other, more complex, shallow parsing methods. Moreover, rote sequence learning possesses a number of desirable properties, including simplicity, efficiency, and portability. To support rote sequence learning, we developed algorithms for pruning bad rules from the grammar, for incorporating arbitrary additional information into the grammar using statistical models, and for determining the best parse among all possible parses. Rote sequence learning addresses the practicality requirement for shallow parsing. To address economy, we investigated learning strategies that allow the machine learner to manipulate its training setting. The goal of these strategies is to reduce the cost of training by reducing the number of examples needed and/or by reducing the cost of assembling the examples. In particular, active learning allows the learner to select training examples and ask the human teacher for answers, and weakly supervised learning allows the learner to guess at answers to some of the examples on its own. In experiments with these two strategies, we discovered interesting behaviors of each. Finally, we contributed a new learning strategy, cooperative learning, that combines the best aspects of active and weakly supervised learning.
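A bare-bones reading of the "rote" idea, memorizing tag sequences together with their chunk labelings and replaying them at test time, might look like the sketch below. This illustrates memory-based sequence learning in general, not the dissertation's algorithm, its pruning, or its statistical back-off.

```python
from collections import Counter, defaultdict

class RoteChunker:
    """Memorize POS-tag sequences and their IOB chunk labelings verbatim."""

    def __init__(self):
        # Maps a tuple of POS tags to a counter over IOB label sequences.
        self.memory = defaultdict(Counter)

    def train(self, sentences):
        """sentences: iterable of (pos_tags, iob_labels) pairs."""
        for pos_tags, iob_labels in sentences:
            self.memory[tuple(pos_tags)][tuple(iob_labels)] += 1

    def chunk(self, pos_tags):
        """Replay the most frequent memorized labeling, or all 'O' if unseen."""
        candidates = self.memory.get(tuple(pos_tags))
        if candidates:
            return list(candidates.most_common(1)[0][0])
        return ["O"] * len(pos_tags)

chunker = RoteChunker()
chunker.train([(["DT", "NN", "VBZ"], ["B-NP", "I-NP", "B-VP"])])
print(chunker.chunk(["DT", "NN", "VBZ"]))  # ['B-NP', 'I-NP', 'B-VP']
```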

4 citations

Proceedings Article
01 Jan 2008
TL;DR: A shallow parsing formalism aimed at machine translation between closely related languages, allowing grammar rules to be written that help to (partially) disambiguate chunks in input sentences.
Abstract: This paper describes a shallow parsing formalism aimed at machine translation between closely related languages. The formalism allows grammar rules to be written that help to (partially) disambiguate chunks in input sentences. The chunks are then translated into the target language without any deep syntactic or semantic processing. A stochastic ranker then selects the best translation according to a target language model. Results obtained for Czech and Slovak are presented.
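The overall pipeline, translating chunks independently and letting a target-side language model pick among the alternatives, can be caricatured as below. The chunk table, the unigram scorer standing in for the language model, and all names are placeholders, not the paper's resources.

```python
from itertools import product

CHUNK_TABLE = {            # hypothetical source chunk -> target candidates
    "chunk_a": ["tgt_a1", "tgt_a2"],
    "chunk_b": ["tgt_b1"],
}

TARGET_LM = {              # hypothetical target-side scores (higher = better)
    "tgt_a1": 0.2, "tgt_a2": 0.7, "tgt_b1": 0.5,
}

def translate(chunks):
    """Enumerate chunk-by-chunk translations, rank them by the target LM."""
    candidates = product(*(CHUNK_TABLE[c] for c in chunks))

    def score(sentence):
        return sum(TARGET_LM.get(tok, 0.0) for tok in sentence)

    return max(candidates, key=score)

print(translate(["chunk_a", "chunk_b"]))  # ('tgt_a2', 'tgt_b1')
```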

4 citations


Network Information
Related Topics (5)

Topic                  Papers    Citations    Related
Machine translation    22.1K     574.4K       81%
Natural language       31.1K     806.8K       79%
Language model         17.5K     545K         79%
Parsing                21.5K     545.4K       79%
Query language         17.2K     496.2K       74%
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    7
2020    12
2019    6
2018    5
2017    11
2016    11