Topic
Shallow parsing
About: Shallow parsing is a research topic. Over its lifetime, 397 publications have been published within this topic, receiving 10,211 citations.
Papers published on a yearly basis
Papers
17 Dec 2006
TL;DR: A novel selection method for tri-training learning in which a newly labeled sentence is selected for a classifier when the other two classifiers agree on its labels while the classifier itself disagrees.
Abstract: This paper presents a practical tri-training method for Chinese chunking using a small amount of labeled training data and a much larger pool of unlabeled data. We propose a novel selection method for tri-training learning in which newly labeled sentences are selected by comparing the agreements of three classifiers. In detail, in each iteration, a new sample is selected for a classifier if the other two classifiers agree on its labels while the classifier itself disagrees. We compare the proposed tri-training approach with a co-training approach on the UPenn Chinese Treebank V4.0 (CTB4). The experimental results show that the proposed approach can improve the performance significantly.
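The selection rule described above can be sketched as follows; the classifier objects and their `predict` interface are hypothetical stand-ins for the paper's three chunkers:

```python
# Sketch of the tri-training selection rule: a sentence joins a
# classifier's training set only when the other two classifiers agree
# on its label sequence while the target classifier itself disagrees.

def select_for(target, other1, other2, unlabeled):
    """Return (sentence, labels) pairs to add to `target`'s training set."""
    selected = []
    for sentence in unlabeled:
        y1 = other1.predict(sentence)
        y2 = other2.predict(sentence)
        if y1 == y2 and target.predict(sentence) != y1:
            # the agreed-upon labels become the new training labels
            selected.append((sentence, y1))
    return selected
```

In the paper's setting this step runs once per iteration, once for each of the three classifiers in turn.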
16 citations
25 Aug 2009
TL;DR: This article presents a formalism and a beta version of a new tool for simultaneous morphosyntactic disambiguation and shallow parsing, which facilitates shallow parsing of morphosyntactically ambiguous or erroneously disambiguated input.
Abstract: This article presents a formalism and a beta version of a new tool for simultaneous morphosyntactic disambiguation and shallow parsing. Unlike other shallow parsing formalisms, the rules of the grammar allow for explicit morphosyntactic disambiguation statements, independently of structure-building statements, which facilitates the task of shallow parsing of morphosyntactically ambiguous or erroneously disambiguated input.
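As an illustration of keeping disambiguation statements separate from structure-building statements, here is a minimal sketch (not the paper's actual formalism): one rule prunes morphosyntactic interpretations, a separate rule builds chunks, and the chunking rule tolerates tokens that remain ambiguous:

```python
# Tokens are modeled as sets of candidate morphosyntactic tags.

def disambiguate(tokens):
    """Disambiguation statement: after an unambiguous determiner,
    drop verb readings of the following token."""
    out = [set(t) for t in tokens]
    for i in range(1, len(out)):
        if out[i - 1] == {"det"}:
            pruned = out[i] - {"verb"}
            if pruned:  # never delete a token's last interpretation
                out[i] = pruned
    return out

def chunk(tokens):
    """Structure-building statement: group det + noun into an NP.
    A token counts as a noun if 'noun' is among its surviving
    interpretations, so ambiguous input is still chunkable."""
    groups, i = [], 0
    while i < len(tokens):
        if tokens[i] == {"det"} and i + 1 < len(tokens) and "noun" in tokens[i + 1]:
            groups.append(("NP", i, i + 1))
            i += 2
        else:
            i += 1
    return groups
```

Because the two rule types are independent, the chunker works both on pruned input and on input that is still ambiguous or was disambiguated incorrectly upstream, which is the property the abstract emphasizes.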
15 citations
16 Feb 2003
TL;DR: The adequacy of the clustering method when applied to a syntactically tagged corpus, and the relevance of the semantic content of the resulting clusters, are evaluated.
Abstract: The context of this paper is the application of unsupervised Machine Learning techniques to building ontology extraction tools for Natural Language Processing. Our method relies on exploiting large amounts of linguistically annotated text, and on linguistic concepts such as selectional restrictions and co-composition.
We work with a corpus of medical texts in English. First we apply a shallow parser to the corpus to get subject-verb-object structures. We then extract verb-noun relations, and apply a clustering algorithm to them to build semantic classes of nouns. We have evaluated the adequacy of the clustering method when applied to a syntactically tagged corpus, and the relevance of the semantic content of the resulting clusters.
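The pipeline above (shallow parse → subject-verb-object triples → verb-noun relations → noun clusters) can be sketched as follows; the co-occurrence vectors and the greedy similarity grouping are illustrative stand-ins, since the specific clustering algorithm is not given here:

```python
from collections import defaultdict
from math import sqrt

def verb_noun_vectors(svo_triples):
    """Map each noun to a count vector over (role, verb) contexts."""
    vec = defaultdict(lambda: defaultdict(int))
    for subj, verb, obj in svo_triples:
        vec[subj][("subj", verb)] += 1
        vec[obj][("obj", verb)] += 1
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_nouns(vec, threshold=0.5):
    """Greedy single-link grouping: a noun joins the first cluster
    containing a sufficiently similar noun, else starts a new one."""
    clusters = []
    for noun in vec:
        for c in clusters:
            if any(cosine(vec[noun], vec[m]) >= threshold for m in c):
                c.add(noun)
                break
        else:
            clusters.append({noun})
    return clusters
```

The intuition matches the selectional-restriction idea in the abstract: nouns that occur as arguments of the same verbs in the same roles end up in the same semantic class.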
15 citations
TL;DR: The structure of written Thai is highly ambiguous, requiring more sophisticated techniques than are necessary for comparable IE tasks in most European languages, along with large amounts of domain knowledge to cope with these ambiguities.
Abstract: The development of an information extraction (IE) system for Thai documents raises a number of issues which are not important for IE in English and other European languages. We describe the characteristics of written Thai, the problems they pose, and our approach to a Thai IE system. The structure of written Thai is highly ambiguous, which requires more sophisticated techniques than are necessary for comparable IE tasks in most European languages, and large amounts of domain knowledge to cope with these ambiguities. The basic characteristic of this system is that it provides different natural language components to assess the surface structure of the documents. These components include word segmentation, identification of specific lexical structure terms, and part-of-speech tagging. Further analysis performs shallow parsing over the relevant regions that contain the specific trigger terms or patterns specified in the extraction templates. Finally, the information of interest is extracted from the grammar trees according to predefined concept definitions, and the users are returned a list of answers for each concept.
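A minimal sketch of the trigger-driven step described above, assuming hypothetical trigger and slot patterns (the actual templates and concept definitions are not given in the abstract): only regions around a trigger are analyzed further, and slots are filled from those regions.

```python
import re

# Hypothetical extraction template: a trigger pattern plus slot
# patterns applied only inside the region around the trigger.
TEMPLATE = {
    "concept": "appointment",
    "trigger": re.compile(r"appointed"),
    "slots": {
        "person": re.compile(r"Mr\.\s+\w+"),
        "role": re.compile(r"\bas\s+(\w+\s?\w*)"),
    },
}

def extract(segments, template, window=1):
    """Analyze only segments near a trigger, then fill the slots."""
    answers = []
    for i, seg in enumerate(segments):
        if template["trigger"].search(seg):
            # the relevant region: the trigger segment plus neighbors
            region = " ".join(segments[max(0, i - window): i + window + 1])
            filled = {}
            for slot, pat in template["slots"].items():
                m = pat.search(region)
                if m:
                    filled[slot] = m.group(1) if m.groups() else m.group(0)
            answers.append((template["concept"], filled))
    return answers
```

The real system shallow-parses the trigger regions rather than applying regular expressions, but the control flow is the same: segment, locate triggers, analyze only the relevant regions, return one answer list per concept.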
15 citations
13 Sep 2000
TL;DR: This work produces tagging and chunking in a single process using an Integrated Language Model, formalized as Markov Models, that integrates several knowledge sources: lexical probabilities, a contextual Language Model for every chunk, and a contextual LM for the sentences.
Abstract: In this work, we present a stochastic approach to shallow parsing. Most of the current approaches to shallow parsing have a common characteristic: they take the sequence of lexical tags proposed by a POS tagger as input for the chunking process. Our system produces tagging and chunking in a single process using an Integrated Language Model (ILM) formalized as Markov Models. This model integrates several knowledge sources: lexical probabilities, a contextual Language Model (LM) for every chunk, and a contextual LM for the sentences. We have extended the ILM by adding lexical information to the contextual LMs. We have applied this approach to the CoNLL-2000 shared task, improving the performance of the chunker.
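The single-process idea can be sketched as Viterbi decoding over composite (POS tag, chunk label) states, so tagging and chunking decisions are made jointly rather than in a pipeline; the states and probabilities below are illustrative toy values, not the paper's trained ILM:

```python
from math import log

STATES = [("DT", "B-NP"), ("NN", "I-NP"), ("VB", "B-VP")]
EMIT = {  # P(word | state): stand-in for the lexical model
    ("the", ("DT", "B-NP")): 0.9,
    ("dog", ("NN", "I-NP")): 0.8,
    ("barks", ("VB", "B-VP")): 0.7,
}
TRANS = {  # P(state' | state): stand-in for the contextual LMs
    (("DT", "B-NP"), ("NN", "I-NP")): 0.9,
    (("NN", "I-NP"), ("VB", "B-VP")): 0.8,
}
START = {("DT", "B-NP"): 0.8, ("NN", "I-NP"): 0.1, ("VB", "B-VP"): 0.1}
SMALL = 1e-6  # probability floor for unseen events

def viterbi(words):
    """Return the best joint (tag, chunk) sequence for `words`."""
    probs = [{s: log(START[s]) + log(EMIT.get((words[0], s), SMALL))
              for s in STATES}]
    back = []
    for w in words[1:]:
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: probs[-1][p]
                       + log(TRANS.get((p, s), SMALL)))
            row[s] = (probs[-1][best] + log(TRANS.get((best, s), SMALL))
                      + log(EMIT.get((w, s), SMALL)))
            ptr[s] = best
        probs.append(row)
        back.append(ptr)
    state = max(STATES, key=lambda s: probs[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))
```

Because each state carries both a tag and a chunk label, one decoding pass yields the tagged and chunked sentence together, which is the contrast the abstract draws with tagger-then-chunker pipelines.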
15 citations