
Showing papers on "Shallow parsing" published in 2000


Proceedings Article
01 Jan 2000
TL;DR: Two approaches are developed: a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies, and an extension of constraint satisfaction formalisms.
Abstract: We study the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints. In particular, we develop two general approaches for an important subproblem - identifying phrase structure. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies. The second is an extension of constraint satisfaction formalisms. We develop efficient combination algorithms under both models and study them experimentally in the context of shallow parsing.
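To make the combination idea concrete, here is a minimal sketch (not the authors' implementation) of coherent inference over per-token classifier scores: Viterbi decoding over O/B/I phrase states with a hard constraint that an inside tag may not follow an outside tag. All scores and labels below are illustrative.

```python
import math

STATES = ["O", "B", "I"]
ILLEGAL = {("O", "I")}  # coherence constraint: "inside" cannot follow "outside"

def viterbi(scores):
    """scores: one dict per token mapping state -> classifier log-score."""
    # a phrase cannot start with "I", so rule it out at the first position
    best = [{s: ((scores[0][s] if s != "I" else -math.inf), None) for s in STATES}]
    for t in range(1, len(scores)):
        column = {}
        for s in STATES:
            candidates = [(best[t - 1][p][0] + scores[t][s], p)
                          for p in STATES if (p, s) not in ILLEGAL]
            column[s] = max(candidates)
        best.append(column)
    # backtrack from the best final state
    state = max(best[-1], key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(scores) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return list(reversed(path))

# toy per-token log-scores standing in for the outputs of learned classifiers
sentence_scores = [{"O": -0.1, "B": -2.5, "I": -3.0},
                   {"O": -1.9, "B": -0.3, "I": -2.0},
                   {"O": -2.2, "B": -1.5, "I": -0.2}]
print(viterbi(sentence_scores))  # ['O', 'B', 'I']
```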

182 citations


Posted Content
TL;DR: This work compares two ways of modeling the problem of learning to recognize patterns, suggests that shallow parsing patterns are better learned using open/close predictors than using inside/outside predictors, and thus contributes to the understanding of how to model shallow parsing tasks as learning problems.
Abstract: A SNoW-based learning approach to shallow parsing tasks is presented and studied experimentally. The approach learns to identify syntactic patterns by combining simple predictors to produce a coherent inference. Two instantiations of this approach are studied, and experimental results for Noun-Phrase (NP) and Subject-Verb (SV) phrases that compare favorably with the best published results are presented. In doing so, we compare two ways of modeling the problem of learning to recognize patterns and suggest that shallow parsing patterns are better learned using open/close predictors than using inside/outside predictors.
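As a hedged illustration of the two target representations compared in this paper (this is not the SNoW learner itself), the snippet below shows the same noun-phrase bracketing encoded with inside/outside (IOB) tags and with per-token open/close decisions:

```python
tokens = ["The", "cat", "sat", "on", "the", "mat"]
iob    = ["B",   "I",   "O",   "O",  "B",   "I"]   # inside/outside encoding

def iob_to_open_close(tags):
    """Derive per-token open ('[') and close (']') decisions from IOB tags."""
    opens = [t == "B" for t in tags]
    closes = []
    for i, t in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        closes.append(t in ("B", "I") and nxt != "I")
    return opens, closes

opens, closes = iob_to_open_close(iob)
for tok, o, c in zip(tokens, opens, closes):
    print(f"{'[ ' if o else '  '}{tok}{' ]' if c else ''}")
# prints each token with its bracket decisions: [ The ... cat ] ... [ the ... mat ]
```

In the open/close view, two separate predictors decide where a phrase may open and where an open phrase may close, rather than labeling every token as inside or outside.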

91 citations


Proceedings ArticleDOI
13 Sep 2000
TL;DR: Treating shallow parsing as part-of-speech tagging yields results comparable with other, more elaborate approaches, using the CoNLL 2000 training and testing material.
Abstract: Treating shallow parsing as part-of-speech tagging yields results comparable with other, more elaborate approaches. Using the CoNLL 2000 training and testing material, our best model had an accuracy of 94.88%, with an overall FB1 score of 91.94%. The individual FB1 scores for NPs were 92.19%, VPs 92.70% and PPs 96.69%.
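A small sketch of the underlying recipe (not the paper's tagger): pair each word's POS tag with its chunk tag into one combined label, then train any off-the-shelf tagger on the combined labels. A most-frequent-label baseline stands in for the real tagger here, and the CoNLL-2000 column format is assumed.

```python
from collections import Counter, defaultdict

# toy CoNLL-2000-style training triples: (word, POS, chunk)
train = [("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP"),
         ("the", "DT", "B-NP"), ("deficit", "NN", "I-NP"),
         ("will", "MD", "B-VP"), ("narrow", "VB", "I-VP")]

counts = defaultdict(Counter)
for word, pos, chunk in train:
    counts[word][f"{pos}+{chunk}"] += 1          # one combined POS+chunk label

def tag(word):
    """Most frequent combined label seen for the word (None if unseen)."""
    return counts[word].most_common(1)[0][0] if counts[word] else None

print([(w, tag(w)) for w in ["the", "deficit", "will"]])
# [('the', 'DT+B-NP'), ('deficit', 'NN+I-NP'), ('will', 'MD+B-VP')]
```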

47 citations


Proceedings ArticleDOI
29 Apr 2000
TL;DR: The whole approach proved very useful for processing free-word-order languages like German; the divide-and-conquer parsing strategy in particular obtained an f-measure of 87.14% on unseen data.
Abstract: We present a divide-and-conquer strategy based on finite-state technology for shallow parsing of real-world German texts. In a first phase, only the topological structure of a sentence (i.e., verb groups, subclauses) is determined. In a second phase, the phrasal grammars are applied to the contents of the different fields of the main and sub-clauses. Shallow parsing is supported by suitably configured preprocessing, including morphological and on-line compound analysis, efficient POS filtering, and named entity recognition. The whole approach proved very useful for processing free-word-order languages like German; the divide-and-conquer parsing strategy in particular obtained an f-measure of 87.14% on unseen data.
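The cascade can be pictured with a rough two-phase sketch (illustrative POS patterns only, not the authors' grammar): phase one locates the verbal elements that delimit the topological fields, phase two chunks noun phrases inside the remaining material.

```python
import re

# toy German sentence as (word, POS) pairs; the tag names are illustrative
sentence = [("Der", "ART"), ("Mann", "NN"), ("hat", "VAFIN"),
            ("das", "ART"), ("Buch", "NN"), ("gelesen", "VVPP")]
tags = " ".join(pos for _, pos in sentence)

# phase 1: finite and clause-final verbal elements delimit the middle field
verb_re = re.compile(r"VAFIN|VVPP|VVINF")
verb_spans = [m.span() for m in verb_re.finditer(tags)]

# phase 2: a minimal NP pattern (optional article + noun) applied to the fields
np_re = re.compile(r"(?:ART )?NN")
np_chunks = [m.group(0) for m in np_re.finditer(tags)]

print("verbal elements:", [tags[a:b] for a, b in verb_spans])  # ['VAFIN', 'VVPP']
print("NP chunk patterns:", np_chunks)                          # ['ART NN', 'ART NN']
```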

35 citations


Proceedings ArticleDOI
19 Jun 2000
TL;DR: A case study based on part of NASA's specification of the Node Control Software of the International Space Station is described, and the authors apply to it their method of checking properties on models obtained by shallow parsing of natural language requirements.
Abstract: The authors report on their experiences of using lightweight formal methods for the partial validation of natural language (NL) requirements documents. They describe a case study based on part of NASA's specification of the Node Control Software of the International Space Station, and apply to it their method of checking properties on models obtained by shallow parsing of natural language requirements. These experiences support the position that it is feasible and useful to perform automated analysis of requirements expressed in natural language. Indeed, the authors identified a number of errors in their case study that were also independently discovered and corrected by NASA's IV&V Facility in a subsequent version of the same document. The paper describes the techniques used and the errors found, and reflects on the lessons learned.
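A loose sketch of the general idea, not the authors' toolchain: a shallow pattern pulls an (agent, action) pair out of each "shall" requirement, and a simple property is then checked over the resulting model. The requirement texts and the property below are invented for illustration.

```python
import re

requirements = [
    "The Node Control Software shall monitor the bus status.",
    "Shall log every mode transition.",                  # no agent named
]

triple_re = re.compile(r"^(?P<agent>.*?)\s*shall\s+(?P<action>.+)\.$", re.IGNORECASE)
model = [triple_re.match(r) for r in requirements]

# property: every 'shall' requirement must name a responsible agent
for req, m in zip(requirements, model):
    agent = m.group("agent").strip() if m else ""
    if not agent:
        print("property violated (no agent):", req)
```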

33 citations


Book ChapterDOI
01 Jan 2000
TL;DR: This chapter will describe how parallel text extraction algorithms can be used for machine aided translation, focusing on two particular applications: semi-automatic construction of bilingual terminology lexicons and translation memory.
Abstract: This chapter will describe how parallel text extraction algorithms can be used for machine aided translation, focusing on two particular applications: semi-automatic construction of bilingual terminology lexicons and translation memory. Automatic word alignment and terminology extraction algorithms can be combined to substantially speed the lexicon construction process. Using a highly accurate partial alignment of term constituents, a terminologist need only recognize and correct minor errors in the recognition of term boundaries. The next generation of translation memory systems will certainly use statistical alignment algorithms and shallow parsing technology to improve coverage of current systems, by allowing for linguistic abstraction and partial sentence matching. Abstracting away from lexical units to part-of-speech, number, term, or noun phrase classes will allow these systems to mix and match components.
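A hedged sketch of the "linguistic abstraction" idea described here (not any particular translation-memory product): sentences are abstracted to class level, with numbers and a toy term list replaced by placeholders, before fuzzy matching against the memory. The term list and example sentences are made up.

```python
import re
from difflib import SequenceMatcher

TERMS = {"valve", "pump"}                        # hypothetical terminology lexicon

def abstract(sentence):
    """Replace numbers and known terms with class placeholders."""
    out = []
    for tok in sentence.lower().split():
        if re.fullmatch(r"\d+(\.\d+)?", tok):
            out.append("<NUM>")
        elif tok in TERMS:
            out.append("<TERM>")
        else:
            out.append(tok)
    return " ".join(out)

memory = {"Close valve 12 before starting.": "Ventil 12 vor dem Start schließen."}
query = "Close pump 7 before starting."

for src, tgt in memory.items():
    score = SequenceMatcher(None, abstract(src), abstract(query)).ratio()
    print(f"match {score:.2f}: reuse '{tgt}' as a draft translation")
```

Without the abstraction step the two sentences share fewer characters and the match score drops, which illustrates the partial-matching gain the chapter anticipates.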

31 citations


Proceedings ArticleDOI
13 Sep 2000
TL;DR: This work produces tagging and chunking in a single process using an Integrated Language Model, formalized as Markov Models, that integrates several knowledge sources: lexical probabilities, a contextual Language Model for every chunk, and a contextual LM for the sentences.
Abstract: In this work, we present a stochastic approach to shallow parsing. Most of the current approaches to shallow parsing have a common characteristic: they take the sequence of lexical tags proposed by a POS tagger as input for the chunking process. Our system produces tagging and chunking in a single process using an Integrated Language Model (ILM) formalized as Markov Models. This model integrates several knowledge sources: lexical probabilities, a contextual Language Model (LM) for every chunk, and a contextual LM for the sentences. We have extended the ILM by adding lexical information to the contextual LMs. We have applied this approach to the CoNLL-2000 shared task improving the performance of the chunker.
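A simplified sketch of how such an integrated model might score one joint tag-and-chunk hypothesis (the probabilities and factorization details below are illustrative, not the paper's exact model): lexical probabilities, a within-chunk model over tags, and a sentence-level model over chunk labels are combined as one sum of log-probabilities.

```python
import math

# hypothetical model parameters
p_word_given_tag = {("the", "DT"): 0.4, ("deal", "NN"): 0.01, ("closed", "VBD"): 0.02}
p_tag_in_chunk   = {("DT", "NP"): 0.3, ("NN", "NP"): 0.5, ("VBD", "VP"): 0.6}
p_chunk_bigram   = {("<s>", "NP"): 0.5, ("NP", "VP"): 0.4}

def score(words, tags, chunks):
    logp, prev_chunk = 0.0, "<s>"
    for w, t, c in zip(words, tags, chunks):
        logp += math.log(p_word_given_tag[(w, t)])   # lexical knowledge source
        logp += math.log(p_tag_in_chunk[(t, c)])     # contextual LM for the chunk
        if c != prev_chunk:                          # a new chunk starts here
            logp += math.log(p_chunk_bigram[(prev_chunk, c)])
            prev_chunk = c
    return logp

print(score(["the", "deal", "closed"], ["DT", "NN", "VBD"], ["NP", "NP", "VP"]))
```

In a full system this score would be maximized over all tag-and-chunk sequences (e.g., with Viterbi search), which is what lets tagging and chunking happen in one pass.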

15 citations


Proceedings Article
01 May 2000
TL;DR: This paper argues in favour of an integration between statistically and syntactically based parsing by presenting data from a study of a 500,000-word corpus of Italian, including a syntactic shallow parser and an ATN-like grammatical function assigner that automatically classifies previously manually verified tagged corpora.
Abstract: In this paper we argue in favour of an integration between statistically and syntactically based parsing by presenting data from a study of a 500,000-word corpus of Italian. Most papers present approaches to tagging which are statistically based. None of the statistically based analyses, however, produces an accuracy level comparable to the one obtained by means of linguistic rules [1]. Of course, their data refer strictly to English, with the exception of [2, 3, 4]. As to Italian, we argue that purely statistically based approaches are inefficient, basically due to the great sparsity of tag distribution: 50% or less of the tags are unambiguous when punctuation is subtracted from the total count. In addition, the level of homography is also very high: readings per word are 1.7, compared to 1.07 computed for English by [2] with a similar tagset. The current work includes a syntactic shallow parser and an ATN-like grammatical function assigner that automatically classifies previously manually verified tagged corpora. In a preliminary experiment with an automatic tagger, we obtained 99.97% accuracy on the training set and 99.03% on the test set using combined approaches; the accuracy obtained from statistical tagging alone is well below 95% even on the training set, and the same applies to syntactic tagging. As to the shallow parser and GF-assigner, we shall report on a first preliminary experiment on a manually verified subset of 10,000 words.
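The ambiguity measures quoted above can be made concrete with a tiny sketch (the lexicon below is invented; the 1.7 and 50% figures come from the authors' 500,000-word corpus, not from this code):

```python
lexicon = {                       # hypothetical word -> possible tags
    "la":     {"ART", "PRON", "NOUN"},
    "porta":  {"NOUN", "VERB"},
    "è":      {"VERB"},
    "chiusa": {"ADJ", "VERB"},
}
tokens = ["la", "porta", "è", "chiusa"]

readings = sum(len(lexicon[t]) for t in tokens) / len(tokens)
unambiguous = sum(1 for t in tokens if len(lexicon[t]) == 1) / len(tokens)
print(f"readings per word: {readings:.2f}, unambiguous tokens: {unambiguous:.0%}")
# readings per word: 2.00, unambiguous tokens: 25%
```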

13 citations


Journal Article
TL;DR: A statistical algorithm is developed to recognize definite levels of Chinese chunks, and experiments show that it achieves high accuracy and robustness for shallow parsing of real Chinese texts.
Abstract: Chunk parsing is an effective method to decrease the difficulty of language parsing. This paper proposes a formal description representing the characteristics of Chinese chunks. Based on the description, a statistical algorithm is developed to recognize definite levels of Chinese chunks. Experiments show that the algorithm achieves high accuracy and robustness for shallow parsing of real Chinese texts.

6 citations


Proceedings ArticleDOI
13 Sep 2000
TL;DR: Two approaches are developed: a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies, and an extension of constraint satisfaction formalisms.
Abstract: We study the problem of identifying phrase structure. We formalize it as the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints, and develop two general approaches for it. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies. The second is an extension of constraint satisfaction formalisms. We also develop efficient algorithms under both models and study them experimentally in the context of shallow parsing.

6 citations


Proceedings ArticleDOI
29 Apr 2000
TL;DR: The spelling and grammar corrector described here is superior to other existing spelling checkers for Danish in its ability to deal with context-dependent errors.
Abstract: This paper reports on work carried out to develop a spelling and grammar corrector for Danish, addressing in particular the issue of how a form of shallow parsing is combined with error detection and correction for the treatment of context-dependent spelling errors. The syntactic grammar for Danish used by the system has been developed with the aim of dealing with the most frequent error types found in a parallel corpus of unedited and proofread texts specifically collected by the project's end users. By focussing on certain grammatical constructions and certain error types, it has been possible to exploit the linguistic 'intelligence' provided by syntactic parsing and yet keep the system robust and efficient. The system described is thus superior to other existing spelling checkers for Danish in its ability to deal with context-dependent errors.
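A toy sketch of the kind of context-dependent check such a system can make once NP chunks are available (this illustration, including the two-word lexicon, is not the project's grammar): the indefinite article must agree in gender with the head noun of its chunk.

```python
GENDER = {"bil": "common", "hus": "neuter"}      # tiny hypothetical Danish lexicon
ARTICLE = {"en": "common", "et": "neuter"}

def check_np(chunk):
    """chunk: list of tokens forming one NP, e.g. ['en', 'hus']."""
    article, noun = chunk[0], chunk[-1]
    if article in ARTICLE and noun in GENDER and ARTICLE[article] != GENDER[noun]:
        fix = next(a for a, g in ARTICLE.items() if g == GENDER[noun])
        return f"agreement error in '{' '.join(chunk)}': suggest '{fix} {noun}'"
    return None

print(check_np(["en", "hus"]))   # agreement error ... suggest 'et hus'
print(check_np(["et", "hus"]))   # None
```

A word-by-word spelling checker cannot flag "en hus", because both words are correctly spelled in isolation; only the chunk-level context reveals the error.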

01 Aug 2000
TL;DR: Though the system incorporates both statistical and text analysis models, the statistical model plays a major role during the automated process and a shallow parsing algorithm is used to eliminate the semantic redundancy.
Abstract: This paper introduces a Chinese summarizer called ThemePicker. Though the system incorporates both statistical and text analysis models, the statistical model plays the major role during the automated process. In addition to word segmentation and proper name identification, phrasal chunk extraction and content density calculation are based on a semantic network pre-constructed for a chosen domain. To improve the readability of the extracted sentences as the auto-generated summary, a shallow parsing algorithm is used to eliminate semantic redundancy.
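A rough sketch of the two scoring ideas mentioned here (not ThemePicker itself): sentences are ranked by a simple content-density score over hypothetical domain terms, and a near-duplicate check stands in for the shallow-parsing-based redundancy elimination.

```python
weights = {"sales": 2.0, "growth": 1.5, "quarter": 1.0}   # hypothetical term weights

def density(sentence):
    words = sentence.lower().split()
    return sum(weights.get(w, 0.0) for w in words) / len(words)

def redundant(candidate, chosen, threshold=0.6):
    a = set(candidate.lower().split())
    return any(len(a & set(c.lower().split())) / len(a | set(c.lower().split())) > threshold
               for c in chosen)

sentences = ["Sales growth accelerated this quarter",
             "Sales growth accelerated this quarter again",
             "The weather was pleasant"]
summary = []
for s in sorted(sentences, key=density, reverse=True):
    if not redundant(s, summary):
        summary.append(s)
print(summary)   # the near-duplicate second sentence is dropped
```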

01 Jan 2000
TL;DR: A system that recognizes and classifies named entities (NE) in Greek text is presented; it has been developed in the framework of the EPET II "oikONOMiA" project, which aims at the construction of a pipeline integrating NE recognition, shallow parsing, and co-reference resolution technologies.
Abstract: In this paper, we describe work in progress for the development of a Greek named entity recognizer. The system aims at information extraction applications where large scale text processing is needed. Speed of analysis, system robustness, and results accuracy have been the basic guidelines for the system's design. Pattern matching techniques have been implemented on top of an existing automated pipeline for Greek text processing and the resulting system depends on non-recursive regular expressions in order to capture different types of named entities. For development and testing purposes, we collected a corpus of financial texts from several web sources and manually annotated part of it. Overall precision and recall are 86% and 81% respectively. Introduction: In this paper, we present a system that recognizes and classifies named entities (NE) in Greek text. The system has been developed in the framework of the EPET II "oikONOMiA" project, which aims at the construction of a pipeline integrating NE recognition, shallow parsing, and co-reference resolution technologies. The pipeline will analyze text to produce a shallow semantic representation suitable for template filling in scenario-based information extraction (IE) applications. Natural Language Processing (NLP) systems performing information extraction have gained the focus of attention of both the academic and the business intelligence community. NERC is the first task in the information extraction task series. Several factors contribute to its complexity. Name-list based recognition is not adequate, since unknown names should be dealt with in addition to names appearing in the lists. Moreover, known names may be of several types; commonly used Greek names can be of type person, organization, location, or none of the above. Moreover, the name classification schema can vary significantly across domains and applications. Thus, there are two aspects in NERC: 1) recognition and classification of known names, and 2) spotting and classification of new names. It should be noted that the creation, adaptation, and maintenance of name databases comes at a significant cost; new text
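A bare-bones illustration of the non-recursive regular-expression patterns the abstract describes (the trigger word, example text, and money pattern are made-up examples, not the system's actual grammar):

```python
import re

patterns = {
    # organization: the trigger word 'Τράπεζα' (Bank) followed by a capitalized word
    "ORG": re.compile(r"Τράπεζα\s+[Α-ΩΆ-Ώ]\w+"),
    # a simple money expression in drachmas
    "MONEY": re.compile(r"\d[\d.,]*\s*δρχ\."),
}

text = "Η Τράπεζα Πειραιώς ανακοίνωσε κέρδη 500.000 δρχ."
for label, pattern in patterns.items():
    for m in pattern.finditer(text):
        print(label, "->", m.group(0))
# ORG -> Τράπεζα Πειραιώς
# MONEY -> 500.000 δρχ.
```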

Book ChapterDOI
28 Jun 2000
TL;DR: This paper focuses on the integration of NLP techniques for efficient textual database retrieval as part of the VLSHDS (Very Large Scale Hypermedia Delivery System) project, with the aim of increasing the quality of textual information search (precision/recall) compared to already existing multi-lingual IR systems.
Abstract: Improvements in hardware, communication technology and databases have led to the explosion of multimedia information repositories. In order to improve the quality of information retrieval compared to already existing advanced document management systems, research has shown that it is necessary to consider vertical integration of retrieval techniques inside the database service architecture. This paper focuses on the integration of NLP techniques for efficient textual database retrieval as part of the VLSHDS (Very Large Scale Hypermedia Delivery System) project. One target of this project is to increase the quality of textual information search (precision/recall) compared to already existing multi-lingual IR systems by applying morphological analysis and shallow parsing at the phrase level to document and query processing. The scope of this paper is limited to Thai documents. The underlying system is the Active HYpermedia Delivery System (AHYDS) framework, which provides the delivery service over the Internet. As first results, based on 1,100 Thai documents, our approach improved precision and recall from 72.666% and 56.67% in the initial implementation (without applying NLP techniques) to 85.211% and 76.876%, respectively.
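As a quick reminder of how the quoted evaluation measures are computed, here is a toy precision/recall calculation with made-up document IDs; the 85.211%/76.876% figures above come from the authors' 1,100-document Thai collection, not from this example.

```python
retrieved = {"d1", "d2", "d3", "d4"}     # documents returned for a query
relevant  = {"d1", "d2", "d5"}           # documents judged relevant

hits = retrieved & relevant
precision = len(hits) / len(retrieved)
recall    = len(hits) / len(relevant)
print(f"precision = {precision:.2%}, recall = {recall:.2%}")
# precision = 50.00%, recall = 66.67%
```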