
Showing papers on "Shallow parsing" published in 2001


Posted Content
TL;DR: This article studied the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints, and developed two general approaches for an important subproblem: identifying phrase structure.
Abstract: We study the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints. In particular, we develop two general approaches for an important subproblem: identifying phrase structure. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies. The second is an extension of constraint satisfaction formalisms. We develop efficient combination algorithms under both models and study them experimentally in the context of shallow parsing.
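As a concrete (and much simplified) illustration of imposing sequential constraints on per-token classifier outputs, the sketch below runs Viterbi decoding over B/I/O phrase tags while forbidding the illegal transition O -> I. The scores are invented; the paper's extended HMMs and constraint-satisfaction formulations are considerably richer than this.

```python
# Minimal sketch: constrained Viterbi over BIO phrase tags, given
# per-token classifier scores. Illustrative only, not the paper's model.
TAGS = ["B", "I", "O"]

def legal(prev, cur):
    # A phrase-internal tag must continue a phrase opened by B or I.
    return not (cur == "I" and prev == "O")

def viterbi(scores):
    """scores: one dict per token mapping tag -> classifier score."""
    # A sentence cannot open with "I".
    paths = {t: ([t], scores[0].get(t, 0.0)) for t in TAGS if t != "I"}
    for obs in scores[1:]:
        new = {}
        for cur in TAGS:
            options = [(p + [cur], s + obs.get(cur, 0.0))
                       for p, s in paths.values() if legal(p[-1], cur)]
            if options:
                new[cur] = max(options, key=lambda x: x[1])
        paths = new
    return max(paths.values(), key=lambda x: x[1])[0]

# Hypothetical scores for a three-token sentence.
scores = [{"B": 0.7, "O": 0.3}, {"I": 0.6, "O": 0.4}, {"O": 0.9, "B": 0.1}]
print(viterbi(scores))  # ['B', 'I', 'O']
```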

204 citations


Journal ArticleDOI
TL;DR: A system called ACROMED is presented; it is part of a set of Information Extraction tools designed for processing and extracting information from abstracts in the Medline database, and on biomedical texts its performance is better than that of acronym extraction systems designed for unrestricted text.
Abstract: Acronyms are widely used in biomedical and other technical texts. Understanding their meaning constitutes an important problem in the automatic extraction and mining of information from text. Here we present a system called ACROMED that is part of a set of Information Extraction tools designed for processing and extracting information from abstracts in the Medline database. In this paper, we present the results of two strategies for finding the long forms of acronyms in biomedical texts. These strategies differ from previous automated acronym extraction methods by being tuned to the complex phrase structures of the biomedical lexicon and by incorporating shallow parsing of the text into the acronym recognition algorithm. The performance of our system was tested on several data sets, achieving 72% recall with 97% precision. On biomedical texts, these results are better than the performance of acronym extraction systems designed for unrestricted text.
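ACROMED aligns acronyms against shallow-parsed noun phrases; the sketch below shows only the core character-alignment idea that such systems build on (a simplified heuristic with an invented search window, not the authors' algorithm).

```python
# Simplified sketch of acronym/long-form matching: scan leftward from a
# parenthesized acronym, requiring each acronym letter to appear, in
# order, in the preceding words. Illustrative heuristic only.
import re

def find_long_form(text):
    pairs = []
    for m in re.finditer(r"\(([A-Z][A-Za-z]{1,9})\)", text):
        acro = m.group(1)
        words = text[:m.start()].split()
        window = words[-len(acro) * 2:]   # generous, invented window size
        i = len(acro) - 1                 # match letters right-to-left
        j = len(window) - 1
        while i >= 0 and j >= 0:
            if acro[i].lower() in window[j].lower():
                i -= 1
            j -= 1
        if i < 0:
            pairs.append((acro, " ".join(window[j + 1:])))
    return pairs

print(find_long_form("Analysis of the polymerase chain reaction (PCR) products"))
# [('PCR', 'polymerase chain reaction')]
```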

135 citations


Proceedings ArticleDOI
06 Jul 2001
TL;DR: It is concluded that directly learning to perform these tasks as shallow parsers do is advantageous over full parsers, both in terms of performance and in robustness to new and lower-quality texts.
Abstract: A significant amount of work has been devoted recently to developing learning techniques that can be used to generate partial (shallow) analyses of natural language sentences rather than a full parse. In this work we set out to evaluate whether this direction is worthwhile by comparing a learned shallow parser to one of the best learned full parsers on tasks both can perform: identifying phrases in sentences. We conclude that directly learning to perform these tasks as shallow parsers do is advantageous over full parsers, both in terms of performance and in robustness to new and lower-quality texts.
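Comparisons like this one are typically scored with chunk-level precision, recall, and F1, where a predicted phrase counts as correct only if both its boundaries and its type match the gold standard exactly. A minimal sketch over BIO-encoded tags, with toy data:

```python
# Chunk-level precision/recall/F1 for BIO tags like "B-NP", "I-NP", "O".
def chunks(bio_tags):
    """Extract (start, end, type) spans from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(bio_tags + ["O"]):   # sentinel closes last span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, bio_tags[start].split("-")[1]))
            start = None
        if tag.startswith("B-"):
            start = i
    return set(spans)

def prf(gold_tags, pred_tags):
    gold, pred = chunks(gold_tags), chunks(pred_tags)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = ["B-NP", "I-NP", "O", "B-VP", "B-NP"]
pred = ["B-NP", "I-NP", "O", "B-VP", "O"]
print(prf(gold, pred))  # (1.0, 0.666..., 0.8)
```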

76 citations


Journal Article
TL;DR: The ProBot is interesting in its link to an underlying engine capable of implementing deeper reasoning, which is usually not present in conversational agents based on shallow parsing.
Abstract: This paper describes a conversational agent, called “ProBot”, that uses a novel structure for handling context. The ProBot is implemented as a rule-based system embedded in a Prolog interpreter. The rules consist of patterns and responses, where each pattern matches a user’s input sentence and the response is an output sentence. Both patterns and responses may have attached Prolog expressions that act as constraints in the patterns and can invoke some action when used in the response. The main contributions of this work are in the use of hierarchies of contexts to handle unexpected inputs. The ProBot is also interesting in its link to an underlying engine capable of implementing deeper reasoning, which is usually not present in conversational agents based on shallow parsing.
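ProBot's rules live in Prolog with attached constraint expressions; the Python sketch below (with invented rules) only illustrates the pattern/response idea and how a context hierarchy lets unexpected inputs fall through to more general rules.

```python
# Toy pattern/response matcher with a context hierarchy: the most
# specific context is tried first, then enclosing contexts, so an
# unexpected input falls through to a general catch-all rule.
import re

RULES = {   # hypothetical rule base
    "booking": [(r".*\bcancel\b.*", "OK, cancelling your booking.")],
    "top": [
        (r".*\bhello\b.*", "Hello! How can I help?"),
        (r".*", "Sorry, I did not understand that."),   # catch-all
    ],
}

def respond(sentence, context_stack):
    # Walk from the most specific (innermost) context outward.
    for ctx in reversed(context_stack):
        for pattern, response in RULES.get(ctx, []):
            if re.fullmatch(pattern, sentence, re.IGNORECASE):
                return response
    return None

print(respond("I want to cancel", ["top", "booking"]))  # booking rule fires
print(respond("hello there", ["top", "booking"]))       # falls through to top
```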

54 citations


01 Jan 2001
TL;DR: In this article, the authors present a system called Acromed which finds acronym-meaning pairs as part of a set of information extraction tools designed for processing and extracting data from abstracts in the Medline database.
Abstract: Acronyms are widely used in biomedical and other technical texts. Understanding their meaning constitutes an important problem in the automatic extraction and mining of information from text. Moreover, an even harder problem is sense disambiguation of acronyms; that is, where a single acronym, termed a polynym, has a multiplicity of meanings, a common occurrence in the biomedical literature. In such cases, it is necessary to identify the correct corresponding sense for the polynym, which is often not directly specified in the text. Here we present a system called Acromed which finds acronym-meaning pairs as part of a set of information extraction tools designed for processing and extracting data from abstracts in the Medline database. Our strategy for finding acronym-meaning pairs differs from previous automated acronym extraction methods by incorporating shallow parsing of the text into the acronym recognition algorithm. The performance of our system has been tested on a highly diverse set of Medline texts, giving the highest precision and recall results in the literature thus far. We then present Polyfind, an algorithm for disambiguating polynyms, which uses a vector space model. Our disambiguation tests produced 97.62% accuracy in one test (on acronyms) and 86.6% accuracy in another (on aliases).
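A hedged sketch of vector-space disambiguation in the spirit of Polyfind: represent the polynym's context and each candidate sense as bag-of-words vectors and pick the sense with the highest cosine similarity. The sense profiles and weighting below are invented; the authors' feature set is their own.

```python
# Bag-of-words cosine similarity for polynym disambiguation (sketch).
from collections import Counter
from math import sqrt

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def disambiguate(context, sense_profiles):
    ctx = Counter(context.lower().split())
    scored = {sense: cosine(ctx, Counter(text.lower().split()))
              for sense, text in sense_profiles.items()}
    return max(scored, key=scored.get)

senses = {   # hypothetical sense profiles for the polynym "PCR"
    "polymerase chain reaction": "dna amplification primer polymerase cycle",
    "protein catabolic rate": "dialysis urea protein nitrogen clearance",
}
print(disambiguate("PCR amplification of the DNA primer region", senses))
# polymerase chain reaction
```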

38 citations


Book ChapterDOI
11 Sep 2001
TL;DR: This work presents a two-level stochastic model approach to the construction of the natural language understanding component of a dialog system in the domain of database queries, which answers queries about a railway timetable in Spanish.
Abstract: Over the last few years, stochastic models have been widely used in natural language understanding modeling. Almost all of this work is based on the definition of segments of words as basic semantic units for the stochastic semantic models. In this work, we present a two-level stochastic model approach to the construction of the natural language understanding component of a dialog system in the domain of database queries. This approach treats the problem in a way similar to the stochastic approach to detecting syntactic structures (shallow parsing or chunking) in natural language sentences; in this case, however, the stochastic semantic language models are based on the detection of semantic units in the user turns of the dialog. We give the results of applying this approach to the construction of the understanding component of a dialog system that answers queries about a railway timetable in Spanish.
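A toy sketch of the two-level idea: words in a (translated) user turn are first grouped into semantic units, much as chunking groups words into phrases, and the unit sequence then instantiates a query frame. The paper learns both levels stochastically; the lexicon and frame below are invented for illustration.

```python
# Level 1: tag words with semantic units; level 2: build a query frame.
UNIT_LEXICON = {   # hypothetical cue words per semantic unit
    "departure": {"from", "leave", "leaves"},
    "arrival": {"to", "arrive"},
    "time": {"when", "time", "morning"},
}

def segment(turn):
    units = []
    for word in turn.lower().split():
        for unit, cues in UNIT_LEXICON.items():
            if word in cues:
                units.append((unit, word))
    return units

def to_frame(units):
    return {"query": "timetable", "slots": [u for u, _ in units]}

turn = "When does the train leave from Valencia to Madrid"
print(to_frame(segment(turn)))
# {'query': 'timetable', 'slots': ['time', 'departure', 'departure', 'arrival']}
```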

19 citations


Proceedings Article
01 Jan 2001
TL;DR: This work makes extensive use of the Alembic named-entity tagger and the WordNet semantic network to extract candidate answers from retrieved one-paragraph-long passages, and deals with the possibility of no-answer questions by looking for a significant score drop between the extracted candidate answers.
Abstract: We participated in the TREC-X QA main task and list task with a new system named QUANTUM, which analyzes questions with shallow parsing techniques and regular expressions. Instead of using a question classification based on entity types, we classify the questions according to generic mechanisms (which we call extraction functions) for the extraction of candidate answers. We take advantage of the Okapi information retrieval system for one-paragraph-long passage retrieval. We make extensive use of the Alembic named-entity tagger and the WordNet semantic network to extract candidate answers from those passages. We deal with the possibility of no-answer questions (NIL) by looking for a significant score drop between the extracted candidate answers.
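A minimal sketch of the NIL heuristic described above: rank candidate answers by score and return the top answer only when its score drops off sharply to the runner-up. The threshold and exact criterion are invented; QUANTUM's actual test may differ.

```python
# Return the top-ranked answer only if it stands clearly above the rest.
def answer_or_nil(candidates, drop_ratio=0.5):
    """candidates: (answer, score) pairs; drop_ratio is a made-up threshold."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked:
        return "NIL"
    if len(ranked) == 1 or ranked[1][1] <= drop_ratio * ranked[0][1]:
        return ranked[0][0]   # significant score drop: confident answer
    return "NIL"              # flat scores: no candidate stands out

print(answer_or_nil([("Ottawa", 2.4), ("Toronto", 0.9)]))  # Ottawa
print(answer_or_nil([("Ottawa", 0.8), ("Toronto", 0.7)]))  # NIL
```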

18 citations


Proceedings Article
28 Jun 2001
TL;DR: A cross-language retrieval system which integrates shallow parsing and lexical semantic databases in an interactive approach to information access, optimising the use of simple and robust Natural Language resources and techniques to facilitate cross-language information access.
Abstract: This paper presents a cross-language retrieval system which integrates shallow parsing and lexical semantic databases in an interactive approach to information access. At indexing time, the system extracts a list of phrases for every language in the collection. At search time, the system bridges the gap between the user's query and the relevant phrases in the collection in any language, expanding and translating individual terms and retaining the phrases that are actually relevant in the collection. The user can access information via a standard ranked list of documents or via a hierarchy of phrasal information, in which the selection of a phrase modifies the ranked list and provides access to the documents related to the phrase. This interactive setting, we believe, optimises the use of simple and robust Natural Language resources and techniques to facilitate cross-language information access.

16 citations


Journal ArticleDOI
TL;DR: The structure of written Thai is highly ambiguous, requiring more sophisticated techniques than are necessary for comparable IE tasks in most European languages, along with large amounts of domain knowledge to cope with these ambiguities.
Abstract: The development of an information extraction (IE) system for Thai documents raises a number of issues which are not important for IE in English and other European languages. We describe the characteristics of written Thai, the problem statement, and our approach to the Thai IE system. The structure of written Thai is highly ambiguous, requiring more sophisticated techniques than are necessary for comparable IE tasks in most European languages, along with large amounts of domain knowledge to cope with these ambiguities. The basic design of this system is to provide different natural language components for analyzing the surface structure of the documents. These components include word segmentation, identification of specific lexical structure terms, and part-of-speech tagging. Further analysis performs shallow parsing over the relevant regions that contain the specific trigger terms or patterns specified in the extraction templates. Finally, the information of interest is extracted from the resulting grammar trees according to predefined concept definitions, and the user is returned a list of answers for each concept.

15 citations


Proceedings Article
01 Jan 2001
TL;DR: This work introduces shapaqa, a shallow parsing approach to online, open-domain question answering on the WorldWideWeb that uses a memory-based shallow parser to analyze web pages retrieved using normal keyword search on a search engine.
Abstract: We introduce shapaqa, a shallow parsing approach to online, open-domain question answering on the WorldWideWeb. Given a form-based natural language question as input, the system uses a memory-based shallow parser to analyze web pages retrieved using normal keyword search on a search engine. Two versions of the system are evaluated on a test set of 200 questions. In combination with two back-off methods a mean reciprocal rank of .46 is achieved.
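The mean reciprocal rank (MRR) reported above scores each question by the reciprocal of the rank of its first correct answer (0 when no correct answer is returned) and averages over all questions. A minimal sketch with invented ranks:

```python
# Mean reciprocal rank over a set of questions.
def mean_reciprocal_rank(first_correct_ranks):
    """first_correct_ranks: for each question, the rank of the first
    correct answer, or None if no correct answer was returned."""
    return sum(1.0 / r for r in first_correct_ranks if r) / len(first_correct_ranks)

print(mean_reciprocal_rank([1, 2, None, 5]))  # (1 + 0.5 + 0 + 0.2) / 4 = 0.425
```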

14 citations


Proceedings Article
01 Jan 2001
TL;DR: It is found that the concepts (themes) extracted by Oracle Text can be used to aggregate document information content to simplify statistical processing.
Abstract: Oracle's objective in TREC-10 was to study the behavior of Oracle information retrieval in previously unexplored application areas. The software used was Oracle9i Text, Oracle's full-text retrieval engine integrated with the Oracle relational database management system, and the Oracle PL/SQL procedural programming language. Runs were submitted in the filtering and Q/A tracks. For the filtering track we submitted three runs, in adaptive filtering, batch filtering, and routing. By comparing the TREC results, we found that the concepts (themes) extracted by Oracle Text can be used to aggregate document information content to simplify statistical processing. Oracle's Q/A system integrated information retrieval (IR) and information extraction (IE). The Q/A system relied on a combination of document and sentence ranking in IR, named-entity tagging in IE, and shallow-parsing-based classification of questions into predefined categories.

Book
02 Nov 2001
TL;DR: The parsing algorithm implemented by Cico is described formally, with some experimental data on its performance, and a complete user manual is provided for cico3, an implementation of the Cico algorithm, and for a number of associated tools.
Abstract: Domain-based parsing is a shallow parsing technique that exploits knowledge about domain-specific properties of terms in order to determine "optimal" parse trees for natural language sentences. Cico is a simple parser using domain-based parsing. It is particularly well suited for parsing natural language sentences of a technical nature (e.g., requirements documents for software systems), as in this case several simplifying assumptions hold, and it has been used successfully in several experiments in the requirements engineering field. In the first part of this report, we formally describe the parsing algorithm implemented by Cico and give some experimental data on its performance. In the second part, we provide a complete user manual for cico3, an implementation of the Cico algorithm, and for a number of associated tools. Finally, in the third part, we present some illustrative examples taken from real applications.

Jonathan H. Connell
01 Jan 2001
TL;DR: This paper proposes a specific linguistic-based format for semantic networks in which nodes correspond to “open class” words and morphological elements form the basis for atomic link labels and node tags.
Abstract: This paper proposes a specific linguistic-based format for semantic networks in which nodes correspond to “open class” words. “Closed class” words and morphological elements form the basis for atomic link labels and node tags. A simple parser has been developed to transform written text into this representation. The properties of the resulting networks are discussed and psychologically inspired limited-horizon browsing techniques are examined.

01 Jan 2001
TL;DR: A part-of-speech tagger for Czech is described that employs the DIS shallow parser for Czech, manually coded rules, and inductive logic programming.
Abstract: A part-of-speech tagger for Czech is described that employs the DIS shallow parser for Czech, manually coded rules, and inductive logic programming.

01 May 2001
TL;DR: In this paper, well-known state-of-the-art data-driven algorithms are applied to part-of-speech tagging and shallow parsing of Swedish texts.
Abstract: In this paper, well-known state-of-the-art data-driven algorithms are applied to part-of-speech tagging and shallow parsing of Swedish texts.

Proceedings Article
01 Jan 2001
TL;DR: Three data-driven algorithms are applied to shallow parsing of Swedish texts by using PoS taggers as the basis for parsing, showing that the best performance is obtained by training on PoS tags with labels marking the phrasal constituents, without considering the words themselves.
Abstract: Three data-driven algorithms are applied to shallow parsing of Swedish texts, using PoS taggers as the basis for parsing. The constituent structure is represented by nine types of phrases in a hierarchical structure containing labels for every constituent type the token belongs to. The results show that the best performance is obtained by training on PoS tags with labels marking the phrasal constituents, without considering the words themselves. Transformation-based learning gives the highest accuracy (94.44%), followed by the Maximum Entropy framework (mxpost) (92.47%) and the Hidden Markov model (TnT) (92.42%).
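The finding that PoS tags alone carry most of the signal can be probed with the classic baseline: assign each PoS tag the chunk label it most frequently receives in training. A sketch with invented toy data (the real systems above are far stronger learners):

```python
# Most-frequent-chunk-per-PoS baseline for shallow parsing.
from collections import Counter, defaultdict

train = [("DT", "B-NP"), ("NN", "I-NP"), ("VB", "B-VP"), ("DT", "B-NP"),
         ("JJ", "I-NP"), ("NN", "I-NP")]   # toy (PoS, chunk) pairs

counts = defaultdict(Counter)
for pos, chunk in train:
    counts[pos][chunk] += 1
model = {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

test = ["DT", "JJ", "NN", "VB"]
print([model.get(pos, "O") for pos in test])
# ['B-NP', 'I-NP', 'I-NP', 'B-VP']
```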

01 Jan 2001
TL;DR: The aim of this paper is not to describe the program itself but rather to present the linguist's point of view: how to detect discontinuities, that is, how to decide whether there is completion or rupture.
Abstract: Continuous media – stories, movies, songs – all have a basic linear structure from which cognitive processes are able to retrieve some temporal organization. How much semantic computation is necessarily involved in a proper framing of events and their transitions from one to another? Can this computation be approximated with the help of simple formal clues from a shallow parsing of the story stream, and how far can it go? Our experiments with a prototype application implement a method for segmenting written stories and splicing together "referential situations" that should belong to the same time-frame. This paper does not aim to describe the implementation but rather discusses the linguistic approach to detecting discontinuity in narrative texts, based on the principles of closure and rupture in temporal consistency.

Journal Article
TL;DR: In this paper, the authors focus on the integration of NLP techniques for efficient textual database retrieval as part of the VLSHDS project (Very Large Scale Hypermedia Delivery System).
Abstract: Improvements in hardware, communication technology, and databases have led to an explosion of multimedia information repositories. In order to improve the quality of information retrieval compared to existing advanced document management systems, research has shown that it is necessary to consider vertical integration of retrieval techniques inside the database service architecture. This paper focuses on the integration of NLP techniques for efficient textual database retrieval as part of the VLSHDS project (Very Large Scale Hypermedia Delivery System). One target of this project is to increase the quality of textual information search (precision/recall) compared to existing multilingual IR systems by applying morphological analysis and phrase-level shallow parsing to document and query processing. The scope of this paper is limited to Thai documents. The underlying system is the Active HYpermedia Delivery System (AHYDS) framework, which provides the delivery service over the Internet. Based on 1100 Thai documents, as first results, our approach improved precision and recall from 72.666% and 56.67% in the initial implementation (without NLP techniques) to 85.211% and 76.876%, respectively.

Book ChapterDOI
03 Sep 2001
TL;DR: The authors propose the Pyramidal Digest, a model for automated composite text digesting that combines traditional text summarization and text classification: the digest not only serves as a summary but is also able to classify text segments of any given size and answer queries relative to a context.
Abstract: We present a novel model of automated composite text digest, the Pyramidal Digest. The model integrates traditional text summarization and text classification in that the digest not only serves as a "summary" but is also able to classify text segments of any given size, and answer queries relative to a context. "Pyramidal" refers to the fact that the digest is created in at least three dimensions: scope, granularity, and scale. The Pyramidal Digest is defined recursively as a structure of extracted and abstracted features that are obtained gradually -- from specific to general, and from large to small text segment size -- through a combination of shallow parsing and machine learning algorithms. There are three noticeable threads of learning taking place: learning of characteristic relations, rhetorical relations, and lexical relations. Our model provides a principle for efficiently digesting large quantities of text: progressive learning can digest text by abstracting its significant features. This approach scales, with complexity bounded by O(n log n), where n is the size of the text. It offers a standard and systematic way of collecting as many semantic features as possible that are reachable by shallow parsing. It enables readers to query beyond keyword matches.
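The O(n log n) bound is consistent with processing the text at roughly log n levels of granularity, touching each of the n units once per level. A toy sketch of such a pyramid follows, using plain keyword counts as stand-in "features"; the paper's extracted features (characteristic, rhetorical, and lexical relations) are much richer.

```python
# Toy pyramid over a token stream: one feature summary per segment,
# with segment size halving at each level (log n levels, O(n) work each).
from collections import Counter

def pyramid(tokens, min_size=2):
    levels, size = [], len(tokens)
    while size >= min_size:
        segments = [Counter(tokens[i:i + size])
                    for i in range(0, len(tokens), size)]
        levels.append((size, segments))
        size //= 2
    return levels

for size, segs in pyramid("a b a c b a b b".split()):
    print(size, [dict(s) for s in segs])
```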