
Showing papers on "Shallow parsing" published in 2002


Journal ArticleDOI
TL;DR: This work argues that with a systematic incremental methodology one can go beyond shallow parsing to deeper language analysis, while preserving robustness, and describes a generic system based on such a methodology and designed for building robust analyzers that tackle deeper linguistic phenomena than those traditionally handled by the now widespread shallow parsers.
Abstract: Robustness is a key issue for natural language processing in general and parsing in particular, and many approaches have been explored in the last decade for the design of robust parsing systems. Among those approaches is shallow or partial parsing, which produces minimal and incomplete syntactic structures, often in an incremental way. We argue that with a systematic incremental methodology one can go beyond shallow parsing to deeper language analysis, while preserving robustness. We describe a generic system based on such a methodology and designed for building robust analyzers that tackle deeper linguistic phenomena than those traditionally handled by the now widespread shallow parsers. The rule formalism allows the recognition of n-ary linguistic relations between words or constituents on the basis of global or local structural, topological and/or lexical conditions. It offers the advantage of accepting various types of inputs, ranging from raw to chunked or constituent-marked texts, so for instance it can be used to process existing annotated corpora, or to perform a deeper analysis on the output of an existing shallow parser. It has been successfully used to build a deep functional dependency parser, as well as for the task of co-reference resolution, in a modular way.
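To make concrete what a rule over chunked input might look like, here is a minimal, hypothetical sketch; the chunk format and the single SUBJECT rule are illustrative assumptions, not the formalism described in the paper:

```python
# Minimal sketch: extracting a binary relation from chunked input.
# The chunk format and the SUBJECT rule are illustrative assumptions,
# not the rule formalism described in the paper.

def head(chunk_words):
    """Naive head choice: last word of the chunk."""
    return chunk_words[-1]

def extract_subject_relations(chunks):
    """Emit SUBJECT(verb, noun) for every NP chunk directly followed by a VP chunk."""
    relations = []
    for left, right in zip(chunks, chunks[1:]):
        left_label, left_words = left
        right_label, right_words = right
        if left_label == "NP" and right_label == "VP":
            relations.append(("SUBJECT", head(right_words), head(left_words)))
    return relations

chunks = [("NP", ["the", "parser"]), ("VP", ["produces"]), ("NP", ["minimal", "structures"])]
print(extract_subject_relations(chunks))   # [('SUBJECT', 'produces', 'parser')]
```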

321 citations


Journal ArticleDOI
TL;DR: The authors presented memory-based learning approaches to shallow parsing and applied these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing.
Abstract: We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods to improve the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification, while its application to recognizing embedded structures leaves some room for improvement.
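As a rough illustration of memory-based (nearest-neighbour) IOB chunking, the sketch below uses scikit-learn's KNeighborsClassifier as a stand-in for the authors' learner; the toy data and feature window are assumptions:

```python
# Sketch of memory-based IOB chunking: classify each token from a small
# context window of words and POS tags using a k-nearest-neighbour learner.
# scikit-learn stands in for the memory-based learner used in the paper.
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier

train = [  # (word, POS, IOB chunk tag) -- toy training data
    ("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP"),
    ("the", "DT", "B-NP"), ("deficit", "NN", "I-NP"),
    ("will", "MD", "B-VP"), ("narrow", "VB", "I-VP"),
]

def features(tokens, i):
    word, pos, _ = tokens[i]
    prev_pos = tokens[i - 1][1] if i > 0 else "BOS"
    next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else "EOS"
    return {"w": word.lower(), "p": pos, "p-1": prev_pos, "p+1": next_pos}

X = [features(train, i) for i in range(len(train))]
y = [tag for _, _, tag in train]

vec = DictVectorizer()
knn = KNeighborsClassifier(n_neighbors=1).fit(vec.fit_transform(X), y)

test = [("the", "DT", "?"), ("surplus", "NN", "?"), ("will", "MD", "?")]
print(knn.predict(vec.transform([features(test, i) for i in range(len(test))])))
```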

137 citations


Journal Article
TL;DR: A unified technique solves different shallow parsing tasks as a tagging problem using a Hidden Markov Model-based approach (HMM): the relevant information for each task is incorporated into the models, yielding a Specialized HMM that gives more complete contextual models.
Abstract: We present a unified technique to solve different shallow parsing tasks as a tagging problem using a Hidden Markov Model-based approach (HMM). This technique consists of the incorporation of the relevant information for each task into the models. To do this, the training corpus is transformed to take into account this information. In this way, no change is necessary for either the training or tagging process, so it allows for the use of a standard HMM approach. Taking into account this information, we construct a Specialized HMM which gives more complete contextual models. We have tested our system on chunking and clause identification tasks using different specialization criteria. The results obtained are in line with the results reported for most of the relevant state-of-the-art approaches.
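A minimal sketch of the specialization step, under the assumption that specialization means fusing the chunk tag and selected trigger words into the emitted tag (the actual specialization criteria in the paper may differ):

```python
# Sketch of the "specialization" step: rewrite the training corpus so that
# selected words carry their own identity inside the tag, letting a standard
# HMM learn more specific contextual models.  The trigger list is illustrative.

TRIGGER_WORDS = {"that", "to", "as"}   # words worth specializing (assumption)

def specialize(sentence):
    """sentence: list of (word, pos, chunk) triples -> list of (word, tag) pairs."""
    out = []
    for word, pos, chunk in sentence:
        tag = f"{pos}+{chunk}"
        if word.lower() in TRIGGER_WORDS:
            tag = f"{tag}+{word.lower()}"   # fuse the lexical item into the tag
        out.append((word, tag))
    return out

sent = [("He", "PRP", "B-NP"), ("wants", "VBZ", "B-VP"), ("to", "TO", "I-VP"), ("leave", "VB", "I-VP")]
print(specialize(sent))
# [('He', 'PRP+B-NP'), ('wants', 'VBZ+B-VP'), ('to', 'TO+I-VP+to'), ('leave', 'VB+I-VP')]
```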

102 citations


Journal ArticleDOI
TL;DR: The approach of checking properties of models obtained by shallow parsing of natural language requirements, applied to a case study based on part of a NASA specification of the Node Control Software on the International Space Station, supports the position that it is feasible and useful to perform automated analysis of requirements expressed in natural language.
Abstract: In this paper, we report on our experiences of using lightweight formal methods for the partial validation of natural language requirements documents. We describe our approach to checking properties of models obtained by shallow parsing of natural language requirements, and apply it to a case study based on part of a NASA specification of the Node Control Software on the International Space Station. The experience reported supports our position that it is feasible and useful to perform automated analysis of requirements expressed in natural language. Indeed, we identified a number of errors in our case study that were also independently discovered and corrected by NASA's Independent Validation and Verification Facility in a subsequent version of the same document, and others that had not been discovered. The paper describes the techniques we used and the errors we found, and reflects on the lessons learned.

92 citations


Journal ArticleDOI
TL;DR: This article introduces the problem of partial or shallow parsing (assigning partial syntactic structure to sentences), explains why it is an important natural language processing (NLP) task, and suggests future directions for machine learning of shallow parsing.
Abstract: This article introduces the problem of partial or shallow parsing (assigning partial syntactic structure to sentences) and explains why it is an important natural language processing (NLP) task. The complexity of the task makes Machine Learning an attractive option in comparison to the handcrafting of rules. On the other hand, because of the same task complexity, shallow parsing makes an excellent benchmark problem for evaluating machine learning algorithms. We sketch the origins of shallow parsing as a specific task for machine learning of language, and introduce the articles accepted for this special issue, a representative sample of current research in this area. Finally, future directions for machine learning of shallow parsing are suggested.

69 citations


Journal ArticleDOI
TL;DR: It is shown how prosody can be used together with other knowledge sources for the task of resegmentation if a first segmentation turns out to be wrong, and how a critical system evaluation can help to improve the overall performance of automatic dialogue systems.

44 citations


Journal Article
TL;DR: Three data-driven publicly available part-of-speech taggers are applied to shallow parsing of Swedish texts, and special attention is directed to the taggers' sensitivity to different types of linguistic information included in learning, as well as their sensitivity to the size and the various types of training data sets.
Abstract: Three data-driven publicly available part-of-speech taggers are applied to shallow parsing of Swedish texts. The phrase structure is represented by nine types of phrases in a hierarchical structure containing labels for every constituent type the token belongs to in the parse tree. The encoding is based on the concatenation of the phrase tags on the path from lowest to higher nodes. Various linguistic features are used in learning; the taggers are trained on the basis of lexical information only, part-of-speech only, and a combination of both, to predict the phrase structure of the tokens with or without part-of-speech. Special attention is directed to the taggers' sensitivity to different types of linguistic information included in learning, as well as the taggers' sensitivity to the size and the various types of training data sets. The method can be easily transferred to other languages.
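The path-concatenation encoding can be illustrated in a few lines; the separator and tag names below are assumptions:

```python
# Sketch of the path-concatenation encoding: each token's label is the list of
# phrase tags from its lowest constituent up to higher nodes, joined into one
# tag that a sequence tagger can predict.  Separator and tag names are assumptions.

def encode(tree_path_per_token):
    """tree_path_per_token: for each token, the phrase labels from lowest to highest."""
    return ["|".join(path) for path in tree_path_per_token]

# Swedish "en mycket gammal bil" ("a very old car"): an NP containing an AP
# ("mycket gammal"), so the AP tokens carry both labels.
paths = [["NP"], ["AP", "NP"], ["AP", "NP"], ["NP"]]
print(encode(paths))   # ['NP', 'AP|NP', 'AP|NP', 'NP']
```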

38 citations


Proceedings ArticleDOI
06 Jul 2002
TL;DR: It is argued that a memory-based learning algorithm might not need an explicit intermediate POS-tagging step for parsing when a sufficient amount of training material is available and word form information is used for low-frequency words.
Abstract: We describe a case study in which a memory-based learning algorithm is trained to simultaneously chunk sentences and assign grammatical function tags to these chunks. We compare the algorithm's performance on this parsing task with varying training set sizes (yielding learning curves) and different input representations. In particular we compare input consisting of words only, a variant that includes word form information for low-frequency words, gold-standard POS only, and combinations of these. The word-based shallow parser displays an apparently log-linear increase in performance, and surpasses the flatter POS-based curve at about 50,000 sentences of training data. The low-frequency variant performs even better, and the combinations are best. Comparative experiments with a real POS tagger produce lower results. We argue that we might not need an explicit intermediate POS-tagging step for parsing when a sufficient amount of training material is available and word form information is used for low-frequency words.
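One way the "low-frequency variant" of the input could be built is sketched below: rare words are replaced by a coarse word-form signature while frequent words are kept as-is. The frequency threshold and the signature features are assumptions, not the authors' exact scheme:

```python
# Sketch of the input variants compared in the paper: words only, versus words
# where low-frequency items are replaced by a crude word-form signature.
# The frequency threshold and the signature features are assumptions.
from collections import Counter

def signature(word):
    return "".join([
        "C" if word[0].isupper() else "c",
        "D" if any(ch.isdigit() for ch in word) else "_",
        "H" if "-" in word else "_",
        word[-3:].lower(),            # keep a short suffix
    ])

def build_input(sentences, min_freq=10):
    counts = Counter(w for sent in sentences for w in sent)
    return [[w if counts[w] >= min_freq else signature(w) for w in sent]
            for sent in sentences]

print(build_input([["Rolls-Royce", "said", "it", "expects", "profits"]], min_freq=2))
# [['C_Hyce', 'c__aid', 'c__it', 'c__cts', 'c__its']]
```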

30 citations


01 Jan 2002
TL;DR: It is argued that a chunked syntactic representation can usefully be exploited as such for non-trivial NLP applications that do not require full text understanding, such as automatic lexical acquisition and information retrieval.
Abstract: This paper illustrates a technique of shallow parsing named “text chunking” whereby “parse incompleteness” is reinterpreted as “parse underspecification”. A text is chunked into structured units which can be identified with certainty on the basis of available knowledge. The chunking process stops at the level of granularity beyond which the analysis becomes undecidable. We argue that a chunked syntactic representation can usefully be exploited as such for non-trivial NLP applications which do not require full text understanding, such as automatic lexical acquisition and information retrieval.

28 citations


Proceedings Article
01 Jan 2002
TL;DR: Among the novelties added to QUANTUM this year is a web module that finds exact answers using high-precision reformulation of the question to anticipate the expected context of the answer.
Abstract: This year, we participated in the Question Answering task for the second time with the QUANTUM system. We entered 2 runs for the main task (one using the web, the other without) and 1 run for the list task (without the web). We essentially built on last year's experience to enhance the system. The architecture of QUANTUM is mainly the same as last year: it uses patterns that rely on shallow parsing techniques and regular expressions to analyze the question and then select the most appropriate extraction function. This extraction function is then applied to one-paragraph-long passages retrieved by Okapi to extract and score candidate answers. Among the novelties we added to QUANTUM this year is a web module that finds exact answers using high-precision reformulation of the question to anticipate the expected context of the answer.
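A hypothetical sketch of pattern-based question analysis selecting an extraction function; the regular expressions and function names are illustrative, not QUANTUM's actual patterns:

```python
# Sketch of pattern-based question analysis: a regular expression decides which
# extraction function is applied to the retrieved passages.  The patterns and
# the extraction-function names are illustrative assumptions.
import re

PATTERNS = [
    (re.compile(r"^who\b", re.I), "extract_person"),
    (re.compile(r"^(when|what year)\b", re.I), "extract_date"),
    (re.compile(r"^where\b", re.I), "extract_location"),
    (re.compile(r"^how (many|much)\b", re.I), "extract_quantity"),
]

def select_extraction_function(question):
    for pattern, function_name in PATTERNS:
        if pattern.search(question):
            return function_name
    return "extract_default"

print(select_extraction_function("When was the International Space Station launched?"))
# extract_date
```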

20 citations


Journal Article
TL;DR: This work investigates the performance of four shallow parsers trained using various types of artificially noisy material, shows that they are surprisingly robust to synthetic noise, and addresses the question of whether naturally occurring disfluencies undermine performance more than a change in distribution does.
Abstract: Shallow parsers are usually assumed to be trained on noise-free material drawn from the same distribution as the testing material. However, when the training set is either noisy or drawn from a different distribution, performance may be degraded. Using the parsed Wall Street Journal, we investigate the performance of four shallow parsers (maximum entropy, memory-based learning, N-grams and ensemble learning) trained using various types of artificially noisy material. Our first set of results shows that shallow parsers are surprisingly robust to synthetic noise, with performance gradually decreasing as the rate of noise increases. Further results show that no single shallow parser performs best in all noise situations. Final results show that simple, parser-specific extensions can improve noise tolerance. Our second set of results addresses the question of whether naturally occurring disfluencies undermine performance more than a change in distribution does. Results using the parsed Switchboard corpus suggest that, although naturally occurring disfluencies might harm performance, differences in distribution between the training set and the testing set are more significant.
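One simple way to produce "artificially noisy material" is to flip a controlled fraction of the training tags, as sketched below; this particular noise model is an assumption, not necessarily one of the schemes used in the paper:

```python
# Sketch of synthetic annotation noise: flip a fraction of the chunk tags in
# the training data to a random other tag.  This noise model is an illustrative
# assumption rather than the exact corruption scheme used in the paper.
import random

TAGSET = ["B-NP", "I-NP", "B-VP", "I-VP", "O"]

def add_tag_noise(tagged_tokens, noise_rate, seed=0):
    rng = random.Random(seed)
    noisy = []
    for word, tag in tagged_tokens:
        if rng.random() < noise_rate:
            tag = rng.choice([t for t in TAGSET if t != tag])
        noisy.append((word, tag))
    return noisy

clean = [("the", "B-NP"), ("deficit", "I-NP"), ("will", "B-VP"), ("narrow", "I-VP")]
print(add_tag_noise(clean, noise_rate=0.25))
```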

01 Jan 2002
TL;DR: This paper argues in favour of an integration between statistically and syntactically based parsing by presenting data from a study of a 500,000-word corpus of Italian, including a syntactic shallow parser and an ATN-like grammatical function assigner that automatically classifies previously manually verified tagged corpora.
Abstract: In this paper we argue in favour of an integration between statistically and syntactically based parsing by presenting data from a study of a 500,000-word corpus of Italian. Most papers present approaches to tagging which are statistically based. None of the statistically based analyses, however, produce an accuracy level comparable to the one obtained by means of linguistic rules [1]. Of course their data refer strictly to English, with the exception of [2, 3, 4]. As to Italian, we argue that purely statistically based approaches are inefficient, basically due to the great sparsity of tag distribution – 50% or less of unambiguous tags when punctuation is subtracted from the total count. In addition, the level of homography is also very high: readings per word are 1.7, compared to 1.07 computed for English by [2] with a similar tagset. The current work includes a syntactic shallow parser and an ATN-like grammatical function assigner that automatically classifies previously manually verified tagged corpora. In a preliminary experiment with the automatic tagger, we obtained 99.97% accuracy on the training set and 99.03% on the test set using combined approaches: accuracy from statistical tagging alone is well below 95% even on the training set, and the same applies to syntactic tagging. As to the shallow parser, we report on a first preliminary experiment on a manually verified subset of 10,000 words.

Book ChapterDOI
TL;DR: The QUANTUM system relies on computational linguistics as well as information retrieval techniques, and the TREC-X data set and tools are used to evaluate the overall system and each of its components.
Abstract: In this paper, we describe our Question Answering (QA) system called QUANTUM. The goal of QUANTUM is to find the answer to a natural language question in a large document collection. QUANTUM relies on computational linguistics as well as information retrieval techniques. The system analyzes questions using shallow parsing techniques and regular expressions, then selects the appropriate extraction function. This extraction function is then applied to one-paragraph-long passages retrieved by the Okapi information retrieval system. The extraction process involves the Alembic named entity tagger and the WordNet semantic network to identify and score candidate answers. We designed QUANTUM according to the TREC-X QA track requirements; therefore, we use the TREC-X data set and tools to evaluate the overall system and each of its components.

Journal Article
TL;DR: This paper designs and implements a neural-network-based method for automatic prediction of Chinese phrase boundary location; preliminary results show a precision of 93.24% (closed test) and 92.56% (open test), respectively.
Abstract: Prediction of Chinese phrase boundary location is the basis of shallow parsing or chunk parsing. It is also very important for processing real texts. With the support of our Chinese treebank of 64,426 words, this paper designs and implements a method for automatic prediction of Chinese phrase boundary location based on a neural network. The preliminary results show that the precision is 93.24% (closed test) and 92.56% (open test), respectively.
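A minimal sketch of such a boundary predictor, using scikit-learn's MLPClassifier over a toy feature window as a stand-in for the network described in the paper; all features and settings are illustrative assumptions:

```python
# Sketch of neural phrase-boundary prediction: a small feed-forward network
# decides, for each position, whether a phrase boundary follows the token.
# Features, window size and classifier settings are illustrative assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.neural_network import MLPClassifier

# (POS of current word, POS of next word) -> is there a boundary after the word?
train = [({"p0": "NN", "p1": "VV"}, 1), ({"p0": "DT", "p1": "NN"}, 0),
         ({"p0": "VV", "p1": "DT"}, 1), ({"p0": "JJ", "p1": "NN"}, 0)]

vec = DictVectorizer()
X = vec.fit_transform([f for f, _ in train])
y = [label for _, label in train]

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
# Predict whether a boundary follows a noun that precedes a verb.
print(clf.predict(vec.transform([{"p0": "NN", "p1": "VV"}])))
```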

Book ChapterDOI
23 Sep 2002
TL;DR: An application of grammatical inference to the task of shallow parsing by learning a deterministic probabilistic automaton that models the joint distribution of Chunk (syntactic phrase) tags and Part-of-speech tags, and using this automaton as a transducer to find the most likely chunk tag sequence using a dynamic programming algorithm.
Abstract: This paper presents an application of grammatical inference to the task of shallow parsing. We first learn a deterministic probabilistic automaton that models the joint distribution of chunk (syntactic phrase) tags and part-of-speech tags, and then use this automaton as a transducer to find the most likely chunk tag sequence using a dynamic programming algorithm. We discuss an efficient means of incorporating lexical information, which automatically identifies particular words that are useful using a mutual information criterion, together with an application of bagging; both improve our results. Though the results are not as high as those of comparable techniques that use models with a fixed structure, the models we learn are very compact and efficient.
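The decoding step, using the learned automaton as a transducer, is essentially Viterbi dynamic programming; the sketch below runs it over a tiny hand-specified model (the probabilities are made up for illustration, whereas the paper learns the model by grammatical inference):

```python
# Sketch of the decoding step: find the most likely chunk-tag sequence for a
# POS-tag sequence with Viterbi dynamic programming.  The toy model below is
# hand-specified; the paper learns it by grammatical inference instead.
import math

CHUNK_TAGS = ["B-NP", "I-NP", "B-VP"]
TRANS = {("B-NP", "I-NP"): 0.5, ("B-NP", "B-VP"): 0.4, ("B-NP", "B-NP"): 0.1,
         ("I-NP", "B-VP"): 0.6, ("I-NP", "I-NP"): 0.3, ("I-NP", "B-NP"): 0.1,
         ("B-VP", "B-NP"): 0.7, ("B-VP", "B-VP"): 0.2, ("B-VP", "I-NP"): 0.1}
EMIT = {("B-NP", "DT"): 0.6, ("B-NP", "NN"): 0.4, ("I-NP", "NN"): 0.9,
        ("I-NP", "DT"): 0.1, ("B-VP", "VBZ"): 1.0}
START = {"B-NP": 0.7, "I-NP": 0.05, "B-VP": 0.25}

def viterbi(pos_tags):
    best = {c: (math.log(START[c]) + math.log(EMIT.get((c, pos_tags[0]), 1e-9)), [c])
            for c in CHUNK_TAGS}
    for pos in pos_tags[1:]:
        new_best = {}
        for c in CHUNK_TAGS:
            score, path = max(
                (best[p][0] + math.log(TRANS.get((p, c), 1e-9))
                 + math.log(EMIT.get((c, pos), 1e-9)), best[p][1] + [c])
                for p in CHUNK_TAGS)
            new_best[c] = (score, path)
        best = new_best
    return max(best.values())[1]

print(viterbi(["DT", "NN", "VBZ"]))   # ['B-NP', 'I-NP', 'B-VP']
```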

01 May 2002
TL;DR: The advantages of deploying a shallow parser based on Supertagging in an automatic dialogue system in a call center that basically leaves the initiative with the user as far as (s)he wants are outlined.
Abstract: In this paper we outline the advantages of deploying a shallow parser based on Supertagging in an automatic dialogue system in a call center that basically leaves the initiative with the user as far as (s)he wants (called user-initiative or adaptive in the literature, in contrast to system-initiative dialogue systems). The Supertagger relies on a Hidden Markov model and is trained on German input texts. The design of a Hidden Markov-based Supertagger with trigrams forms the central topic of this paper. The performance of our German Supertagger lags behind that of the English one; some of the reasons are addressed later on. Nevertheless, shallow parsing with the Supertags increases accuracy compared to a basic version of KoHDaS that relies only on recurrent plausibility networks.


Proceedings Article
01 Jan 2002
TL;DR: This paper proposes a layered Finite State Transducer (FST) framework integrating hierarchical supra-lexical linguistic knowledge into speech recognition based on shallow parsing, and shows that with a higher order top-level n-gram model, pre-composition and optimization of the FSTs are highly restricted by the computational resources available.
Abstract: This paper proposes a layered Finite State Transducer (FST) framework integrating hierarchical supra-lexical linguistic knowledge into speech recognition based on shallow parsing. The shallow parsing grammar is derived directly from the full-fledged grammar for natural language understanding, and augmented with top-level n-gram probabilities and phrase-level context-dependent probabilities, which go beyond the standard context-free grammar (CFG) formalism. Such a shallow parsing approach can help balance sufficient grammar coverage and tight structural constraints. The context-dependent probabilistic shallow parsing model is represented by layered FSTs, which can be integrated with speech recognition seamlessly to impose early phrase-level structural constraints consistent with natural language understanding. It is shown that in the JUPITER [1] weather information domain, the shallow parsing model achieves lower recognition word error rates than a regular class n-gram model of the same order. However, we find that, with a higher-order top-level n-gram model, pre-composition and optimization of the FSTs are highly restricted by the available computational resources. Given the potential of such models, it may be worth pursuing an incremental approximation strategy [2], which includes part of the linguistic model FST in early optimization while introducing the complete model through dynamic composition.

01 Jan 2002
TL;DR: This stylebook gives an overview of the various categories annotated in the system's layers of chunks, topological fields and clauses, and mentions the methodology of the annotation process where it affects the annotation scheme.
Abstract: The presented system provides a shallow syntactic annotation for unrestricted German text. It requires POS-annotated text and annotates the layers of chunks, topological fields and clauses. This stylebook gives an overview of the various categories annotated in those different layers. The methodology of the annotation process is mentioned in those cases where it has an impact on the annotation scheme. Example sentences are taken from real language data, but were simplified where necessary.

Proceedings Article
01 May 2002
TL;DR: This paper presents an evaluation of four shallow parsers and attempts to demonstrate the interest of observing the ‘common boundaries’ produced by different parsers as good indices for the evaluation of these algorithms.
Abstract: This paper presents an evaluation of four shallow parsers. The interest of each of these parsers led us to design a parameterized multiplexer for syntactic information, based on the principle of merging the common boundaries of the outputs produced by each of these programs. The question of evaluating the parsers as well as the multiplexer came to the foreground because we did not own reference corpora. We attempt here to demonstrate the interest of observing the ‘common boundaries’ produced by different parsers as good indices for the evaluation of these algorithms. Such an evaluation is proposed and tested in two experiments.
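A minimal sketch of the boundary-merging idea: each parser's chunking is reduced to a set of boundary positions, and positions proposed by several parsers are kept. The voting threshold is an illustrative assumption:

```python
# Sketch of boundary multiplexing: reduce each parser's chunking to the set of
# token positions where a boundary falls, then keep positions proposed by at
# least `min_votes` parsers.  The voting threshold is an illustrative assumption.
from collections import Counter

def boundaries(chunks):
    """chunks: list of (start, end) token spans -> set of boundary positions."""
    positions = set()
    for start, end in chunks:
        positions.update((start, end))
    return positions

def common_boundaries(parser_outputs, min_votes=2):
    votes = Counter(pos for output in parser_outputs for pos in boundaries(output))
    return sorted(pos for pos, count in votes.items() if count >= min_votes)

parser_a = [(0, 2), (2, 4), (4, 6)]
parser_b = [(0, 2), (2, 6)]
parser_c = [(0, 3), (3, 6)]
print(common_boundaries([parser_a, parser_b, parser_c]))   # [0, 2, 6]
```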

Proceedings ArticleDOI
24 Aug 2002
TL;DR: A context-sensitive electronic dictionary that provides translations for any piece of text displayed on a computer screen, without requiring user interaction is introduced through a process of three phases: text acquisition from the screen, morpho-syntactic analysis of the context of the selected word, and the dictionary lookup.
Abstract: This paper introduces a context-sensitive electronic dictionary that provides translations for any piece of text displayed on a computer screen, without requiring user interaction. This is achieved through a process of three phases: text acquisition from the screen, morpho-syntactic analysis of the context of the selected word, and dictionary lookup. As with other similar tools available, this program usually works with dictionaries adapted from one or more printed dictionaries. To implement context-sensitive features, however, traditional dictionary entries need to be restructured. By splitting entries into smaller pieces and indexing them in a special way, the program is able to display a restricted set of information that is relevant to the context. Based on the information in the dictionaries, the program is able to recognize---even discontinuous---multiword expressions on the screen. The program has three major features which we believe make it unique for the time being, and on which the development focused: linguistic flexibility (stemming, morphological analysis and shallow parsing), open architecture (three major architectural blocks, all replaceable along publicly documented APIs), and a flexible user interface (replaceable dictionaries, direct user feedback). In this paper, we first assess the functional requirements of a context-sensitive dictionary; then we explain the program's three phases of operation, focusing on the implementation of the lexicons and the context-sensitive features. We conclude the paper by comparing our tool to other similar publicly available products, and summarize plans for future development.
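The discontinuous multiword-expression lookup can be sketched as follows; the toy lexicon entry, the trivial stemmer and the gap limit are assumptions, not the product's implementation:

```python
# Sketch of discontinuous multiword-expression lookup: the words of an entry
# must all appear, in order, within a limited gap around the selected word.
# The toy lexicon, the trivial "stemmer" and the gap limit are assumptions.

LEXICON = {("take", "into", "account"): "figyelembe vesz"}   # illustrative entry with a Hungarian gloss

def stem(word):
    return word.lower().rstrip("s")          # placeholder for real morphology

def find_mwe(context_words, max_gap=2):
    stems = [stem(w) for w in context_words]
    for entry, translation in LEXICON.items():
        i, last = 0, None
        for pos, s in enumerate(stems):
            if i < len(entry) and s == entry[i] and (last is None or pos - last <= max_gap + 1):
                i, last = i + 1, pos
        if i == len(entry):
            return entry, translation
    return None

print(find_mwe(["takes", "the", "figures", "into", "account"]))
# (('take', 'into', 'account'), 'figyelembe vesz')
```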

01 Jan 2002
TL;DR: A text processing system uses shallow parsing techniques to extract information from sentences in text documents and stores frames of information in a knowledge base, approaching more complete text understanding in a practical way that does not require expensive processing such as full parsing of the documents.
Abstract: The system described in this paper automatically extracts and stores information from documents. We have implemented a text processing system that uses shallow parsing techniques to extract information from sentences in text documents and stores frames of information in a knowledge base. We intend to use this system in two main application areas: open-domain Question Answering (Q&A) and specific-domain information extraction. Extraction from Documents: The system described in this paper uses a Natural Language Processing system developed at the Center for Natural Language Processing to extract information from documents and store it in a knowledge base. In the past, applications were aimed at MUC-style information extraction that filled in templates of specific types of information. Our current goal is to produce a system that can extract generic frames of information about all entities and events in the sentences of the text and represent relationships between them. This type of system approaches more complete text understanding in a practical way that does not require expensive processing such as full parsing of the documents. The heart of the generic extraction system is a set of rules written for a finite-state system that recognizes patterns of text. These rules are applied in several phases, including part-of-speech tagging, bracketing of noun phrases, and categorization of proper noun phrases. Later phases recognize the surface structure of phrases in each sentence and map the phrases to the case frame of the verbs, recognizing the phrases taking the roles of agent, object, point-in-time, etc., and creating a frame representing an “event”. The case roles are similar to those in case grammars (Fillmore 1968). Consider the example sentence: “In addition to these most recent incidents, the Abu Sayyaf have bought Russian uranium on Basilan Island.”
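A reduced sketch of the frame-building step: a pattern over the bracketed phrase sequence maps phrases to case roles such as agent and object. The single pattern and the role names are illustrative, not the system's actual rule set:

```python
# Sketch of mapping a bracketed sentence to an event frame with case roles.
# The single NP-VP-NP(-PP) pattern and the role names are illustrative
# assumptions standing in for the system's finite-state rule phases.

def build_event_frame(phrases):
    """phrases: list of (label, text); expects NP VP NP [PP-LOC] [PP-TIME]."""
    frame = {}
    labels = [label for label, _ in phrases]
    if labels[:3] == ["NP", "VP", "NP"]:
        frame["agent"] = phrases[0][1]
        frame["action"] = phrases[1][1]
        frame["object"] = phrases[2][1]
        for label, text in phrases[3:]:
            if label == "PP-LOC":
                frame["location"] = text
            elif label == "PP-TIME":
                frame["point-in-time"] = text
    return frame

phrases = [("NP", "the Abu Sayyaf"), ("VP", "have bought"),
           ("NP", "Russian uranium"), ("PP-LOC", "on Basilan Island")]
print(build_event_frame(phrases))
```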

Journal Article
TL;DR: This work was partially funded by the CICYT projects TIC2000-0664-C02-01 and TIC2000-1599-C01-01.
Abstract: This work was partially funded by the CICYT projects TIC2000-0664-C02-01 and TIC2000-1599-C01-01.

DOI
01 Jan 2002
TL;DR: An efficient FPGA-based coprocessor for natural language syntactic analysis that can deal with inputs in the form of word lattices is proposed, together with an interface between the hardware tool and a potential natural language software application running on the desktop computer.
Abstract: This thesis is at the crossroads of Natural Language Processing (NLP) and digital circuit design. It aims at delivering a custom hardware coprocessor for accelerating natural language parsing. The coprocessor has to parse real-life natural language and is targeted to be useful in several NLP applications that are time-constrained or need to process large amounts of data. More precisely, the three goals of this thesis are: (1) to propose an efficient FPGA-based coprocessor for natural language syntactic analysis that can deal with inputs in the form of word lattices, (2) to implement the coprocessor in a hardware tool ready for integration within an ordinary desktop computer and (3) to offer an interface (i.e. a software library) between the hardware tool and a potential natural language software application running on the desktop computer. The Field Programmable Gate Array (FPGA) technology has been chosen as the core of the coprocessor implementation due to its ability to efficiently exploit all levels of parallelism available in the implemented algorithms in a cost-effective solution. In addition, the FPGA technology makes it possible to efficiently design and test such a hardware coprocessor. A final reason is that future general-purpose processors are expected to contain reconfigurable resources. In such a context, an IP core implementing an efficient context-free parser, ready to be configured within the reconfigurable resources of a general-purpose processor, would support any application relying on context-free parsing and running on that processor. The context-free grammar parsing algorithms that have been implemented are the standard CYK algorithm and an enhanced version of the CYK algorithm developed at the EPFL Artificial Intelligence Laboratory. These algorithms were selected (1) for their intrinsic properties of regular data flow and data processing, which make them well suited for a hardware implementation, (2) for their property of producing partial parse trees, which makes them suitable for further shallow parsing, and (3) for their ability to parse word lattices.
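Since the implemented algorithm is standard CYK, a plain-software reference version is easy to state; the toy grammar in Chomsky normal form below is an illustrative assumption:

```python
# Reference (software) sketch of standard CYK recognition over a toy grammar in
# Chomsky normal form; the coprocessor realizes the same dynamic-programming
# table in hardware.  The grammar and sentence are illustrative assumptions.

UNARY = {("the",): {"DT"}, ("dog",): {"NN"}, ("barks",): {"VBZ", "VP"}}
BINARY = {("DT", "NN"): {"NP"}, ("NP", "VP"): {"S"}}

def cyk(words):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(UNARY.get((w,), set()))
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left in table[i][k]:
                    for right in table[k][j]:
                        table[i][j] |= BINARY.get((left, right), set())
    return table[0][n]

print(cyk(["the", "dog", "barks"]))   # {'S'} -- the sentence is accepted
```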

Proceedings ArticleDOI
24 Aug 2002
TL;DR: A parser for robust and flexible interpretation of user utterances in a multi-modal system for web search in newspaper databases that integrates shallow parsing techniques with knowledge-based text retrieval to allow for robust processing and coordination of input modes.
Abstract: We describe a parser for robust and flexible interpretation of user utterances in a multi-modal system for web search in newspaper databases. Users can speak or type, and they can navigate and follow links using mouse clicks. Spoken or written queries may combine search expressions with browser commands and search space restrictions. In interpreting input queries, the system has to be fault-tolerant to account for spontaneous speech phenomena as well as typing or speech recognition errors, which often distort the meaning of the utterance and are difficult to detect and correct. Our parser integrates shallow parsing techniques with knowledge-based text retrieval to allow for robust processing and coordination of input modes. Parsing relies on a two-layered approach: typical meta-expressions like those concerning search, newspaper types and dates are identified and excluded from the search string to be sent to the search engine. The search terms which are left after preprocessing are then grouped according to co-occurrence statistics which have been derived from a newspaper corpus. These co-occurrence statistics concern typical noun phrases as they appear in newspaper texts.
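A hypothetical sketch of the two-layered interpretation: meta-expressions are peeled off with patterns, then the remaining terms are grouped by co-occurrence. The patterns and the toy co-occurrence scores are assumptions:

```python
# Sketch of the two-layered query interpretation: first peel off meta-expressions
# with patterns, then group the remaining search terms by co-occurrence.
# The patterns and the toy co-occurrence scores are illustrative assumptions.
import re

META_PATTERNS = [
    (re.compile(r"\b(in|from) (19|20)\d\d\b"), "date_restriction"),
    (re.compile(r"\bsearch (for|the)\b"), "search_command"),
]
COOCCURRENCE = {frozenset({"federal", "election"}): 8.2,
                frozenset({"election", "results"}): 5.1}

def interpret(query, threshold=4.0):
    meta = []
    for pattern, label in META_PATTERNS:
        match = pattern.search(query)
        if match:
            meta.append((label, match.group(0)))
            query = pattern.sub(" ", query)
    terms = [w for w in re.findall(r"[a-z]+", query.lower()) if len(w) > 2]
    groups = [sorted(pair) for pair, score in COOCCURRENCE.items()
              if pair <= set(terms) and score >= threshold]
    return meta, terms, groups

print(interpret("search for federal election results from 1998"))
```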

Journal ArticleDOI
TL;DR: A unified technique to solve different shallow parsing tasks as a tagging problem using a Hidden Markov Model-based approach (HMM), consisting of the incorporation of the relevant information for each task into the models.
Abstract: We present a unified technique to solve different shallow parsing tasks as a tagging problem using a Hidden Markov Model-based approach (HMM). This technique consists of the incorporation of the relevant information for each task into the models.

Proceedings ArticleDOI
26 Nov 2002
TL;DR: In this article, the authors define criteria and a method, based on users' verbal reports, to assess the natural level of MMI, and design demonstration software to automatically process the verbal reports.
Abstract: This study concerns the concept of natural MMI. The purpose of this work is twofold: first, to define criteria and a method to assess MMI; this method, based on verbal reports, has to measure the natural level of MMI. Second, to design demonstration software to assess the natural level of MMI. The demonstration software automatically processes the verbal reports of users.