
Showing papers on "Shallow parsing published in 2008"


Journal ArticleDOI
TL;DR: This work presents a stopping criterion for active learning based on the way instances are selected during uncertainty-based sampling and verifies its applicability in a variety of settings.

143 citations


Journal ArticleDOI
22 Feb 2008
TL;DR: A tagged Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper, and Named Entity Recognition systems based on pattern-based shallow parsing, with and without linguistic knowledge, have been developed using a part of this corpus.
Abstract: The rapid development of language resources and tools using machine learning techniques for less computerized languages requires an appropriately tagged corpus. A tagged Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper. A web crawler retrieves the web pages in Hyper Text Markup Language (HTML) format from the news archive. At present, the corpus contains approximately 34 million wordforms. Named Entity Recognition (NER) systems based on pattern-based shallow parsing, with and without linguistic knowledge, have been developed using a part of this corpus. The NER system that uses linguistic knowledge performed better, yielding highest F-Score values of 75.40%, 72.30%, 71.37%, and 70.13% for person, location, organization, and miscellaneous names, respectively.

73 citations
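A minimal sketch of what pattern-based shallow-parsing NER can look like, assuming POS-tagged input; the clue-word lists and tag names below are hypothetical placeholders, not the paper's Bengali resources:

```python
# Illustrative pattern-based NER over POS-tagged tokens. The honorific and
# suffix lists are invented placeholders for the paper's actual patterns.

PERSON_PREFIXES = {"sri", "srimati", "dr"}      # honorifics preceding person names
LOCATION_SUFFIXES = ("pur", "ganj", "nagar")    # common place-name endings

def tag_entities(tokens, pos_tags):
    """Assign a coarse NE label to each proper noun using simple patterns."""
    labels = ["O"] * len(tokens)
    for i, (tok, pos) in enumerate(zip(tokens, pos_tags)):
        if pos != "NNP":                        # only consider proper nouns
            continue
        prev = tokens[i - 1].lower() if i > 0 else ""
        if prev in PERSON_PREFIXES:
            labels[i] = "PERSON"
        elif tok.lower().endswith(LOCATION_SUFFIXES):
            labels[i] = "LOCATION"
        else:
            labels[i] = "MISC"
    return labels
```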


Proceedings ArticleDOI
18 Aug 2008
TL;DR: This paper proposes the Best Label Path (BLP) inference algorithm, which produces the most probable label sequence on latent conditional models and outperforms two existing inference algorithms.
Abstract: Shallow parsing is one of many NLP tasks that can be reduced to a sequence labeling problem. In this paper we show that latent dynamics (i.e., the hidden substructure of shallow phrases) constitute a problem in shallow parsing, and we show that modeling this intermediate structure is useful. By analyzing the automatically learned hidden states, we show how the latent conditional model explicitly learns latent dynamics. We propose the Best Label Path (BLP) inference algorithm, which is able to produce the most probable label sequence on latent conditional models, and which outperforms two existing inference algorithms. With BLP inference, the LDCRF model significantly outperforms CRF models on word features, and achieves performance comparable to the most successful shallow parsers on the CoNLL data when part-of-speech features are added.

63 citations
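To make the decoding problem concrete, here is a toy sketch of decoding a chain model with hidden states by summing (marginalizing) state probabilities per label and taking per-position argmaxes. This marginal decoding is only a stand-in illustration of inference over latent conditional models, not the paper's BLP algorithm, whose details differ:

```python
import numpy as np
from scipy.special import logsumexp

def marginal_label_decode(log_emit, log_trans, state2label, n_labels):
    """log_emit: (T, S) per-position hidden-state scores; log_trans: (S, S)
    transition scores; state2label: length-S int array mapping each hidden
    state to its label (disjoint state sets per label, as in an LDCRF)."""
    T, S = log_emit.shape
    fwd = np.zeros((T, S))
    bwd = np.zeros((T, S))
    fwd[0] = log_emit[0]
    for t in range(1, T):                     # forward pass over hidden states
        fwd[t] = log_emit[t] + logsumexp(fwd[t - 1][:, None] + log_trans, axis=0)
    for t in range(T - 2, -1, -1):            # backward pass
        bwd[t] = logsumexp(log_trans + (log_emit[t + 1] + bwd[t + 1])[None, :], axis=1)
    labels = []
    for t in range(T):
        gamma = fwd[t] + bwd[t]               # unnormalized log-marginals per state
        per_label = [logsumexp(gamma[state2label == y]) for y in range(n_labels)]
        labels.append(int(np.argmax(per_label)))
    return labels
```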


01 Jan 2008
TL;DR: This paper presents the construction of a hybrid, three-stage named entity recognizer for Tamil that performs an in-place tagging task for a given Tamil document in three phases, namely shallow parsing, shallow semantic parsing, and statistical processing.
Abstract: The aim of this paper is to present the construction of a hybrid, three-stage named entity recognizer for Tamil. Named entity recognition performs an in-place tagging task for a given Tamil document in three phases, namely shallow parsing, shallow semantic parsing, and statistical processing. The E-M algorithm (HMM) is used in the statistical processing phase, with initial probabilities obtained from the shallow parsing phase, and a modification to the E-M algorithm deals with inputs from the shallow semantic parsing phase. This study concentrates on entity names (personal names, location names, and organization names), temporal expressions (dates and times), and number expressions. Both NER tags and POS tags are used as the hidden variables in the E-M algorithm. The average F-value obtained from the system is 72.72% across the various entity types.

20 citations


Journal Article
TL;DR: Experimental results show that this approach can analyze a wide range of questions with high accuracy and produce reasonable textual responses, demonstrating the advantages of a novel Natural Language Interface comprising shallow-parsing-based algorithms in conjunction with intelligent techniques to train the system.
Abstract: This paper deals with a natural language interface, which accepts natural language questions as input and generates textual responses. In natural language processing, keyword-matching-based paradigms generate answers; however, these answers are frequently affected by certain language-dependent phenomena such as semantic symmetry and ambiguous modification. Available techniques described in the literature deal with these problems using in-depth parsing. In this paper, we present rules to tackle these linguistic phenomena using shallow parsing and discuss the advantages of a novel Natural Language Interface comprising shallow-parsing-based algorithms in conjunction with intelligent techniques to train the system. Experimental results show that this approach can analyze a wide range of questions with high accuracy and produce reasonable textual responses.

15 citations


Journal ArticleDOI
TL;DR: This article proposes the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms.
Abstract: The performance of information retrieval systems is limited by the linguistic variation present in natural language texts. Word-level natural language processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on the extension of these techniques for dealing with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers, in order to reduce as far as possible the overhead due to this parsing process. The use of different sources of syntactic information, queries or documents, has also been studied, as has the restriction of the applied dependencies to those obtained from noun phrases. Our approaches have been tested using the CLEF corpus, obtaining consistent improvements with regard to classical word-level non-linguistic techniques. Results show, on the one hand, that syntactic information extracted from documents is more useful than that from queries. On the other hand, it has been demonstrated that by restricting dependencies to those corresponding to noun phrases, important reductions in storage and management costs can be achieved, albeit at the expense of a slight reduction in performance.

13 citations
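A minimal sketch of the noun-phrase-restricted idea: turning shallow-parsed noun phrases into head-modifier pairs used as complex index terms. The chunk representation and the rightmost-head heuristic are simplifying assumptions, not the authors' transducer cascades:

```python
# Turn shallow-parser NP chunks into (head, modifier) complex index terms.

def np_index_terms(chunks):
    """chunks: list of (chunk_type, [(token, pos), ...]) from a shallow parser.
    Emits (head_noun, modifier) pairs as complex index terms."""
    terms = []
    for chunk_type, words in chunks:
        if chunk_type != "NP":
            continue
        nouns = [w for w, p in words if p.startswith("N")]
        if not nouns:
            continue
        head = nouns[-1]                      # rightmost noun as head (heuristic)
        for w, p in words:
            if w != head and p.startswith(("ADJ", "N")):
                terms.append((head, w))       # e.g., ("retrieval", "information")
    return terms

print(np_index_terms([("NP", [("information", "N"), ("retrieval", "N")])]))
```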


Journal Article
TL;DR: Proposed shallow-parsing-based algorithms reduce the amount of syntactic processing required to deal with problems caused by semantic symmetry and ambiguous modification, and improve the precision of a Natural Language Interface.
Abstract: The performance of a Natural Language Interface often deteriorates due to the linguistic phenomena of semantic symmetry and ambiguous modification (Katz and Lin, 2003). In this paper we present algorithms to handle problems caused by semantic symmetry and ambiguous modification. Use of these algorithms has improved the precision of the Natural Language Interface. The proposed shallow-parsing-based algorithms reduce the amount of syntactic processing required to deal with these problems; they need only POS (Part of Speech) information, which is generated by shallow parsing of the corpus text. Results are compared with the results of a basic Natural Language Interface without such algorithms. Dealing with linguistic phenomena using shallow parsing is a novel approach, as we overcome the brittleness usually associated with in-depth parsing. We also present computational results with comparative charts based on the answers extracted for the same query posed to the two systems.

13 citations


Journal Article
TL;DR: This paper develops a rule-based shallow parser to chunk Persian sentences and a knowledge-based system to assign 16 selected thematic roles to the chunks, in order to extract semantic roles from Persian sentences.
Abstract: Extracting thematic (semantic) roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and the syntactic constituents in a sentence. In this paper we present a rule-based approach to extract semantic roles from Persian sentences. The system exploits a two-phase architecture to (1) identify the arguments and (2) label them for each predicate. For the first phase we developed a rule-based shallow parser to chunk Persian sentences, and for the second phase we developed a knowledge-based system to assign 16 selected thematic roles to the chunks. The experimental results of testing each phase are shown at the end of the paper. Keywords—Natural Language Processing, Semantic Role Labeling, Shallow Parsing, Thematic Roles.

13 citations
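A toy sketch of the two-phase idea, assuming chunks are already available from a shallow parser: roles are assigned by chunk type and position relative to the predicate. The rules and role names below are invented for illustration; the paper uses a knowledge-based system with 16 Persian-specific roles:

```python
# Rule-based role assignment over pre-chunked input. Persian is typically
# verb-final, so "before"-the-predicate rules carry most of the work; the
# role inventory here is a hypothetical stand-in.

ROLE_RULES = {
    ("NP", "before"): "AGENT",
    ("NP", "after"):  "THEME",
    ("PP", "after"):  "GOAL",
}

def label_roles(chunks, predicate_index):
    """chunks: list of (chunk_type, text); predicate_index: verb position."""
    roles = []
    for i, (ctype, text) in enumerate(chunks):
        if i == predicate_index:
            roles.append((text, "PREDICATE"))
            continue
        position = "before" if i < predicate_index else "after"
        roles.append((text, ROLE_RULES.get((ctype, position), "NONE")))
    return roles
```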


Proceedings Article
01 May 2008
TL;DR: Spejd is based on a fully uniform formalism for both constituency partial parsing and morphosyntactic disambiguation, and is more flexible than either the usual shallow parsing formalisms or the usual unification-based formalisms.
Abstract: The paper presents Spejd, an Open Source Shallow Parsing and Disambiguation Engine. Spejd is based on a fully uniform formalism for both constituency partial parsing and morphosyntactic disambiguation — the same grammar rule may contain structure-building operations, as well as morphosyntactic correction and disambiguation operations. The formalism and the engine are more flexible than either the usual shallow parsing formalisms, which assume disambiguated input, or the usual unification-based formalisms, which couple disambiguation (via unification) with structure building. Current applications of Spejd include rule-based disambiguation, detection of multiword expressions, valence acquisition, and sentiment analysis. The functionality can be further extended by adding external lexical resources. While the examples are based on the set of rules prepared for the parsing of the IPI PAN Corpus of Polish, Spejd is fully language-independent and we hope it will also be useful in the processing of other languages.

10 citations


01 Jan 2008
TL;DR: At the core of the system is a language model based on lemma bigrams and part-of-speech tags, as well as an entropy computation over sentences to retrieve the best compressed sentences.
Abstract: Sentence compression is a necessary component of abstract generation. Previous studies focused mainly on syntactic tree representations of the sentence. Our approach is a statistical one that does not use syntactic trees, which can be inaccurate in sentence analysis. At the core of our system is a language model based on lemma bigrams and part-of-speech tags (only shallow parsing is performed), as well as an entropy computation over sentences to retrieve the best compressed sentences. We also introduce a perceptron, which is used to classify compressed and non-compressed sentences and to indicate whether or not a sentence should be compressed.

8 citations
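A minimal sketch of the scoring idea, assuming a lemma-bigram table estimated elsewhere: candidate compressions are ranked by a per-word (entropy-like) negative log-probability so that shorter candidates are not trivially favored. The table and back-off value are placeholders:

```python
# Rank candidate compressions with a lemma-bigram language model.

def candidate_score(lemmas, bigram_logprob, unk=-12.0):
    """Mean negative log-probability (entropy-like) of a lemma sequence."""
    pairs = zip(["<s>"] + lemmas, lemmas + ["</s>"])
    nll = -sum(bigram_logprob.get(p, unk) for p in pairs)
    return nll / (len(lemmas) + 1)            # normalize by bigram count

def best_compression(candidates, bigram_logprob):
    """Return the candidate compression with the lowest per-word entropy."""
    return min(candidates, key=lambda c: candidate_score(c, bigram_logprob))
```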


Posted Content
TL;DR: The design and implementation of the Prolog interface to the Unstructured Information Management Architecture (UIMA) and some of its applications in natural language processing are described.
Abstract: In this paper we describe the design and implementation of the Prolog interface to the Unstructured Information Management Architecture (UIMA) and some of its applications in natural language processing. The UIMA Prolog interface translates unstructured data and the UIMA Common Analysis Structure (CAS) into a Prolog knowledge base, over which developers write rules and use resolution theorem proving to search and generate new annotations over the unstructured data. These rules can explore all the previous UIMA annotations (such as the syntactic structure and parsing statistics) and external Prolog knowledge bases (such as Prolog WordNet and Extended WordNet) to implement a variety of natural language analysis tasks. We also describe applications of this logic programming interface in question analysis (such as focus detection and the detection of answer types and other constraints), shallow parsing (such as relations in the syntactic structure), and answer selection.

01 Jan 2008
TL;DR: Results are presented of a comparison of a pure "Bag of Words" approach against a mixed method extended by detecting opinion patterns using shallow-parsing techniques.
Abstract: Automated sentiment polarity prediction from text is a challenging problem addressed in this paper. We present the results of a comparison of a pure "Bag of Words" approach against a mixed method extended by detecting opinion patterns using shallow-parsing techniques. We utilize two resources for the analysis: the Spejd shallow parsing engine and the Zetema dictionary of sentiment in Polish. The performance of both approaches has been evaluated on an online product review database.

Proceedings ArticleDOI
18 Jun 2008
TL;DR: The results show that although the method does not apply any syntactic rules, the BPS algorithm, which combines the MM and SM algorithms, exploits the strong points of both and obtains favorable performance.
Abstract: Shallow parsing is a very important task in natural language processing and text mining, and the partial syntactic information it produces can help to solve many other natural language processing tasks. In this paper, we split the task of shallow parsing into two subtasks: (1) seeking all the break points that divide a part-of-speech (POS) sequence into groups; (2) tagging a phrase type for each POS group. For the first subtask, we present the break point seeking (BPS) algorithm, a combination of a scoring model (SM) and the maximum matching (MM) method. We then use a Bayes classifier to tag the phrase structure type for each POS group. The results show that although our method does not apply any syntactic rules, the BPS algorithm, which combines the MM and SM algorithms, exploits the strong points of both and obtains favorable performance.
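A toy sketch of the two-subtask split described above, assuming a learned break-score table and per-tag phrase-type likelihoods (both placeholders for the paper's SM/MM combination and Bayes classifier):

```python
# Subtask 1: choose break points in a POS sequence; subtask 2: tag each group.

def find_breaks(pos_seq, break_score, threshold=0.5):
    """Return indices i such that a group boundary falls between i-1 and i."""
    return [i for i in range(1, len(pos_seq))
            if break_score.get((pos_seq[i - 1], pos_seq[i]), 0.0) > threshold]

def split_groups(pos_seq, breaks):
    bounds = [0] + breaks + [len(pos_seq)]
    return [pos_seq[a:b] for a, b in zip(bounds, bounds[1:])]

def tag_group(group, type_logprob, types=("NP", "VP", "PP")):
    """Pick the phrase type maximizing the product of per-tag likelihoods."""
    return max(types, key=lambda t:
               sum(type_logprob.get((t, pos), -10.0) for pos in group))
```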

Proceedings ArticleDOI
20 Jun 2008
TL;DR: This paper proposes a new method to detect and resolve zero pronouns in Chinese text, integrating automatic main-verb identification, verbal logic valence, and a machine learning approach, and demonstrates that this method of identifying and resolving zero pronouns works effectively.
Abstract: This paper proposes a new method to detect and resolve zero pronouns in Chinese text, integrating automatic main-verb identification, verbal logic valence, and a machine learning approach. Zero pronoun recognition is treated as the problem of finding missing logical arguments of verbs. First, based on automatic main-verb identification, syntax hierarchies are analysed. Second, combining the syntax hierarchy with verbal logic valence theory, zero pronouns are identified. Then, using a machine learning approach, zero pronouns are resolved. Experimental results on 150 news articles indicate that the precision and recall of zero pronoun detection are 72.9% and 92.7%, respectively, and that the accuracy of antecedent estimation is 64.3%. These results demonstrate that this method of identifying and resolving zero pronouns works effectively.
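A minimal sketch of the detection idea only (finding missing verb arguments against a valence lexicon); the lexicon and the clause representation are hypothetical simplifications of the paper's Chinese-specific pipeline:

```python
# Flag clauses whose realized arguments fall short of the verb's valence.

VALENCE = {"eat": 2, "give": 3, "sleep": 1}   # hypothetical verb valences

def detect_zero_pronouns(clauses):
    """clauses: list of (verb_lemma, n_realized_args). Returns flagged clauses."""
    flagged = []
    for verb, n_args in clauses:
        expected = VALENCE.get(verb)
        if expected is not None and n_args < expected:
            flagged.append((verb, expected - n_args))  # number of missing args
    return flagged
```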

Proceedings Article
01 Jan 2008
TL;DR: A shallow parsing formalism aimed at machine translation between closely related languages, allowing grammar rules that help to (partially) disambiguate chunks in input sentences.
Abstract: This paper describes a shallow parsing formalism aimed at machine translation between closely related languages. The formalism allows writing grammar rules that help to (partially) disambiguate chunks in input sentences. The chunks are then translated into the target language without any deep syntactic or semantic processing. A stochastic ranker then selects the best translation according to the target language model. The results obtained for Czech and Slovak are presented.

Journal IssueDOI
TL;DR: In the proposed approach, shallow parsing techniques such as part-of-speech tagging and noun phrase chunking are used to parse both questions and Automated Speech Recognition (ASR) transcripts, and a sliding-window algorithm is proposed to identify the start and end boundaries of returned segments.
Abstract: Recently, lecture videos have been widely used in e-learning systems. Envisioning intelligent e-learning systems, this article addresses the challenge of information seeking in lecture videos by retrieving relevant video segments based on user queries, through dynamic segmentation of lecture speech text. In the proposed approach, shallow parsing techniques such as part-of-speech tagging and noun phrase chunking are used to parse both questions and Automated Speech Recognition (ASR) transcripts. A sliding-window algorithm is proposed to identify the start and end boundaries of returned segments. Phonetic and partial matching is utilized to correct errors from automated speech recognition and noun phrase chunking. Furthermore, extra knowledge such as lecture slides is used to facilitate ASR transcript error correction. The approach also makes use of proximity to approximate the deep parsing and structure matching between questions and sentences in ASR transcripts. The experimental results showed that both phonetic and partial matching improved segmentation performance, slides-based ASR transcript correction improved information coverage, and proximity was also effective in improving overall performance. © 2008 Wiley Periodicals, Inc.
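A minimal sketch of a sliding-window boundary finder of this flavor, assuming sentence-segmented ASR text and query noun phrases; plain substring overlap stands in for the paper's phonetic and partial matching:

```python
# Score each window of transcript sentences by overlap with query noun
# phrases and return the best-scoring window as the segment boundaries.

def best_segment(sentences, query_nps, window=5):
    """sentences: list of ASR sentence strings; query_nps: set of noun phrases."""
    def score(span):
        text = " ".join(span).lower()
        return sum(1 for np_ in query_nps if np_.lower() in text)
    best, best_score = (0, min(window, len(sentences))), -1
    for start in range(len(sentences)):
        end = min(start + window, len(sentences))
        s = score(sentences[start:end])
        if s > best_score:
            best, best_score = (start, end), s
    return best  # (start, end) boundaries of the returned segment
```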

Proceedings ArticleDOI
26 Nov 2008
TL;DR: This paper defines representations of Chinese chunks and entity relations and obtains an optimized CRF model that can label chunks and entity relations, thereby completing chunk parsing and relation extraction.
Abstract: The conditional random fields (CRFs) model is a valid probabilistic model for segmenting and labeling sequence data. Compared with other statistical models, such as HMMs and MEMMs, CRFs process a data sequence in terms of its context. Chunk analysis is a shallow parsing method that simplifies natural language processing, and entity relation extraction is used to establish relationships between entities. Because full syntactic parsing of Chinese text is complex, chunk analysis and relation extraction are important for Chinese text understanding. This paper models both problems for Chinese text: by transforming them into a labeling problem, we can use CRFs to realize chunk analysis and entity relation extraction. In the paper we define representations of Chinese chunks and entity relations, and we discuss the feature window of the labeled word. Through training we obtain an optimized CRF model that can label chunks and entity relations, thereby completing chunk parsing and relation extraction.
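A minimal sketch of chunking-as-labeling with context-window features and B/I/O chunk labels; sklearn-crfsuite is used here purely as a stand-in toolkit, and the features, tags, and toy data are illustrative, not the paper's:

```python
import sklearn_crfsuite  # stand-in CRF toolkit, not the one used in the paper

# Each word is described by a small context-window feature dict; chunks are
# encoded as B/I/O labels so chunking becomes sequence labeling.

def word_features(sent, i):
    word, pos = sent[i]
    feats = {"word": word.lower(), "pos": pos}
    if i > 0:
        feats["-1:pos"] = sent[i - 1][1]      # left context in the feature window
    if i < len(sent) - 1:
        feats["+1:pos"] = sent[i + 1][1]      # right context
    return feats

def sent2features(sent):
    return [word_features(sent, i) for i in range(len(sent))]

train = [[("Beijing", "NR"), ("is", "VC"), ("a", "DT"), ("city", "NN")]]
labels = [["B-NP", "B-VP", "B-NP", "I-NP"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit([sent2features(s) for s in train], labels)
print(crf.predict([sent2features(train[0])]))
```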

Proceedings ArticleDOI
23 Jul 2008
TL;DR: The new method yields good efficiency and effectiveness without conducting a complex, deep syntactic analysis of Chinese sentences, and can be applied to an EBMT system for better performance in Chinese-to-English translation.
Abstract: Example-based machine translation (EBMT) is an important branch of machine translation. Sentence similarity measurement is certainly one of the most significant problems addressed in EBMT. For EBMT from Chinese to English, the performance of the similarity measure for Chinese sentences greatly affects the final translation result of an input Chinese sentence. In this paper, we present an approach to Chinese sentence similarity measurement (CSSM) that combines word sequence and sentence structure information. In our experiments, the new method yields good efficiency and effectiveness without conducting a complex, deep syntactic analysis of Chinese sentences, and it can be applied to an EBMT system for better performance in Chinese-to-English translation.
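A minimal sketch of one way to combine word-sequence similarity (here via longest common subsequence) with structure similarity over chunk-tag sequences; the 0.7/0.3 weighting is an arbitrary illustration, not the paper's setting:

```python
# Combine LCS-based word-sequence similarity with chunk-sequence similarity.

def lcs_len(a, b):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1],
                                                               dp[i + 1][j])
    return dp[-1][-1]

def seq_sim(a, b):
    return 2 * lcs_len(a, b) / (len(a) + len(b)) if a or b else 1.0

def sentence_similarity(words1, chunks1, words2, chunks2, w=0.7):
    """words*: token lists; chunks*: chunk-tag sequences like ['NP','VP','NP']."""
    return w * seq_sim(words1, words2) + (1 - w) * seq_sim(chunks1, chunks2)
```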

Proceedings Article
13 Jul 2008
TL;DR: A learned classifier is presented that can accurately identify reduced passive voice constructions in shallow parsing environments; mislabeling such constructions directly impacts thematic role recognition and the NLP applications that depend on it.
Abstract: Our research is motivated by the observation that NLP systems frequently mislabel passive voice verb phrases as being in the active voice when there is no auxiliary verb (e.g., "The man arrested had a long record"). These errors directly impact thematic role recognition and NLP applications that depend on it. We present a learned classifier that can accurately identify reduced passive voice constructions in shallow parsing environments.
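A sketch of the kind of shallow features such a classifier might consume for a verb like "arrested" in "The man arrested had a long record"; the feature names and the auxiliary list are illustrative assumptions, not the paper's feature set:

```python
# Shallow features for deciding whether a verb is a reduced passive.

AUXILIARIES = {"is", "are", "was", "were", "be", "been", "being"}

def reduced_passive_features(tokens, pos_tags, i):
    """Features for the verb at position i in a POS-tagged sentence."""
    left = [t.lower() for t in tokens[max(0, i - 3):i]]
    return {
        "verb_tag": pos_tags[i],                       # VBN vs. ambiguous VBD
        "aux_to_left": any(t in AUXILIARIES for t in left),
        "followed_by_np": i + 1 < len(pos_tags) and pos_tags[i + 1].startswith("N"),
        "preceded_by_noun": i > 0 and pos_tags[i - 1].startswith("N"),
    }
```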

Proceedings Article
01 Jan 2008
TL;DR: An architecture is proposed, called the UCSG Shallow Parsing Architecture, for building wide-coverage shallow parsers using a judicious combination of linguistic and statistical techniques, without the need for a large parsed training corpus and without compromising the ability to produce all possible parses in principle.
Abstract: In this paper, we propose an architecture, called the UCSG Shallow Parsing Architecture, for building wide-coverage shallow parsers using a judicious combination of linguistic and statistical techniques, without the need for a large parsed training corpus to start with. We only need a large POS-tagged corpus. A parsed corpus can be developed using the architecture with minimal manual effort, and such a corpus can be used for evaluation as well as for performance improvement. The UCSG architecture is designed to be extended into a full parsing system, but the current work is limited to chunking and obtaining appropriate chunk sequences for a given sentence. In the UCSG architecture, a Finite State Grammar is designed to accept all possible chunks, referred to as word groups here. A separate statistical component, encoded in HMMs (Hidden Markov Models), is used to rate and rank the word groups so produced. Note that we are not pruning; we are only rating and ranking the word groups already obtained. We then use a Best First Search strategy to produce parse outputs in best-first order, without compromising the ability to produce all possible parses in principle. We propose a bootstrapping strategy for improving the HMM parameters and hence the performance of the parser as a whole. A wide-coverage shallow parser has been implemented for English starting from the British National Corpus, a nearly 100 million word POS-tagged corpus. Note that this corpus is not a parsed corpus; also, there are tagging errors, multiple tags are assigned in many cases, and some words have not been tagged. A dictionary of 138,000 words with frequency counts for each word in each tag has been built. Extensive experiments have been carried out to evaluate the performance of the various modules. We work with large data sets and the performance obtained is encouraging. A manually checked parsed corpus of 4,000 sentences has also been developed and used to improve parsing performance further. The entire system has been implemented in Perl under Linux.
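A toy sketch of the best-first step, assuming each sentence position offers pre-rated candidate word groups (the scores standing in for the HMM ratings); complete chunk sequences then pop out in best-first order:

```python
import heapq

# Partial chunk sequences are expanded in order of accumulated score, so the
# highest-scoring complete sequences are produced first, without pruning.

def best_first_parse(groups_from, sentence_len, top_k=3):
    """groups_from[i]: list of (end, label, log_score) chunks starting at i."""
    heap = [(0.0, 0, [])]                     # (neg. score, position, sequence)
    results = []
    while heap and len(results) < top_k:
        neg, pos, seq = heapq.heappop(heap)
        if pos == sentence_len:
            results.append((-neg, seq))       # complete chunk sequence
            continue
        for end, label, log_score in groups_from.get(pos, []):
            heapq.heappush(heap, (neg - log_score, end, seq + [label]))
    return results
```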

Posted Content
TL;DR: This study aims to evaluate part-of-speech (POS) tagging accuracy and explores whether comparable performance is obtained when a generic POS tagger, MontyTagger, is used in place of MedPost, a tagger trained on biomedical text.
Abstract: A recent study reported the development of Muscorian, a generic text processing tool for extracting protein-protein interactions from text, which achieved performance comparable to biomedical-specific text processing tools. This result was unexpected, since potential errors from a series of text analysis processes are likely to adversely affect the outcome of the entire process. Most biomedical entity relationship extraction tools have used biomedical-specific part-of-speech (POS) taggers, since errors in POS tagging are likely to affect subsequent semantic analysis of the text, such as shallow parsing. This study aims to evaluate POS tagging accuracy and explores whether comparable performance is obtained when a generic POS tagger, MontyTagger, is used in place of MedPost, a tagger trained on biomedical text. Our results demonstrate that MontyTagger, Muscorian's POS tagger, has a POS tagging accuracy of 83.1% when tested on biomedical text. Replacing MontyTagger with MedPost did not result in a significant improvement in entity relationship extraction from text: precision was 55.6% with MontyTagger versus 56.8% with MedPost on directional relationships, and 86.1% with MontyTagger compared to 81.8% with MedPost on non-directional relationships. This is unexpected, as poor POS tagging by MontyTagger would be expected to affect the outcome of the information extraction. An analysis of POS tagging errors demonstrated that 78.5% of tagging errors are compensated for by shallow parsing. Thus, despite 83.1% tagging accuracy, MontyTagger has a functional tagging accuracy of 94.6%.

Proceedings ArticleDOI
Qiang Zhou, Hang Yu
01 Oct 2008
TL;DR: A new relation tagging scheme is designed to represent different intra-chunk relations, and several feature engineering experiments are conducted to select the best baseline statistical model and to improve parsing performance.
Abstract: Multiword chunking is designed as a shallow parsing technique to recognize the external constituent and internal relation tags of a chunk in a sentence. In this paper, we propose a new solution to this problem. We design a new relation tagging scheme to represent different intra-chunk relations and conduct several feature engineering experiments to select the best baseline statistical model. We also apply outside knowledge from a large-scale lexical relationship knowledge base to improve parsing performance. By integrating all of the above techniques, we develop a new Chinese MWC parser. Experimental results show that its parsing performance greatly exceeds that of a rule-based parser trained and tested on the same data set.

Journal Article
TL;DR: The practical goal of this work is to enrich the information of the shallow parser with linguistic information for analyzing sequences containing an N that instantiates a kind of quantification of the other nominal constituent, by means of several different syntactic structures.
Abstract: This paper reports on work in progress to improve shallow parsing for Basque. The practical goal of our work is to enrich the information of the shallow parser with linguistic information for analyzing sequences containing an N that instantiates a kind of quantification of the other nominal constituent, by means of several different syntactic structures.
