
Showing papers on "Shallow parsing published in 2006"


Journal ArticleDOI
TL;DR: A novel computer-aided procedure for generating multiple-choice test items from electronic documents that makes use of language resources such as corpora and ontologies, and saves both time and production costs.
Abstract: This paper describes a novel computer-aided procedure for generating multiple-choice test items from electronic documents. In addition to employing various Natural Language Processing techniques, including shallow parsing, automatic term extraction, sentence transformation and computing of semantic distance, the system makes use of language resources such as corpora and ontologies. It identifies important concepts in the text and generates questions about these concepts as well as multiple-choice distractors, offering the user the option to post-edit the test items by means of a user-friendly interface. In assisting test developers to produce items in a fast and expedient manner without compromising quality, the tool saves both time and production costs.

216 citations
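
As an illustration of the distractor-selection step the abstract above mentions, here is a minimal sketch of choosing multiple-choice distractors by semantic distance using WordNet path similarity. This is a hypothetical stand-in, not the paper's actual procedure; the function names and candidate list are invented.

```python
# Hypothetical sketch only; the paper's distractor generator is not
# reproduced here. Requires nltk with the WordNet data downloaded
# (nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def semantic_distance(word_a, word_b):
    """1 - best path similarity over all synset pairs (1.0 = unrelated)."""
    best = 0.0
    for s1 in wn.synsets(word_a):
        for s2 in wn.synsets(word_b):
            sim = s1.path_similarity(s2)
            if sim is not None and sim > best:
                best = sim
    return 1.0 - best

def pick_distractors(key_concept, candidates, n=3):
    """Prefer terms close to the key concept: plausible but wrong options."""
    ranked = sorted(candidates, key=lambda t: semantic_distance(key_concept, t))
    return [t for t in ranked if t != key_concept][:n]

print(pick_distractors("neuron", ["axon", "dendrite", "glia", "tractor"]))
```

Ranking candidates by closeness to the key concept favours distractors that are plausible enough to be tempting, which is the usual goal of semantic-distance-based distractor selection.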


Book ChapterDOI
13 May 2006
TL;DR: The paper describes the MT engine, the formats it uses for linguistic data, and the compilers that convert these data into an efficient format used by the engine, and then describes the pilot Portuguese and Spanish linguistic data in more detail.
Abstract: This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese ↔ Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state-based chunking for structural transfer, and is based on a simple rationale: to produce fast, reasonably intelligible and easily correctable translations between related languages, it suffices to use an MT strategy which uses shallow parsing techniques to refine word-for-word MT. This paper briefly describes the MT engine, the formats it uses for linguistic data, and the compilers that convert these data into an efficient format used by the engine, and then goes on to describe the pilot Portuguese ↔ Spanish linguistic data in more detail.

83 citations
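
To make the shallow-transfer rationale concrete, here is a toy sketch of the word-for-word lexical transfer that such systems refine. The three-entry lexicon is invented for the example and is not the real Apertium Spanish-Portuguese data; a real Apertium pipeline also tags, chunks and applies structural transfer rules.

```python
# Toy illustration of shallow-transfer MT between closely related languages.
# BILINGUAL is a three-entry invention, not Apertium's es->pt dictionary.
BILINGUAL = {"la": "a", "casa": "casa", "blanca": "branca"}  # Spanish -> Portuguese

def word_for_word(tokens):
    # Lexical transfer; unknown words pass through marked with "*",
    # mirroring Apertium's convention for untranslated words.
    return [BILINGUAL.get(t, "*" + t) for t in tokens]

print(" ".join(word_for_word("la casa blanca".split())))  # -> "a casa branca"
```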


Journal ArticleDOI
TL;DR: A new hybrid approach combining shallow parsing and pattern matching is proposed to extract relations between proteins from scientific papers on biomedical themes; it achieves an average F-score of 80% on individual verbs and 66% on all verbs.

35 citations


01 Jan 2006
TL;DR: Improvements are possible by utilizing supertagging, lightweight dependency analysis, a link grammar parser and a maximum-entropy based chunk parser to add syntactically motivated features to a statistical machine translation system in a reranking framework.
Abstract: We investigate methods that add syntactically motivated features to a statistical machine translation system in a reranking framework. The goal is to analyze whether shallow parsing techniques help in identifying ungrammatical hypotheses. We show that improvements are possible by utilizing supertagging, lightweight dependency analysis, a link grammar parser and a maximum-entropy based chunk parser. Adding features to n-best lists and discriminatively training the system on a development set increases the BLEU score by up to 0.7% on the test set.

24 citations
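
A minimal sketch of the reranking framework described above: each n-best hypothesis carries a base model score plus syntactic feature values, and a weighted linear combination picks the winner. The feature name "chunker_ok" and all numbers are invented stand-ins for the supertagging, dependency, link-grammar and chunk-parser features the paper combines.

```python
# Minimal n-best reranking sketch: base MT score plus weighted features.
def rerank(nbest, weights):
    """nbest: list of (hypothesis, base_score, feature_dict) tuples."""
    def total(item):
        _, base, feats = item
        return base + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return max(nbest, key=total)[0]

nbest = [
    ("he have seen it", -2.1, {"chunker_ok": 0.0}),
    ("he has seen it",  -2.3, {"chunker_ok": 1.0}),
]
print(rerank(nbest, {"chunker_ok": 0.5}))  # -> "he has seen it"
```

With the syntactic feature switched on, the grammatical hypothesis overtakes the one the base model preferred, which is exactly the effect the paper is after.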


Proceedings Article
01 Jan 2006
TL;DR: The utility of corpora-independent lexicons derived from machine-readable dictionaries is demonstrated, and substantial error reductions are shown for the tasks of part-of-speech tagging and shallow parsing.
Abstract: Many natural language processing tasks make use of a lexicon – typically the words collected from some annotated training data along with their associated properties. We demonstrate here the utility of corpora-independent lexicons derived from machine-readable dictionaries. Lexical information is encoded in the form of features in a Conditional Random Field tagger, providing improved performance in cases where: i) limited training data is available, ii) the data is case-less, and iii) the test data genre or domain is different from that of the training data. We show substantial error reductions, especially on unknown words, for the tasks of part-of-speech tagging and shallow parsing, achieving up to 20% error reduction on Penn TreeBank part-of-speech tagging and up to a 15.7% error reduction for shallow parsing using the CoNLL 2000 data. Our results point towards a simple but effective methodology for increasing the adaptability of text processing systems by training models with annotated data in one genre augmented with general lexical information or lexical information pertinent to the target genre (or domain).

17 citations
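
The core idea lends itself to a short sketch: encode dictionary-derived tag sets as token features that a CRF tagger can consume alongside the usual surface features. The two-entry LEXICON and the feature names below are our own toy values, standing in for entries loaded from a machine-readable dictionary.

```python
# Sketch of dictionary-derived token features for a CRF tagger.
LEXICON = {"run": {"NN", "VB"}, "the": {"DT"}}  # toy stand-in dictionary

def token_features(tokens, i):
    w = tokens[i]
    feats = {"word": w.lower(), "is_title": w.istitle()}
    # Case-insensitive lookup keeps the feature useful on case-less text,
    # one of the scenarios the abstract highlights.
    for tag in sorted(LEXICON.get(w.lower(), {"OOV"})):
        feats["lex=" + tag] = True
    return feats

print(token_features(["The", "run"], 1))
# {'word': 'run', 'is_title': False, 'lex=NN': True, 'lex=VB': True}
```

Because the lexicon features fire regardless of whether a word appeared in the training corpus, they help most on unknown words, consistent with the reported results.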


Book ChapterDOI
17 Dec 2006
TL;DR: A novel selection method for tri-training is proposed in which a newly labeled sentence is selected for a classifier when the other two classifiers agree on its labels while the classifier itself disagrees.
Abstract: This paper presents a practical tri-training method for Chinese chunking using a small amount of labeled training data and a much larger pool of unlabeled data. We propose a novel selection method for tri-training in which newly labeled sentences are selected by comparing the agreement of the three classifiers: in each iteration, a new sample is selected for a classifier if the other two classifiers agree on its labels while the classifier itself disagrees. We compare the proposed tri-training approach with a co-training approach on the UPenn Chinese Treebank V4.0 (CTB4). The experimental results show that the proposed approach improves performance significantly.

16 citations
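
The selection rule compacts nicely into code. Below, a sentence joins classifier k's new training data when the other two classifiers agree on its label sequence but k disagrees; the predict() interface and the stub classifiers are assumed for illustration, not taken from the paper.

```python
# Sketch of the agreement-based tri-training selection rule.
def select_for(k, classifiers, unlabeled):
    i, j = [x for x in range(3) if x != k]
    selected = []
    for sent in unlabeled:
        yi = classifiers[i].predict(sent)
        yj = classifiers[j].predict(sent)
        yk = classifiers[k].predict(sent)
        if yi == yj and yi != yk:        # two agree, the target disagrees
            selected.append((sent, yi))  # adopt the agreed label sequence
    return selected

class Stub:
    """Toy classifier: looks labels up in a fixed table."""
    def __init__(self, table): self.table = table
    def predict(self, sent): return self.table[sent]

c = [Stub({"s1": ["B-NP"]}), Stub({"s1": ["B-NP"]}), Stub({"s1": ["O"]})]
print(select_for(2, c, ["s1"]))  # -> [('s1', ['B-NP'])]
```

The intuition is that a sentence where two views agree against the third is precisely where the third classifier has something to learn.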


Proceedings Article
01 Jul 2006
TL;DR: A system that automatically constructs ontologies by extracting knowledge from dictionary definition sentences using Robust Minimal Recursion Semantics (RMRS) is outlined and how this system was designed to handle multiple lexicons and languages is discussed.
Abstract: In this paper, we outline the development of a system that automatically constructs ontologies by extracting knowledge from dictionary definition sentences using Robust Minimal Recursion Semantics (RMRS). Combining deep and shallow parsing resource through the common formalism of RMRS allows us to extract ontological relations in greater quantity and quality than possible with any of the methods independently. Using this method, we construct ontologies from two different Japanese lexicons and one English lexicon. We then link them to existing, handcrafted ontologies, aligning them at the word-sense level. This alignment provides a representative evaluation of the quality of the relations being extracted. We present the results of this ontology construction and discuss how our system was designed to handle multiple lexicons and languages.

14 citations


Journal Article
TL;DR: The primary aim of the UCSG parsing architecture is a judicious combination of linguistic and statistical methods for developing wide-coverage, robust shallow parsing systems without the need for large-scale manually parsed training corpora.
Abstract: Recently, there is increasing interest in integrating rule-based methods with statistical techniques to develop robust, wide-coverage, high-performance parsing systems. In this paper, we describe an architecture, called the UCSG shallow parser architecture, which combines linguistic constraints expressed in the form of finite state grammars with statistical rating using HMMs built from a POS-tagged corpus, and an A* search for global optimization to determine the best shallow parse for a given sentence. The primary aim of the design of the UCSG parsing architecture is to develop a judicious combination of linguistic and statistical methods for building wide-coverage, robust shallow parsing systems without the need for large-scale manually parsed training corpora. The UCSG architecture uses a grammar to specify all valid structures and a statistical component to rate and rank the possible alternatives, so as to produce the best parse first without compromising the ability to produce all possible parses. The architecture supports bootstrapping with the aim of reducing the need for parsed training corpora. The complete system has been implemented in Perl under Linux. In this paper we first describe the UCSG shallow parsing architecture and then focus on the evaluation of the UCSG finite state grammar on the chunking task for English. Recalls of 91.16% and 93.73% have been obtained on the Susanne parsed corpus and the CoNLL 2000 chunking test set respectively. Extensive experimentation is under way to evaluate the other modules.

13 citations
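
A hedged sketch of the "grammar proposes, statistics ranks" design: candidate chunk sequences licensed by a grammar are rated with an HMM-style transition score and emitted best-first. A plain heap over whole candidates stands in for the paper's A* search, and the transition table is invented for the example.

```python
# Best-first ranking of grammar-licensed chunk sequences; TRANS is a toy
# transition table, not probabilities estimated from a POS-tagged corpus.
import heapq
import math

TRANS = {("NP", "VP"): 0.6, ("VP", "NP"): 0.3,
         ("NP", "NP"): 0.1, ("VP", "VP"): 0.05}

def score(seq):
    """Sum of log transition probabilities along the chunk sequence."""
    return sum(math.log(TRANS.get(pair, 1e-6))
               for pair in zip(seq, seq[1:]))

def best_first(candidates):
    """Yield candidate chunk sequences from best-rated to worst."""
    heap = [(-score(c), idx, c) for idx, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap:
        _, _, c = heapq.heappop(heap)
        yield c

cands = [["NP", "NP", "NP"], ["NP", "VP", "NP"]]
print(next(best_first(cands)))  # -> ['NP', 'VP', 'NP']
```

Because the generator can keep yielding, the best parse comes out first without giving up the ability to enumerate all grammar-valid alternatives, matching the design goal stated in the abstract.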


Book ChapterDOI
19 Feb 2006
TL;DR: The UCSG shallow parser as mentioned in this paper uses a grammar to specify all valid structures and a statistical component to rate and rank the possible alternatives, so as to produce the best parse first without compromising on the ability to produce all possible parses.
Abstract: Recently, there is increasing interest in integrating rule-based methods with statistical techniques to develop robust, wide-coverage, high-performance parsing systems. In this paper, we describe an architecture, called the UCSG shallow parser architecture, which combines linguistic constraints expressed in the form of finite state grammars with statistical rating using HMMs built from a POS-tagged corpus, and an A* search for global optimization to determine the best shallow parse for a given sentence. The primary aim of the design of the UCSG parsing architecture is to develop a judicious combination of linguistic and statistical methods for building wide-coverage, robust shallow parsing systems without the need for large-scale manually parsed training corpora. The UCSG architecture uses a grammar to specify all valid structures and a statistical component to rate and rank the possible alternatives, so as to produce the best parse first without compromising the ability to produce all possible parses. The architecture supports bootstrapping with the aim of reducing the need for parsed training corpora. The complete system has been implemented in Perl under Linux. In this paper we first describe the UCSG shallow parsing architecture and then focus on the evaluation of the UCSG finite state grammar on the chunking task for English. Recalls of 91.16% and 93.73% have been obtained on the Susanne parsed corpus and the CoNLL 2000 chunking test set respectively. Extensive experimentation is under way to evaluate the other modules.

12 citations


Book ChapterDOI
16 Aug 2006
TL;DR: This paper presents a method for Chinese POS tagging and shallow parsing based on conditional random fields (CRFs), discriminative sequential models that can incorporate many rich features and avoid the label bias problem.
Abstract: Part-of-speech (POS) tagging and shallow parsing are sequence modeling problems. HMMs and other generative models are not the most appropriate for the task of labeling sequential data. Compared with HMMs, Maximum Entropy Markov Models (MEMMs) and other discriminative finite-state models can more easily fuse features; however, they suffer from the label bias problem. This paper presents a method for Chinese POS tagging and shallow parsing based on conditional random fields (CRFs), discriminative sequential models that can incorporate many rich features and avoid the label bias problem. Moreover, we propose feeding information back from syntactic analysis to lexical analysis, since natural language understanding is by nature an interaction of multiple knowledge sources. Experiments show that the CRF approach achieves a 0.70% F-score improvement in POS tagging and a 0.67% improvement in shallow parsing. We also confirm the effectiveness of information feedback for some complicated multi-class words.

7 citations
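
For concreteness, here is a minimal chunking example with a linear-chain CRF, assuming the third-party sklearn-crfsuite package (pip install sklearn-crfsuite); the feature template and the one-sentence training set are purely illustrative, not the paper's Chinese setup.

```python
# Minimal linear-chain CRF chunking sketch using sklearn-crfsuite.
import sklearn_crfsuite

def feats(sent, i):
    word, pos = sent[i]
    return {"w": word.lower(), "pos": pos,
            "prev_pos": sent[i - 1][1] if i > 0 else "BOS"}

train = [[("He", "PRP"), ("eats", "VBZ"), ("rice", "NN")]]
y = [["B-NP", "B-VP", "B-NP"]]
X = [[feats(s, i) for i in range(len(s))] for s in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])  # memorises the toy data: ['B-NP', 'B-VP', 'B-NP']
```

Unlike an MEMM, the CRF normalises over whole label sequences rather than per state, which is what lets it avoid the label bias problem the abstract mentions.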


Journal ArticleDOI
TL;DR: The experimental results show that POS information greatly helps improve the performance of Chinese shallow parsing; the approach integrates Chinese linguistic information about chunks into an HMM model.

Journal Article
TL;DR: This paper describes how the syntactic analyzer of written Estonian was adapted to the spoken language; the introduced changes are described and the achieved results are analyzed.
Abstract: In this paper we describe how we have adapted the syntactic analyzer of written Estonian to the spoken language. The Constraint Grammar shallow syntactic parser (Müürisep et al. 2003) was used for the automatic syntactic analysis of the corpus of Estonian spoken language (Hennoste et al. 2000). To adapt the parser, the clause boundary detection rules as well as some syntactic constraints had to be changed, and two new syntactic tags were introduced. In the paper the introduced changes are described and the achieved results are analyzed. Using manually morphologically disambiguated text as input, the parser determined the syntactic label unambiguously for 90% of the words in the text on average. The error rate was less than 3%.

Proceedings ArticleDOI
01 Aug 2006
TL;DR: Conditional random fields (CRFs) are presented as a new kind of discriminative sequential model that can incorporate many rich features and avoid the label bias problem that limits maximum entropy Markov models (MEMMs) and other discriminative finite-state models.
Abstract: This paper presents a sequence tagging approach based on combined machine learning methods. First, conditional random fields (CRFs) are presented as a new kind of discriminative sequential model that can incorporate many rich features and avoid the label bias problem that limits maximum entropy Markov models (MEMMs) and other discriminative finite-state models. Second, support vector machines are adapted to the sequential tagging task. Finally, these improved models are combined with other existing models, achieving state-of-the-art performance. Experimental results show that the CRF approach achieves a 0.70% improvement in POS tagging and a 0.67% improvement in shallow parsing. Moreover, our combination method achieves F-measures of 93.73% and 93.69% on the two tasks respectively, better than any sub-model.

01 Jan 2006
TL;DR: This document reports the experiments conducted at The Robert Gordon University (RGU), where Statistical Language Models were combined with shallow parsing techniques for the opinion retrieval problem.
Abstract: Blogs are highly rich in opinion, making their automatic processing appealing to marketing companies, the media, customer centres, etc. TREC ran a Blog track in 2006 with two tasks: opinion retrieval and an open task. This document reports the experiments conducted at The Robert Gordon University (RGU), where we used Statistical Language Models combined with shallow parsing techniques for the opinion retrieval problem.

Proceedings Article
01 Jan 2006
TL;DR: A probability model is proposed to score the confidence of protein-protein interactions based on both text mining results and gene expression profiles, and experimental results are presented to show the feasibility of this framework.
Abstract: Protein-protein interactions, the associations of protein molecules, are crucial for many biological functions. Since most knowledge about them is still hidden in biological publications, there is an increasing focus on mining information from the vast amount of biological literature such as MedLine. Many approaches, such as pattern matching, shallow parsing and deep parsing, have been proposed to automatically extract protein-protein interaction information from text sources, with only limited success. Moreover, to the best of our knowledge, none of the existing approaches performs automatic validation of the mining results. In this paper, we describe a novel framework in which text mining results are automatically validated using knowledge mined from gene expression profiles. A probability model is proposed to score the confidence of protein-protein interactions based on both text mining results and gene expression profiles. Experimental results are presented to show the feasibility of this framework.
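
The abstract does not specify the probability model, so as a loudly hypothetical stand-in, here is one standard way to fuse two independent evidence sources (text-mining confidence and expression-profile support) into a single score. The noisy-OR combination and the example numbers are ours, not the paper's model.

```python
# Hypothetical noisy-OR fusion of two evidence sources for a candidate
# protein-protein interaction; not the paper's actual probability model.
def combined_confidence(p_text, p_expr):
    """Interaction is supported unless both sources fail to support it."""
    return 1.0 - (1.0 - p_text) * (1.0 - p_expr)

print(combined_confidence(0.7, 0.5))  # -> 0.85
```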

01 Jan 2006
TL;DR: The system built by the Documents and Linguistic Technology (DLT) Group at University of Limerick for participation in the French-English Question Answering Task of the Cross Language Evaluation Forum (CLEF) resulted in improved performance.
Abstract: The basic architecture of our factoid system is standard in nature and comprises query type identification, query analysis and translation, retrieval query formulation, document retrieval, text file parsing, named entity recognition and answer entity selection. Factoid classification into 69 query types is carried out using keywords. Associated with each type is a set of one or more Named Entities. Xelda is used to tag the French query for part-of-speech, and shallow parsing is then carried out over these tags in order to recognise thirteen different kinds of significant phrase. These were determined after a study of the constructions used in French queries together with their English counterparts. Our observations were that (1) proper names usually only start with a capital letter, with subsequent words un-capitalised, unlike English; (2) adjective-noun combinations, capitalised or not, can have the status of compounds in French and hence need special treatment; (3) certain noun-preposition-noun phrases are also significant. The phrases are then translated into English by the WorldLingo engine and using the Grand Dictionnaire Terminologique, the results being combined. Each phrase has a weight assigned to it by the parser. A Boolean retrieval query is formulated consisting of an AND of all phrases in increasing order of weight. The corpus is indexed by sentence using Lucene. The Boolean query is submitted to the engine and, if unsuccessful, is re-submitted with the first (least significant) term removed. The process continues until the search succeeds. The documents (i.e. sentences) are retrieved and the NEs corresponding to the identified query type are marked. Significant terms from the query are also marked. Each NE is scored based on its distance from query terms and their individual weights. The answer returned is the highest-scoring NE. Temporally Restricted Factoids are treated in the same way as Factoids. Definition questions are classified in three ways: organisation, person or unknown. This year Factoids had to be recognised automatically by an extension of the classifier. An IR query is formulated using the main term in the original question plus a disjunction of phrases depending on the identified type. All matching sentences are returned complete. Results this year were as follows: 32/150 (21%) of Factoids were R, 14/150 (9%) were X, 4/40 (10%) of Definitions were R and 2 List results were R (P@N = 0.2). Our ranking on Factoids relative to all thirteen runs was fourth. However, scoring all systems over R&X together and including Definitions, our ranking would be second equal, because we had more X scores than any other system. Last year our score on Factoids was 26/150 (17%), but the difference is probably due to easier queries this year.
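
The retrieval loop described above (AND all phrases, then drop the least significant one on failure) can be sketched as follows. The search() function is a trivial substring matcher standing in for the Lucene sentence index, and the example phrases are invented.

```python
# Sketch of the query-relaxation loop: Boolean AND of all phrases, dropping
# the least significant phrase on each failure until the search succeeds.
CORPUS = ["the treaty was signed in Rome in 1957"]  # toy sentence index

def search(terms):
    return [s for s in CORPUS if all(t in s for t in terms)]

def relaxed_search(phrases_least_significant_first):
    terms = list(phrases_least_significant_first)
    while terms:
        hits = search(terms)  # Boolean AND of all remaining phrases
        if hits:
            return hits
        terms.pop(0)          # drop the least significant phrase
    return []

print(relaxed_search(["Paris", "treaty", "1957"]))  # relaxes once, then hits
```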

Journal Article
TL;DR: The authors propose a two-stage annotation method for the identification of case roles in Chinese sentences, which makes use of a feature-enhanced string matching technique that takes full advantage of the huge number of sentence patterns in a Treebank.
Abstract: A two-stage annotation method for the identification of case roles in Chinese sentences is proposed. The approach makes use of a feature-enhanced string matching technique which takes full advantage of the huge number of sentence patterns in a Treebank. The first stage of the approach is a coarse-grained syntactic parsing, which is complemented by a semantic dissimilarity analysis in the latter stage. The approach goes beyond shallow parsing to a deeper level of case role identification, while preserving robustness, without getting bogged down in a complete linguistic analysis. The ideas described have been implemented, and an evaluation on 5,000 Chinese sentences is examined in order to justify the approach's significance.

01 Jan 2006
TL;DR: A shallow syntactic annotation scheme for Icelandic text is described, comprising a set of grammatical descriptors and their application guidelines, together with a grammar definition corpus annotated using the scheme.
Abstract: We describe a shallow syntactic annotation scheme for Icelandic text. The scheme comprises a set of grammatical descriptors and their application guidelines. The descriptors consist of brackets and labels which indicate constituent structure and functional relations. Additionally, we describe a grammar definition corpus, annotated using the annotation scheme. The annotation scheme has been developed as a part of a shallow parsing project.

Book ChapterDOI
17 Dec 2006
TL;DR: The contribution of the LIPN to the NLQ2NEXI task (part of the Natural Language Processing (NLP) track) of the Initiative for the Evaluation of XML Retrieval (INEX 2006) uses shallow parsing methods to analyse natural language queries.
Abstract: This article presents the contribution of the LIPN (Laboratoire d’Informatique de Paris Nord, France) to the NLQ2NEXI (Natural Language Queries to NEXI) task, part of the Natural Language Processing (NLP) track of the Initiative for the Evaluation of XML Retrieval (INEX 2006). It discusses the use of shallow parsing methods to analyse natural language queries.

Proceedings Article
01 Dec 2006
TL;DR: A hybrid method for extracting Chinese noun phrase collocations that combines a statistical model with rule-based linguistic knowledge and a set of statistic-based association measures (AMs) as filters is presented.
Abstract: This paper presents a hybrid method for extracting Chinese noun phrase collocations that combines a statistical model with rule-based linguistic knowledge. The algorithm first extracts all the noun phrase collocations from a shallow-parsed corpus by using syntactic knowledge in the form of phrase rules. It then removes pseudo-collocations by using a set of statistic-based association measures (AMs) as filters. There are two main purposes in the design of this hybrid algorithm: (1) to maintain a reasonable recall while improving precision, and (2) to investigate the proposed association measures on Chinese noun phrase collocations. The performance is compared with a pure statistical model and a pure rule-based method on a 60 MB PoS-tagged corpus. The experimental results show that the proposed hybrid method achieves a higher precision of 92.65% and recall of 47% based on 29 randomly selected noun headwords, compared with the precision of 78.87% and recall of 27.19% of a statistics-based extraction system. The F-score improvement is 55.7%.
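
As a sketch of the statistical filtering stage, here is pointwise mutual information applied to rule-extracted candidate pairs. PMI is one common association measure; whether it is among the paper's AMs is not stated in the abstract, and the counts, threshold and single candidate pair below are toy values.

```python
# Toy association-measure filter over rule-extracted candidate collocations.
import math

def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information from raw co-occurrence counts."""
    return math.log2((n_xy / n_total) / ((n_x / n_total) * (n_y / n_total)))

candidates = {("经济", "发展"): (120, 400, 500)}  # (pair, w1, w2) counts
N, THRESHOLD = 100_000, 3.0
kept = [pair for pair, (n_xy, n_x, n_y) in candidates.items()
        if pmi(n_xy, n_x, n_y, N) >= THRESHOLD]
print(kept)  # the pair survives: PMI ~ 5.9
```

The rules keep recall high by over-generating candidates; the AM threshold then trades a little recall for the precision gain the paper reports.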

Book ChapterDOI
29 Oct 2006
TL;DR: A semi-automatic method of extracting and representing the various ontological relations of Korean numeral classifiers is proposed; shallow parsing and word-sense disambiguation were used to extract semantic relations from natural language texts and from wordnets.
Abstract: Many studies have focused on the fact that numeral classifiers give decisive clues to the semantic categorization of nouns. However, few studies have analyzed the ontological relationships of classifiers or the construction of classifier ontologies. In this paper, a semi-automatic method of extracting and representing the various ontological relations of Korean numeral classifiers is proposed. Shallow parsing and word-sense disambiguation were used to extract semantic relations from natural language texts and from wordnets.

Proceedings Article
01 Dec 2006
TL;DR: The purpose of this paper is to characterize a chunk boundary parsing algorithm that uses a statistical method combined with adjustment rules, serving as a supplement to traditional statistics-based parsing methods.
Abstract: Natural language processing (NLP) is a very active research domain. One important branch of it is sentence analysis, including Chinese sentence analysis. However, no mature deep analysis theories and techniques are currently available. An alternative is to perform shallow parsing on sentences, which is very popular in the field. Chunk identification is a fundamental task for shallow parsing. The purpose of this paper is to characterize a chunk boundary parsing algorithm that uses a statistical method combined with adjustment rules, serving as a supplement to traditional statistics-based parsing methods. The experimental results show that the model works well on a small dataset. It will contribute to subsequent processes such as chunk tagging and chunk collocation extraction.
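
One way to picture "statistical method plus adjustment rules" is a rule pass that repairs label sequences the statistical chunker should not be able to produce. The single BIO repair rule below is our illustration, not the paper's rule set.

```python
# Illustrative adjustment-rule pass over statistical chunker output.
def repair_bio(tags):
    fixed = list(tags)
    for i, tag in enumerate(fixed):
        # An I- tag cannot open a chunk at sentence start or after O.
        if tag.startswith("I-") and (i == 0 or fixed[i - 1] == "O"):
            fixed[i] = "B-" + tag[2:]
    return fixed

print(repair_bio(["O", "I-NP", "I-NP"]))  # -> ['O', 'B-NP', 'I-NP']
```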

Book ChapterDOI
23 Oct 2006
TL;DR: In the belief that punctuation can aid the process of sentence structure analysis, this work focuses on a prior assignment of values to commas in Spanish texts, with very encouraging results.
Abstract: In the belief that punctuation can aid the process of sentence structure analysis, our work focuses on a prior assignment of values to commas in Spanish texts. Supervised machine learning techniques are applied to learn comma classifiers, taking positional information and part-of-speech tags as input attributes. One of these comma classifiers and a rule-based analyzer are combined in order to recognize and label text structures. The prior assignment of values to commas allowed the simplification of the recognition rules, with very encouraging results.

Book ChapterDOI
19 Feb 2006
TL;DR: A two-stage annotation method for identification of case roles in Chinese sentences is proposed which goes beyond shallow parsing to a deeper level of case role identification, while preserving robustness, without being bogged down into a complete linguistic analysis.
Abstract: A two-stage annotation method for the identification of case roles in Chinese sentences is proposed. The approach makes use of a feature-enhanced string matching technique which takes full advantage of the huge number of sentence patterns in a Treebank. The first stage of the approach is a coarse-grained syntactic parsing, which is complemented by a semantic dissimilarity analysis in the latter stage. The approach goes beyond shallow parsing to a deeper level of case role identification, while preserving robustness, without getting bogged down in a complete linguistic analysis. The ideas described have been implemented, and an evaluation on 5,000 Chinese sentences is examined in order to justify the approach's significance.