
Showing papers on "Shallow parsing published in 2007"


Proceedings Article
01 Jun 2007
TL;DR: This paper shows how to use corpus statistics to validate and correct the arguments of extracted relation instances, improving the overall RE performance.
Abstract: Many errors produced by unsupervised and semi-supervised relation extraction (RE) systems occur because of wrong recognition of the entities that participate in the relations. This is especially true for systems that do not use separate named-entity recognition components, relying instead on general-purpose shallow parsing. Such systems have greater applicability, because they are able to extract relations that contain attributes of unknown types. However, this generality comes at a cost in accuracy. In this paper we show how to use corpus statistics to validate and correct the arguments of extracted relation instances, improving the overall RE performance. We test the methods on SRES – a self-supervised Web relation extraction system. We also compare the performance of corpus-based methods to the performance of validation and correction methods based on supervised NER components.

51 citations
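As a rough illustration of the corpus-statistics idea described above, the sketch below trims an extracted argument to its most plausible sub-span using n-gram counts from a small tokenized corpus. The scoring heuristic (prefer the longest sub-span whose count clears a threshold) and the toy corpus are assumptions for illustration, not the actual SRES validation procedure.

```python
# Hedged sketch: correct an extracted relation argument's boundaries
# using simple corpus n-gram counts. Heuristic and data are toy
# assumptions, not the SRES procedure itself.
from collections import Counter

def ngram_counts(corpus_sentences, max_n=4):
    """Count all n-grams up to max_n tokens in a tokenized corpus."""
    counts = Counter()
    for tokens in corpus_sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def correct_argument(argument_tokens, counts, min_count=2):
    """Among sub-spans of the argument seen at least min_count times
    in the corpus, return the longest (ties broken by count)."""
    best = None
    for i in range(len(argument_tokens)):
        for j in range(i + 1, len(argument_tokens) + 1):
            cand = tuple(argument_tokens[i:j])
            if counts[cand] >= min_count:
                key = (len(cand), counts[cand])
                if best is None or key > best[0]:
                    best = (key, cand)
    return list(best[1]) if best else argument_tokens

corpus = ["the company Google Inc acquired YouTube in 2006".split(),
          "Google Inc announced a new search product".split(),
          "analysts praised Google Inc after the deal".split()]
counts = ngram_counts(corpus)
print(correct_argument("company Google Inc".split(), counts))  # ['Google', 'Inc']
```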


Journal Article
TL;DR: This paper works on the output of a part-of-speech tagger and uses shallow parsing instead of complex parsing to resolve zero anaphors in written Chinese, employing centering theory and constraint rules to identify the antecedents of zero anaphors as they appear in the preceding utterances.
Abstract: Most traditional approaches to anaphora resolution are based on the integration of complex linguistic information and domain knowledge. However, the construction of a domain knowledge base is very labor-intensive and time-consuming. In this paper, we work on the output of a part-of-speech tagger and use shallow parsing instead of complex parsing to resolve zero anaphors in written Chinese. We employ centering theory and constraint rules to identify the antecedents of zero anaphors as they appear in the preceding utterances. We focus on zero anaphors that occur in the topic, subject, and object positions of utterances. The experimental results show that the precision rate of zero anaphora detection and the recall rate of zero anaphora resolution are 81% and 70%, respectively.

44 citations
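To make the centering-style resolution step concrete, here is a deliberately simplified sketch that ranks candidate noun phrases from the preceding utterances by recency and grammatical role and picks the top one as the antecedent. The ranking weights and the pre-chunked input format are assumptions for illustration; the paper's actual constraint rules are more elaborate.

```python
# Toy sketch: choose an antecedent for a zero anaphor by ranking noun
# phrases from preceding utterances. More recent utterances win, and
# within an utterance topic > subject > object, loosely in the spirit
# of centering theory. The weights are illustrative assumptions.
ROLE_RANK = {"topic": 3, "subject": 2, "object": 1}

def resolve_zero_anaphor(preceding_utterances):
    """preceding_utterances: list (oldest first) of lists of
    (noun_phrase, grammatical_role) pairs from a shallow parser."""
    best_np, best_score = None, (-1, -1)
    for recency, utterance in enumerate(preceding_utterances):
        for np, role in utterance:
            score = (recency, ROLE_RANK.get(role, 0))
            if score > best_score:
                best_np, best_score = np, score
    return best_np

utterances = [[("小明", "subject"), ("一本书", "object")],
              [("那本书", "topic")]]
print(resolve_zero_anaphor(utterances))  # -> 那本书
```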


Journal ArticleDOI
01 Jun 2007
TL;DR: The SemRol method is a corpus-based approach that uses two different statistical models, conditional Maximum Entropy (ME) probability models and the TiMBL memory-based learning program, to determine the semantic role of the constituents of a sentence.
Abstract: In this paper, a method to determine the semantic role of the constituents of a sentence is presented. This method, named SemRol, is a corpus-based approach that uses two different statistical models: conditional Maximum Entropy (ME) probability models and the TiMBL program, a memory-based learner. It consists of three phases that make use of features based on words, lemmas, PoS tags and shallow parsing information. Our method introduces a new phase into the Semantic Role Labeling task, which has usually been approached as a two-phase procedure consisting of argument recognition and argument labeling. From our point of view, the sense of the verbs in the sentence must first be disambiguated, because the set of roles to consider depends on the sense of the verb. Regarding the argument labeling phase, a tuning procedure is presented. As a result of this procedure, one of the best sets of features for the argument labeling task is detected. With this set, which differs for TiMBL and ME, precisions of 76.71% for TiMBL and 70.55% for ME are obtained. Furthermore, the semantic role information provided by our SemRol method could be used as an extension of Information Retrieval or Question Answering systems. We propose using this semantic information as an extension of an Information Retrieval system in order to reduce the number of documents or passages retrieved by the system.

35 citations
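As a rough sketch of how a maximum-entropy argument-labeling phase of this kind can be set up, the snippet below trains scikit-learn's LogisticRegression (a standard ME implementation) on toy word/lemma/PoS/chunk features keyed on a disambiguated verb sense. The feature template and data are illustrative assumptions, not SemRol's actual configuration.

```python
# Hedged sketch of an ME argument-labeling step: LogisticRegression over
# word/lemma/PoS/chunk features plus the disambiguated verb sense.
# Features and toy data are assumptions for illustration only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(word, lemma, pos, chunk, verb_sense):
    return {"w": word, "l": lemma, "p": pos, "c": chunk, "vs": verb_sense}

train_X = [features("John", "john", "NNP", "B-NP", "give.01"),
           features("book", "book", "NN", "I-NP", "give.01"),
           features("Mary", "mary", "NNP", "B-NP", "give.01")]
train_y = ["A0", "A1", "A2"]  # giver, thing given, recipient

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_X, train_y)
print(model.predict([features("book", "book", "NN", "I-NP", "give.01")]))  # expected: ['A1']
```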


01 Jan 2007
TL;DR: This paper gives a complete account of the contest in terms of how the data for the three languages was released, the performance of the participating systems, and an overview of the approaches followed for POS tagging and chunking.
Abstract: As part of the IJCAI workshop on "Shallow Parsing for South Asian Languages", a contest was held in which the participants trained and tested their shallow parsing systems for Hindi, Bengali and Telugu. This paper gives a complete account of the contest in terms of how the data for the three languages was released, the performance of the participating systems, and an overview of the approaches followed for POS tagging and chunking. We conclude with an analysis of the systems that offers insights into directions for future research on shallow parsing for South Asian languages.

22 citations


Journal ArticleDOI
TL;DR: A novel phrase chunking model based on a proposed mask method, which automatically derives more training examples from the original training data and significantly improves system performance without employing external knowledge or multiple learners.
Abstract: Automatic text chunking aims to recognize grammatical phrase structures in natural language text. Text chunking provides downstream syntactic information for further analysis, and is an important technology in the areas of text mining (TM) and natural language processing (NLP). Existing chunking systems make use of external knowledge, e.g. grammar parsers, or integrate multiple learners to achieve higher performance. However, such external knowledge is often unavailable in many domains and languages. Besides, employing multiple learners not only complicates the system architecture, but also increases training and testing time costs. In this paper, we present a novel phrase chunking model based on the proposed mask method without employing external knowledge or multiple learners. The mask method can automatically derive more training examples from the original training data, which significantly improves system performance. We evaluated our method on different chunking tasks and languages in comparison to previous studies. The experimental results show that our method achieves state-of-the-art performance in chunking tasks. In two English chunking tasks, i.e., shallow parsing and base-chunking, our method achieves F(β=1) rates of 94.22 and 93.23. When porting to Chinese, the F(β=1) rate is 92.30. Our chunker is also quite efficient: the complete chunking time for a 50K-word text is less than 10 s.

21 citations
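The F(β=1) rates quoted above are chunk-level scores; the sketch below shows the standard way such scores are computed, by decoding BIO tag sequences into labelled spans and comparing predicted spans with gold spans. BIO encoding and the toy tag sequences are general conventions assumed here, not details taken from the paper.

```python
# Sketch of chunk-level F(beta=1): decode BIO tag sequences into
# labelled spans and compare predicted spans with gold spans.
def bio_to_chunks(tags):
    chunks, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel to flush the last chunk
        if tag.startswith("B-") or tag == "O" or \
           (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                chunks.append((start, i, label))
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return set(chunks)

def f1(gold_tags, pred_tags):
    gold, pred = bio_to_chunks(gold_tags), bio_to_chunks(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

gold = ["B-NP", "I-NP", "O", "B-VP", "B-NP"]
pred = ["B-NP", "I-NP", "O", "B-VP", "O"]
print(round(f1(gold, pred), 3))  # 2 of 2 predicted chunks correct, 2 of 3 gold found -> 0.8
```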


01 Jan 2007
TL;DR: A system which uses lexical shallow parsing to find adjectival “appraisal groups” in sentences, which convey a positive or negative appraisal of an item, is described.
Abstract: We describe a system which uses lexical shallow parsing to find adjectival “appraisal groups” in sentences, which convey a positive or negative appraisal of an item. We used a simple heuristic to detect opinion holders, determining whether a person was being quoted in a specific sentence or not, and if so, who. We also explored the use of unsupervised learners and voting to increase our coverage.

20 citations
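A toy sketch of what extracting adjectival appraisal groups can look like is given below: adjectives are matched against a small polarity lexicon and one optional preceding modifier is folded into the group and its score. The lexicon, modifier weights, and scoring are assumptions for illustration, not the resources used in the paper.

```python
# Toy sketch: extract adjectival appraisal groups with a hand-made
# polarity lexicon and simple modifier handling. Lexicon and scoring
# are illustrative assumptions only.
POLARITY = {"good": 1, "great": 1, "bad": -1, "awful": -1}
MODIFIERS = {"very": 2.0, "not": -1.0, "somewhat": 0.5}

def appraisal_groups(tokens):
    """Return (group, score) pairs for adjectives in the token list,
    folding in one optional preceding modifier."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok in POLARITY:
            score = POLARITY[tok]
            group = [tok]
            if i > 0 and tokens[i - 1] in MODIFIERS:
                score *= MODIFIERS[tokens[i - 1]]
                group.insert(0, tokens[i - 1])
            groups.append((" ".join(group), score))
    return groups

print(appraisal_groups("the plot was not good but the acting was very great".split()))
# [('not good', -1.0), ('very great', 2.0)]
```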


Proceedings ArticleDOI
01 Jan 2007
TL;DR: Three natural language marking strategies based on fast and reliable shallow parsing techniques and on widely available lexical resources are presented: lexical substitution, adjective conjunction swaps, and relativiser switching.
Abstract: We present three natural language marking strategies based on fast and reliable shallow parsing techniques and on widely available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of structural and semantic fit, using both lexical resources and the web as a corpus. A representative sample of marks is given to 25 human judges to evaluate for acceptability and preservation of meaning. This establishes a correlation between corpus-based felicity measures and perceived quality, and makes qualified predictions. Grammatical acceptability correlates strongly with our automatic measure (Pearson's r = 0.795, p = 0.001), allowing us to account for about two thirds of the variability in human judgements. A moderate but statistically insignificant correlation (Pearson's r = 0.422, p = 0.356) is found with judgements of meaning preservation, indicating that the contextual window of five content words used for our automatic measure may need to be extended.

16 citations
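The "about two thirds of variability" figure follows directly from the coefficient of determination, as the quick check below shows (a worked computation from the numbers in the abstract, not additional results).

```python
# Coefficient of determination: r**2 is the share of variance in human
# judgements explained by the automatic felicity measure.
r_acceptability, r_meaning = 0.795, 0.422
print(round(r_acceptability ** 2, 3))  # 0.632 -> roughly two thirds
print(round(r_meaning ** 2, 3))        # 0.178 -> a much weaker relationship
```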


Journal ArticleDOI
Christoph Tillmann, Tong Zhang
TL;DR: A novel training method for a localized phrase-based prediction model for statistical machine translation (SMT) that explicitly handles local phrase reordering is presented, together with a novel stochastic gradient descent training algorithm that can easily handle millions of features.
Abstract: In this article, we present a novel training method for a localized phrase-based prediction model for statistical machine translation (SMT). The model predicts block neighbors to carry out a phrase-based translation that explicitly handles local phrase reordering. We use a maximum likelihood criterion to train a log-linear block bigram model which uses real-valued features (e.g., a language model score) as well as binary features based on the block identities themselves (e.g., block bigram features). The model training relies on an efficient enumeration of local block neighbors in parallel training data. A novel stochastic gradient descent (SGD) training algorithm is presented that can easily handle millions of features. Moreover, when viewing SMT as a block generation process, it becomes quite similar to sequential natural language annotation problems such as part-of-speech tagging, phrase chunking, or shallow parsing. Our novel approach is successfully tested on a standard Arabic-English translation task using two different phrase reordering models: a block orientation model and a phrase-distortion model.

14 citations
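A rough idea of the kind of training described above, maximum-likelihood SGD for a log-linear model mixing binary block-bigram features with real-valued features such as a language model score, is sketched below. The toy blocks, feature template, and learning rate are assumptions for illustration, not the paper's actual model or data.

```python
# Hedged sketch: stochastic gradient ascent on the log-likelihood of a
# log-linear "next block" model with real-valued and binary features.
# Features, data, and learning rate are toy assumptions.
import math
from collections import defaultdict

def feats(prev, cand, lm_score):
    return {("bigram", prev, cand): 1.0, ("lm",): lm_score}

def probs(w, prev, cands):
    scores = [sum(w[k] * v for k, v in feats(prev, c, lm).items())
              for c, lm in cands]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

# Each example: previous block, candidate (block, lm_score) list, gold index.
data = [("the house", [("das Haus", 0.9), ("die Maus", 0.2)], 0),
        ("is red", [("ist rot", 0.8), ("ist tot", 0.1)], 0)]

w, lr = defaultdict(float), 0.1
for _ in range(50):
    for prev, cands, gold in data:
        p = probs(w, prev, cands)
        for j, (c, lm) in enumerate(cands):
            target = 1.0 if j == gold else 0.0
            for k, v in feats(prev, c, lm).items():
                w[k] += lr * (target - p[j]) * v   # observed minus expected feature count

print(probs(w, "the house", data[0][1]))  # gold candidate gets most of the mass
```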


Journal ArticleDOI
TL;DR: This paper proposes an efficient and accurate text chunking system that uses a linear SVM kernel and a new mask-based method, which also addresses the unknown word problem to enhance system performance.
Abstract: In this paper, we propose an efficient and accurate text chunking system using a linear SVM kernel and a new technique called the masked method. Previous research indicated that system combination or external parsers can enhance chunking performance. However, the cost of constructing multiple classifiers is even higher than developing a single processor. Moreover, the use of external resources complicates the original tagging process. To remedy these problems, we employ richer features and propose a mask-based method for the unknown word problem to enhance system performance. In this way, no external resources or complex heuristics are required by the chunking system. The experiments show that when training on the CoNLL-2000 chunking dataset, our system achieves an F(β) rate of 94.12 with the linear kernel. Furthermore, our chunker is quite efficient since it adopts a linear kernel SVM: the turn-around tagging time on the CoNLL-2000 test data is less than 50 s, about 115 times faster than a polynomial kernel SVM.

7 citations
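For a concrete picture of linear-kernel SVM chunking, the minimal sketch below tags each token with a BIO chunk label using scikit-learn's LinearSVC over simple word/PoS window features. The features and toy sentence are illustrative assumptions; the paper's richer features and masked method are not reproduced here.

```python
# Minimal sketch of a linear-kernel SVM chunk tagger: classify each
# token's BIO chunk tag from simple word/PoS window features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def token_features(words, pos, i):
    return {
        "w0": words[i], "p0": pos[i],
        "w-1": words[i - 1] if i > 0 else "<S>",
        "p-1": pos[i - 1] if i > 0 else "<S>",
        "w+1": words[i + 1] if i + 1 < len(words) else "</S>",
    }

sent_words = "He reckons the deficit will narrow".split()
sent_pos = ["PRP", "VBZ", "DT", "NN", "MD", "VB"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "B-VP", "I-VP"]

X = [token_features(sent_words, sent_pos, i) for i in range(len(sent_words))]
model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(X, tags)
print(model.predict([token_features(sent_words, sent_pos, 3)]))  # expected: ['I-NP']
```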


Proceedings ArticleDOI
Yun Xing
23 Jun 2007
TL;DR: A Word Sense Disambiguation system that participated in the SemEval-2007 multilingual Chinese-English lexical sample task was implemented with a Maximum Entropy classifier and obtained a micro-average precision of 0.716, the best among all participating systems.
Abstract: This article describes the implementation of a Word Sense Disambiguation system that participated in the SemEval-2007 multilingual Chinese-English lexical sample task. We adopted a supervised learning approach with a Maximum Entropy classifier. The features used were neighboring words and their parts of speech, single words in the context, and other syntactic features based on shallow parsing. In addition, we used word category information from a Chinese thesaurus as features for verb disambiguation. For the task we participated in, we obtained a micro-average precision of 0.716, the best among all participating systems.

5 citations
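A minimal sketch of this kind of Maximum Entropy WSD setup is shown below, using scikit-learn's LogisticRegression with neighbouring-word/PoS features plus a thesaurus-category feature for the target verb. The toy instances, thesaurus code, and feature template are assumptions for illustration, not the system's real feature set.

```python
# Hedged sketch of Maximum Entropy WSD: neighbouring words/PoS plus a
# thesaurus-category feature for the target verb. Toy data only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def wsd_features(prev_word, next_word, prev_pos, next_pos, thesaurus_cat):
    return {"w-1": prev_word, "w+1": next_word,
            "p-1": prev_pos, "p+1": next_pos, "cat": thesaurus_cat}

# Disambiguating the verb "打" (hit / play / make a phone call).
X = [wsd_features("他", "电话", "r", "n", "Hj"),
     wsd_features("他", "篮球", "r", "n", "Hj"),
     wsd_features("他", "人", "r", "n", "Hj")]
y = ["phone", "play", "hit"]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict([wsd_features("她", "电话", "r", "n", "Hj")]))  # expected: ['phone']
```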


Proceedings Article
06 Nov 2007
TL;DR: The purpose of this paper is to characterize a chunk boundary parsing algorithm that uses a statistical method combined with adjustment rules, serving as a supplement to traditional statistics-based parsing methods.
Abstract: Natural language processing (NLP) is a very active research domain. One important branch of it is sentence analysis, including Chinese sentence analysis. However, no mature deep-analysis theories and techniques are currently available. An alternative, which is very popular in the field, is to perform shallow parsing on sentences. Chunk identification is a fundamental task for shallow parsing. The purpose of this paper is to characterize a chunk boundary parsing algorithm that uses a statistical method combined with adjustment rules, serving as a supplement to traditional statistics-based parsing methods. The experimental results show that the model works well on a small dataset. It will contribute to subsequent processes such as chunk tagging and chunk collocation extraction.

Journal ArticleDOI
TL;DR: This paper presents a chunk segmentation algorithm using a combined statistical and rule-based approach (CSRA), in which decision rules for refining chunks are generated from chunks incorrectly segmented by a statistical model built on a training corpus.
Abstract: Deep parsing of Chinese sentences is a very challenging task due to complexities such as ambiguous word boundaries and meanings. An alternative mode of Chinese language processing is to perform shallow parsing of Chinese sentences, in which chunk segmentation plays an important role. In this paper, we present a chunk segmentation algorithm using a combined statistical and rule-based approach (CSRA). The decision rules for refining chunk segmentation are generated from chunks incorrectly segmented by a statistical model built on a training corpus. Experimental results show that the CSRA works well and produces satisfactory chunk segmentation results for subsequent processes such as chunk tagging and chunk collocation extraction.
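The general error-driven idea, learning correction rules from the places where a statistical chunker disagrees with the gold segmentation, can be sketched roughly as below. The rule format (word plus predicted tag mapped to a corrected tag) and the toy data are assumptions for illustration, not the CSRA's actual rule templates.

```python
# Hedged sketch of the combined statistical/rule-based idea: compare a
# statistical chunker's output with the gold segmentation on training
# data and turn recurring mistakes into simple correction rules.
from collections import Counter

def learn_rules(examples, min_support=2):
    """examples: list of (word, predicted_tag, gold_tag) triples."""
    errors = Counter((w, p, g) for w, p, g in examples if p != g)
    return {(w, p): g for (w, p, g), n in errors.items() if n >= min_support}

def apply_rules(tagged, rules):
    return [(w, rules.get((w, t), t)) for w, t in tagged]

train = [("的", "B", "I"), ("的", "B", "I"), ("了", "I", "I"), ("书", "B", "B")]
rules = learn_rules(train)
print(apply_rules([("那", "B"), ("本", "I"), ("的", "B")], rules))
# [('那', 'B'), ('本', 'I'), ('的', 'I')]
```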

Proceedings ArticleDOI
24 Aug 2007
TL;DR: A semi-automatic word extraction approach for the general dictionary and specialty dictionaries, based on information entropy, is presented; experiments show that CRFs achieve a 1.09% improvement in the POS tagging task and 0.67% in the shallow parsing task in terms of F-measure.
Abstract: An enhanced text analysis approach for Chinese text-to-speech (TTS) systems is presented in this paper. As the basic understanding process, text analysis needs to provide fine-grained and effective linguistic information, marked explicitly with the corresponding notation. Two kinds of work are done to improve TTS performance. First, shallow parsing information is introduced and processed with conditional random fields (CRFs), which overcomes the label bias problem. Second, because the dictionary is very important not only for Chinese word segmentation but also for Pinyin-to-character conversion, we present a semi-automatic word extraction approach for the general dictionary and the specialty dictionaries based on information entropy. The experiments show that the CRF achieves a 1.09% improvement in the POS tagging task and 0.67% in the shallow parsing task in terms of F-measure. The specialty words increase word segmentation precision by 1.80%.
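One common way to use information entropy for word extraction is branching entropy: a candidate string that can be followed by many different characters in a corpus is more likely to be a free-standing word. The sketch below computes right-branching entropy for a candidate; the corpus, candidate, and any acceptance threshold are toy assumptions, not the paper's actual procedure.

```python
# Hedged sketch: score a candidate word by the entropy of the characters
# that follow it in a corpus (high branching entropy suggests wordhood).
import math
from collections import Counter

def right_branching_entropy(candidate, corpus):
    followers = Counter()
    for text in corpus:
        start = 0
        while (i := text.find(candidate, start)) != -1:
            j = i + len(candidate)
            if j < len(text):
                followers[text[j]] += 1
            start = i + 1
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in followers.values())

corpus = ["语音合成系统很好", "语音识别和语音合成", "语音技术发展迅速"]
print(round(right_branching_entropy("语音", corpus), 3))  # 1.5 bits over {合, 识, 技}
```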

Proceedings ArticleDOI
29 Oct 2007
TL;DR: By introducing the kernel principle, SVMs can carry out training in a high-dimensional feature space at a computational cost independent of its dimensionality.
Abstract: To represent the whole hierarchical phrase structure, 10 types of Chinese chunks are defined. The paper presents a method of Chinese shallow parsing based on Support Vector Machines (SVMs). Conventional recognition techniques based on machine learning have difficulty in selecting useful features as well as finding appropriate combinations of selected features. SVMs can automatically focus on useful features and robustly handle a large feature set to develop models that maximize their generalizability. It is also well known that SVMs generalize well even in very high-dimensional feature spaces. Furthermore, by introducing the kernel principle, SVMs can carry out training in a high-dimensional space at a computational cost independent of its dimensionality. The experiments produced promising results.
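As a small illustration of the kernel principle mentioned above, the sketch below trains an SVM chunk-tag classifier with a degree-2 polynomial kernel, which implicitly uses pairwise feature conjunctions without ever building the expanded feature vectors. The toy tokens and tags are assumptions for illustration only.

```python
# Hedged illustration of the kernel principle: a polynomial kernel gives
# the SVM implicit feature conjunctions via dot products, without
# explicit feature expansion. Toy Chinese chunking data.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

X = [{"w": "我", "p": "r"}, {"w": "读", "p": "v"},
     {"w": "一", "p": "m"}, {"w": "本", "p": "q"}, {"w": "书", "p": "n"}]
y = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"]

# degree-2 polynomial kernel ~ all pairwise feature combinations,
# evaluated through dot products rather than explicit expansion
model = make_pipeline(DictVectorizer(), SVC(kernel="poly", degree=2, coef0=1))
model.fit(X, y)
print(model.predict([{"w": "书", "p": "n"}]))  # expected: ['I-NP']
```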