
Showing papers on "Shallow parsing" published in 2013


Book ChapterDOI
01 Jan 2013
TL;DR: This chapter explores an alternative approach to event extraction based on BBN SERIF™ and BBN OnTopic™, two state-of-the-art statistical natural language processing engines, and empirically compares its effectiveness against existing techniques on five dimensions.
Abstract: Automated analysis of news reports is a significant empowering technology for predictive models of political instability. To date, the standard approach to this analytic task has been embodied in systems such as KEDS/TABARI [1], which use manually-generated rules and shallow parsing techniques to identify events and their participants in text. In this chapter we explore an alternative to event extraction based on BBN SERIF™ and BBN OnTopic™, two state-of-the-art statistical natural language processing engines. We empirically compare this new approach to existing event extraction techniques on five dimensions: (1) Accuracy: when an event is reported by the system, how often is it correct? (2) Coverage: how many events are correctly reported by the system? (3) Filtering of historical events: how well are historical events (e.g. 9/11) correctly filtered out of the current event data stream? (4) Topic-based event filtering: how well do systems filter out red herrings based on document topic, such as sports documents mentioning “clashes” between two countries on the playing field? (5) Domain shift: how well do event extraction models perform on data originating from diverse sources? In all dimensions we show significant improvement to the state-of-the-art by applying statistical natural language processing techniques. It is our hope that these results will lead to greater acceptance of automated coding by creators and consumers of social science models that depend on event data and provide a new way to improve the accuracy of those predictive models.

50 citations


Book ChapterDOI
01 Jan 2013
TL;DR: This paper presents an integrated feature extraction framework for Natural Language Processing that removes wasteful redundancy and helps in rapid prototyping.
Abstract: Feature extraction from text corpora is an important step in Natural Language Processing (NLP), especially for Machine Learning (ML) techniques. Various NLP tasks share many common steps, e.g. the low-level act of reading a corpus and obtaining text windows from it. Some high-level processing steps might also be shared, e.g. testing for morpho-syntactic constraints between words. An integrated feature extraction framework removes wasteful redundancy and helps in rapid prototyping.
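
As a concrete illustration of the shared low-level step the abstract mentions, the sketch below implements one corpus reader that yields token windows for any downstream feature extractor to consume. The function names and the window size are illustrative, not the framework's actual API.

    # Minimal sketch: one shared windowing step feeding arbitrary
    # feature extractors. Names and window size are invented.
    def token_windows(tokens, size=2):
        """Yield (left context, focus token, right context) triples."""
        for i, tok in enumerate(tokens):
            yield tokens[max(0, i - size):i], tok, tokens[i + 1:i + 1 + size]

    def context_features(window):
        left, focus, right = window
        return {"focus": focus, "left": tuple(left), "right": tuple(right)}

    tokens = "the quick brown fox jumps".split()
    print([context_features(w) for w in token_windows(tokens)][2])
    # features for the window around 'brown'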

20 citations


Proceedings ArticleDOI
25 Aug 2013
TL;DR: A new shallow parsing mechanism driven by handcrafted rules is implemented for recognizing threats in Dutch tweets, and the error analysis shows some clear avenues for further improvement.
Abstract: In this paper, we investigate the recognition of threats in Dutch tweets. As tweets often display irregular grammatical form and deviant orthography, analysis by standard means is problematic. Therefore, we have implemented a new shallow parsing mechanism which is driven by handcrafted rules. Experimental results are encouraging, with an F-measure of about 40% on a random sample of Dutch tweets. Moreover, the error analysis shows some clear avenues for further improvement.
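
The paper's rule set is not reproduced in the abstract, but a handcrafted rule of the kind such a system might use is sketched below: a Dutch first-person future construction aimed at a second person, checked against a small list of violent verbs. Both the pattern and the verb list are invented for illustration.

    import re

    # Illustrative handcrafted threat rule: "ik ga je <verb>"
    # ("I am going to <verb> you"). Pattern and verb list are invented.
    THREAT_VERBS = {"doodmaken", "neersteken", "afmaken"}
    PATTERN = re.compile(r"\bik ga (je|jou|jullie) (\w+)", re.IGNORECASE)

    def is_threat(tweet):
        m = PATTERN.search(tweet)
        return bool(m) and m.group(2).lower() in THREAT_VERBS

    print(is_threat("ik ga je doodmaken"))   # True
    print(is_threat("ik ga je helpen"))      # False: "helpen" (help) is harmless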

11 citations


Patent
11 Nov 2013
TL;DR: Simple grammatical errors and errors in sentence structure are detected by generating a string of parts of speech for an input sentence using n-grams and parsing the generated string on the basis of rules (shallow parsing) defined according to the connective relationships between adjacent parts of speech; corrected drafts are then proposed for the detected errors to increase the accuracy of sentence evaluation.
Abstract: An automatic sentence evaluating device using a shallow parser. Simple grammatical errors and errors in sentence structure are detected by generating a string of parts of speech using n-grams for a composed input sentence and parsing the generated string of parts of speech on the basis of rules (shallow parsing) defined according to the connective relationships between adjacent parts of speech; a corrected draft is proposed for the detected errors to thereby increase the accuracy of sentence evaluation. An error detection apparatus and a method for the same are also claimed.
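
The mechanism lends itself to a compact sketch: tag the sentence, then scan adjacent POS pairs against a rule table. The sketch below uses NLTK's off-the-shelf tagger and a toy rule table; it illustrates the adjacency-rule idea, not the patent's actual rules.

    import nltk  # assumes nltk data: 'punkt' and 'averaged_perceptron_tagger'

    # Toy rule table: adjacent POS pairs flagged as likely errors.
    FORBIDDEN = {("DT", "DT"), ("MD", "MD"), ("PRP", "PRP")}

    def check_sentence(sentence):
        tokens = nltk.word_tokenize(sentence)
        tags = [t for _, t in nltk.pos_tag(tokens)]  # the "string of parts of speech"
        return [(i, tokens[i], tokens[i + 1])
                for i in range(len(tags) - 1)
                if (tags[i], tags[i + 1]) in FORBIDDEN]

    print(check_sentence("The the cat sat on the mat."))  # flags the doubled article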

9 citations


Proceedings Article
01 Sep 2013
TL;DR: This paper selects an intersection set of Wall Street Journal documents included both in the Penn Discourse Tree Bank (PDTB) and in the Multi-Perspective Question Answering (MPQA) corpus in order to explore the usefulness of discourse-level structure for facilitating the extraction of fine-grained opinion expressions.
Abstract: Opinion analysis deals with public opinions and trends, but subjective language is highly ambiguous. In this paper, we follow a simple data-driven technique to learn fine-grained opinions. We select an intersection set of Wall Street Journal documents that is included both in the Penn Discourse Tree Bank (PDTB) and in the Multi-Perspective Question Answering (MPQA) corpus. This is done in order to explore the usefulness of discourse-level structure in facilitating the extraction of fine-grained opinion expressions. We perform shallow parsing of MPQA expressions first with connective-based discourse structure, and then also with Named Entities (NE) and some syntax features, using conditional random fields; the latter feature set is essentially a collection of NEs and a bundle of features that has proved useful in a shallow discourse parsing task. We found that both feature sets improve our baseline at different levels of this fine-grained opinion expression mining task.
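
For readers unfamiliar with the setup, the sketch below shows BIO-style tagging of opinion expressions with a CRF over token features of the kind the paper lists (word, POS, NE tag, a discourse-connective flag). It uses the sklearn-crfsuite package; the features and the one-sentence training set are invented for illustration, not the paper's configuration.

    import sklearn_crfsuite  # pip install sklearn-crfsuite

    def token_features(sent, i):
        word, pos, ne, conn = sent[i]  # token, POS, NE tag, connective flag
        feats = {"word": word.lower(), "pos": pos, "ne": ne, "connective": conn}
        if i > 0:
            feats["prev_pos"] = sent[i - 1][1]
        return feats

    # Invented training data: tokens are (word, POS, NE, is_connective),
    # labels are BIO tags over opinion expressions.
    X_train = [[("critics", "NNS", "O", False), ("denounced", "VBD", "O", False),
                ("the", "DT", "O", False), ("plan", "NN", "O", False)]]
    y_train = [["O", "B-DSE", "O", "O"]]  # "denounced" is a subjective expression

    X = [[token_features(s, i) for i in range(len(s))] for s in X_train]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, y_train)
    print(crf.predict(X))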

6 citations


Proceedings Article
07 Dec 2013
TL;DR: The transducer described here performs pattern-based matching of POS tags using regular expressions that take advantage of the characteristics of German grammar, finding linguistically relevant phrases with good precision.
Abstract: Non-finite-state parsers provide fine-grained information. However, they are computationally demanding. Therefore, it is interesting to see how far a shallow parsing approach is able to go. The transducer described here performs pattern-based matching over POS tags using regular expressions that take advantage of the characteristics of German grammar. The process aims at finding linguistically relevant phrases with good precision, which in turn enables an estimation of the actual valency of a given verb. The chunker reads its input exactly once instead of using cascades, which greatly benefits computational efficiency. This finite-state chunking approach does not return a tree structure, but rather yields various kinds of linguistic information useful to the language researcher. Possible applications include simulation of text comprehension on the syntactic level, creation of selective benchmarks, and failure analysis.
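
The core idea, regular expressions over a POS-tag string read in a single pass, can be sketched briefly. The pattern below targets a German noun phrase in STTS tags (optional article, adjectives, noun); it illustrates the approach, not the paper's transducer.

    import re

    # One-pass NP chunking over an STTS-tagged sentence; pattern is illustrative.
    NP = re.compile(r"\b(ART\s)?(ADJA\s)*(NN|NE)\b")

    def chunk(tagged):  # tagged: list of (word, STTS tag) pairs
        tagstring = " ".join(tag for _, tag in tagged)
        chunks = []
        for m in NP.finditer(tagstring):
            start = tagstring[:m.start()].count(" ")   # token index via spaces
            width = m.group(0).strip().count(" ") + 1
            chunks.append([w for w, _ in tagged[start:start + width]])
        return chunks

    sent = [("Der", "ART"), ("kleine", "ADJA"), ("Hund", "NN"), ("schläft", "VVFIN")]
    print(chunk(sent))   # [['Der', 'kleine', 'Hund']]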

5 citations


01 Jan 2013
TL;DR: A machine learning approach to automatically extract concepts and conceptual relations towards the creation of Conceptual Graphs (CGs) from patent documents, using a shallow parser, NER, and machine learning techniques.
Abstract: This paper presents a machine learning approach to automatically extract concepts and conceptual relations towards the creation of Conceptual Graphs (CGs) from patent documents using a shallow parser and NER. The main challenge in the creation of conceptual graphs from natural language texts is the automatic identification of concepts and conceptual relations. The texts analyzed in this work are patent documents, focused mainly on the claims section of the documents. The task of automatically identifying concepts and conceptual relations is difficult due to the complexities in the writing style of these documents, which are technical as well as legal. Our analysis shows that the general in-depth parsers available in the open domain fail to parse the claims-section sentences in patent documents. The failure of in-depth parsers led us to develop a methodology to extract CGs using other resources. Thus, in the present work, we devised a methodology that uses shallow parsing, NER, and machine learning techniques to extract concepts and conceptual relationships from sentences in the claim/novelty section of patent documents. The results obtained from our experiments are encouraging and are discussed in detail in this paper. We obtained a precision of 73.2% and a recall of 68.3%.

4 citations


01 Jan 2013
TL;DR: Recent improvements to the system, as well as other enhancements made with the aim of helping Ainu language researchers, are described, including enhancement of the POS tagger with analysis of morphological information.
Abstract: This paper describes our research on computer processing of the Ainu language with the use of various NLP techniques. Ainu is an endangered language close to extinction. At present, linguists and anthropologists are making a great effort to preserve the language by analyzing and understanding it. However, most of the work in this matter is done manually, which makes it an uphill task. Previously we presented POST-AL, a part-of-speech tagger for the Ainu language. This paper describes recent improvements to the system as well as other enhancements made with the aim of helping Ainu language researchers. In particular, we have enhanced the POS tagger with analysis of morphological information. We have also added a translation support tool for Ainu language translators and made a first step toward deeper syntactic analysis of the Ainu language by creating a simple shallow parser.

3 citations


Journal Article
TL;DR: This paper surveys the rich research on chunking in several aspects: the definition and classification of chunks, chunk identification, chunk annotation and evaluation, and the internal relationships within chunks.
Abstract: Chunking, as a typical form of shallow parsing, serves many language information processing systems that demand syntactic information, and acts as a bridge between lexical analysis, syntactic parsing, and semantic parsing. This paper surveys the rich research on chunking in several aspects: the definition and classification of chunks, chunk identification, chunk annotation and evaluation, and the internal relationships within chunks. Finally, this paper draws conclusions and discusses future work.

3 citations



Journal ArticleDOI
TL;DR: Different strategies are presented to improve a super-chunker based on Conditional Random Fields by combining it with a finite-state symbolic super-chunker driven by lexical and grammatical resources.
Abstract: In this paper, we focus on chunking that includes contiguous multiword expression recognition, namely super-chunking. In particular, we present different strategies to improve a super-chunker based on Conditional Random Fields by combining it with a finite-state symbolic super-chunker driven by lexical and grammatical resources. We report a substantial gain of 7.6 points in terms of overall accuracy.
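
The abstract does not spell the combination strategies out, but one plausible scheme is sketched below: trust the symbolic super-chunker wherever its lexical resources fired (it is the component that knows multiword expressions), and keep CRF chunks elsewhere. This is an invented illustration, not one of the paper's evaluated strategies.

    # Chunks are (start, end, label) token spans; symbolic spans win, CRF spans
    # survive only if they do not overlap one. Purely illustrative.
    def merge(crf_chunks, symbolic_chunks):
        def overlaps(a, b):
            return a[0] < b[1] and b[0] < a[1]
        kept = [c for c in crf_chunks
                if not any(overlaps(c, s) for s in symbolic_chunks)]
        return sorted(kept + list(symbolic_chunks))

    crf = [(0, 2, "NP"), (2, 3, "VP"), (3, 6, "NP")]
    symbolic = [(3, 6, "MWE-NP")]   # the lexicon recognised a multiword expression
    print(merge(crf, symbolic))     # [(0, 2, 'NP'), (2, 3, 'VP'), (3, 6, 'MWE-NP')]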

Dissertation
01 Jan 2013
TL;DR: This dissertation presents a grammatically motivated sentiment classification framework to handle the distinctive features of the Urdu language, using a sentiment-annotated, lexicon-based approach.
Abstract: The rise of social networking sites and blogs has stimulated a bull market in personal opinion: consumer recommendations, product reviews, ratings, and other types of online expression. For computational linguistics researchers, this fast-growing heap of information has opened an exciting research frontier, referred to as Sentiment Analysis (SA). For English, this area has been under investigation for the last decade. But other major languages, like Urdu, have been largely overlooked by the research community. Urdu is a morphologically rich and resource-poor language. Its distinctive features, such as complex morphology, flexible grammar rules, context-sensitive orthography, and free word order, make Urdu language processing a challenging problem domain. For the same reasons, sentiment analysis approaches and techniques developed for other well-explored languages are not workable for Urdu text. This dissertation presents a grammatically motivated sentiment classification framework to handle these distinctive features of the Urdu language. The main research contributions are: to highlight the linguistic (orthography, grammar, morphology, etc.) as well as technical (parsing algorithm, lexicon, corpus, etc.) aspects of this multidimensional research problem; to explore Urdu morphological operations, grammar, and orthographic rules; and to redefine these operations and rules with respect to the requirements of a sentiment analysis framework. The orthographic, morphological, grammatical, and finally the conceptual details of the language are our target concerns. Additionally, our approach can help in the sentiment analysis of other languages, like Arabic, Persian, Hindi, and Punjabi. The proposed framework emphasizes the identification of SentiUnits, rather than merely the subjective words, in the given text. SentiUnits are the sentiment-carrier expressions, which reveal the inherent sentiments of a sentence for a specific target. The targets are the noun phrases about which an opinion is expressed. The system extracts SentiUnits and target expressions through shallow-parsing-based chunking. A dependency parsing algorithm creates associations between these extracted expressions. The framework uses a sentiment-annotated, lexicon-based approach. Each entry of the lexicon is marked with its orientation (positive or negative) and an intensity (force of orientation) score. Experimental evaluation of the system, with a sentiment-annotated lexicon of Urdu words and two corpora of reviews as test beds, shows encouraging achievement in terms of accuracy, precision, recall, and F-measure.
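
The scoring step of such a framework can be sketched compactly: once chunking has paired each SentiUnit with its target noun phrase, each pair is scored against the annotated lexicon. The lexicon entries below (Urdu words in transliteration) and the scoring rule are invented for illustration, not the dissertation's actual resources.

    # word -> (orientation, intensity); entries invented for illustration
    LEXICON = {"acha": (+1, 0.8), "kharab": (-1, 0.9)}

    def score(sentiunits):
        """sentiunits: (sentiment word, target noun phrase) pairs from chunking."""
        totals = {}
        for word, target in sentiunits:
            orientation, intensity = LEXICON.get(word, (0, 0.0))
            totals[target] = totals.get(target, 0.0) + orientation * intensity
        return totals

    print(score([("acha", "camera"), ("kharab", "battery")]))
    # {'camera': 0.8, 'battery': -0.9}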

Book ChapterDOI
24 Mar 2013
TL;DR: This work investigates how the results of a pattern-based unsupervised grammar induction system improve as data on new kinds of phrases are added, leading to a significant improvement in performance.
Abstract: There is a growing interest in unsupervised grammar induction, which does not require syntactic annotations, but provides less accurate results than the supervised approach. Aiming at improving the accuracy of the unsupervised approach, we have resorted to additional information, which can be obtained more easily. Shallow parsing or chunking identifies the sentence constituents (noun phrases, verb phrases, etc.), but without specifying their internal structure. There exist highly accurate systems to perform this task, and thus this information is available even for languages for which large syntactically annotated corpora are lacking. In this work we have investigated how the results of a pattern-based unsupervised grammar induction system improve as data on new kinds of phrases are added, leading to a significant improvement in performance. We have analyzed the results for three different languages. We have also shown that the system is able to significantly improve the results of the unsupervised system using the chunks provided by automatic chunkers.

05 Jun 2013
TL;DR: The improvements presented in this paper include the following: analyses of previously identified ambiguities in morphosyntax and in syntactic functions, their disambiguation, and finally, an outline of possible steps in terms of shallow parsing based on the results provided by the disambiguation process.
Abstract: Our goal in this article is to show the improvements in the computational treatment of Basque, and more specifically, in the areas of morphosyntactic disambiguation and shallow parsing. The improvements presented in this paper include the following: analyses of previously identified ambiguities in morphosyntax and in syntactic functions, their disambiguation, and finally, an outline of possible steps in terms of shallow parsing based on the results provided by the disambiguation process. The work is part of the current research within the field of Natural Language Processing (NLP) in Basque, and more specifically, part of the work that is being done within the IXA group.

Proceedings ArticleDOI
23 Jul 2013
TL;DR: An abbreviation definition identification algorithm is proposed, which employs a variety of rules and incorporates shallow parsing of the text to identify the most probable abbreviation definition in general texts.
Abstract: The study of abbreviation identification has mostly been limited to the biomedical literature. The wide use of abbreviations in general texts, including web data and newswire data, requires us to process and extract abbreviation definitions there as well. In this paper, we propose an abbreviation definition identification algorithm, which employs a variety of rules and incorporates shallow parsing of the text to identify the most probable abbreviation definition in general texts. The performance of our system was tested with the data set provided by the 2012 NIST TAC-KBP evaluation, obtaining a performance of 94.2% recall and 95.5% precision.
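
The abstract does not detail the rules, but the classic rule for the pattern "long form (SF)", in the style of Schwartz & Hearst (2003), is sketched below; the paper's own rule set, which also incorporates shallow parsing, goes beyond this.

    import re

    def best_long_form(short, window):
        """Match short-form characters right to left in the window; the first
        character must align with the start of a word (Schwartz-Hearst style)."""
        s, l = len(short) - 1, len(window) - 1
        while s >= 0:
            c = short[s].lower()
            if not c.isalnum():
                s -= 1
                continue
            while l >= 0 and (window[l].lower() != c or
                              (s == 0 and l > 0 and window[l - 1].isalnum())):
                l -= 1
            if l < 0:
                return None
            s, l = s - 1, l - 1
        return window[l + 1:]

    def find_definitions(text):
        pairs = []
        for m in re.finditer(r"\(([A-Za-z]{2,10})\)", text):
            short = m.group(1)
            words = text[:m.start()].split()
            window = " ".join(words[-(len(short) + 5):])  # candidate word window
            long_form = best_long_form(short, window)
            if long_form:
                pairs.append((short, long_form))
        return pairs

    print(find_definitions("The National Institute of Standards and "
                           "Technology (NIST) ran the evaluation."))
    # [('NIST', 'National Institute of Standards and Technology')]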

Journal ArticleDOI
TL;DR: A text mining approach for multiclass biomedical relations based on predicate argument structure (PAS) and shallow parsing is presented, and BRES, a text mining system, is implemented based on the proposed approach.
Abstract: With an overwhelming amount of published biomedical research, the underlying biomedical knowledge is expanding at an exponential rate. This expansion makes it very difficult to find genetics knowledge of interest, and there is therefore an urgent need for text mining approaches that discover new knowledge from publications. This paper presents a text mining approach for multiclass biomedical relations based on predicate argument structure (PAS) and shallow parsing. The approach can mine explicit biomedical relations with semantic enrichment and visualize relations with a semantic network. It first identifies noun phrases based on shallow parsing, and then filters arguments from noun phrases via a biomedical ontology dictionary. We have implemented BRES, a text mining system, based on our proposed approach. Our results obtained a 67.7% F-measure, 62.5% precision, and 73.8% recall on the test dataset. This shows that our proposed approach is promising for developing biomedical text mining technology. Highlights:
• Mining multiclass biomedical relations;
• Representing biomedical relations with semantic enrichment;
• Visualizing relations by semantic network;
• Extracting direct and indirect biomedical relations.
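
The argument-filtering step the abstract describes reduces, in essence, to a dictionary lookup over shallow-parsed noun phrases, as in the sketch below; the ontology entries are invented for illustration and are far smaller than a real biomedical dictionary.

    # Keep only NP chunks whose normalized form is in the ontology dictionary.
    ONTOLOGY = {"brca1": "Gene", "p53": "Gene", "breast cancer": "Disease"}

    def filter_arguments(np_chunks):
        """np_chunks: noun-phrase strings produced by a shallow parser."""
        return [(np, ONTOLOGY[np.lower()])
                for np in np_chunks if np.lower() in ONTOLOGY]

    chunks = ["BRCA1", "a mutation", "breast cancer"]
    print(filter_arguments(chunks))
    # [('BRCA1', 'Gene'), ('breast cancer', 'Disease')]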

Book ChapterDOI
Qiong Wu
10 May 2013
TL;DR: This paper focuses on Chinese non-canonical VN collocations from the NLP perspective, classifies them, discusses their semantic features, and argues that machine recognition of Chinese non-canonical collocations should consider not only the semantic roles of the objects but also the verbs.
Abstract: This paper focuses on Chinese non-canonical VN collocations from the NLP perspective. It first makes a classification of Chinese non-canonical VN collocations, and then discusses their semantic features. This paper argues that machine recognition of Chinese non-canonical collocations should consider not only the semantic roles of the objects, but also the verbs. Idioms and chunks should be put into the lexicon directly. A flow chart for machine recognition is offered at the end of this paper.

Book ChapterDOI
01 Jan 2013
TL;DR: A new model for shallow parsing of Chinese is presented, which adopts Church's theory and performs Chinese phrase recognition based on an HMM; it improves the precision of sentence segmentation by refining the observation probabilities of the HMM model and making use of the context information of Chinese sentences.
Abstract: Complete parsing has difficulty meeting the precision and recall requirements for Chinese. To address this problem, a new model for shallow parsing of Chinese is presented in this paper. We adopt Church's theory and perform Chinese phrase recognition based on an HMM; we improve the precision of sentence segmentation by refining the observation probabilities of the HMM model and making use of the context information of Chinese sentences. At the same time, by studying the rules of Chinese sentences, we extract some rules useful for ambiguity elimination. The experimental results indicate that the HMM-based model achieves high precision and recall.
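
The decoding core of such an HMM chunker is Viterbi search over chunk tags. The sketch below decodes BIO noun-phrase tags from a POS sequence; all probabilities are invented for illustration and would in practice be estimated from a treebank, with the context-sensitive adjustments the chapter describes applied to the observation probabilities.

    import math

    STATES = ["B-NP", "I-NP", "O"]
    START = {"B-NP": 0.6, "I-NP": 0.0, "O": 0.4}
    TRANS = {"B-NP": {"B-NP": 0.1, "I-NP": 0.6, "O": 0.3},
             "I-NP": {"B-NP": 0.2, "I-NP": 0.5, "O": 0.3},
             "O":    {"B-NP": 0.5, "I-NP": 0.0, "O": 0.5}}
    # Emission over POS tags rather than words, a common HMM-chunker choice.
    EMIT = {"B-NP": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
            "I-NP": {"DT": 0.1, "NN": 0.8, "VB": 0.1},
            "O":    {"DT": 0.1, "NN": 0.1, "VB": 0.8}}

    def viterbi(pos_tags):
        def lp(x):  # log-probability, guarding zeros
            return math.log(x) if x > 0 else float("-inf")
        score = {s: lp(START[s]) + lp(EMIT[s].get(pos_tags[0], 0)) for s in STATES}
        back = []
        for pos in pos_tags[1:]:
            prev, score, col = score, {}, {}
            for s in STATES:
                best = max(STATES, key=lambda p: prev[p] + lp(TRANS[p][s]))
                score[s] = prev[best] + lp(TRANS[best][s]) + lp(EMIT[s].get(pos, 0))
                col[s] = best
            back.append(col)
        state = max(STATES, key=score.get)
        path = [state]
        for col in reversed(back):
            state = col[state]
            path.append(state)
        return path[::-1]

    print(viterbi(["DT", "NN", "VB"]))   # ['B-NP', 'I-NP', 'O']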

Journal ArticleDOI
TL;DR: Some applied techniques of shallow parsing are introduced and a new method is tested experimentally.
Abstract: Shallow parsing is a strategy of language processing that has emerged in the domain of natural language processing in recent years. It does not focus on obtaining a full parse tree, but requires only the recognition of certain simple constituents of the structure. It separates parsing into two subtasks: one is the recognition and analysis of chunks, and the other is the analysis of the relationships among chunks. In this paper, some applied techniques of shallow parsing are introduced and a new method is tested experimentally.