
Showing papers on "Shallow parsing published in 2011"


Journal Article
TL;DR: This work proposes the first integral solution for the automatic extraction of DDI from biomedical texts using a hybrid linguistic approach that combines shallow parsing and syntactic simplification with pattern matching.
Abstract: A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to keep up to date with all published studies on DDI. This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on shallow syntactic parsing provided by the UMLS MetaMap tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDI in texts. These lexical patterns are matched against the generated sentences to extract DDIs. We performed several experiments to analyze the performance of the different processes. The lexical patterns achieve reasonable precision (67.30%) but very low recall (14.07%). Including appositions and coordinate structures improves recall (25.70%), although precision drops (48.69%). The detection of clauses does not improve performance. Information Extraction (IE) techniques can provide an interesting way of reducing the time health care professionals spend reviewing the literature. Nevertheless, no previous approach had addressed the extraction of DDI from texts. To the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDI from biomedical texts.
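
As a rough illustration of the pattern-matching step described above, the sketch below matches a couple of hypothetical lexical patterns against a sentence whose drug mentions have been normalized to a DRUG placeholder. The patterns and the pairing heuristic are assumptions made here, not the pharmacist-defined patterns or the MMTx-based pipeline from the paper.

```python
import re

# Hypothetical lexical patterns in the spirit of the approach; "DRUG" stands in for
# drug mentions recognized upstream (in the paper, via the UMLS MetaMap tool).
PATTERNS = [
    r"DRUG (?:may )?increases? the (?:level|activity|effect)s? of DRUG",
    r"DRUG should not be (?:used|administered) (?:together )?with DRUG",
]

def extract_ddi(sentence, drugs):
    """drugs: drug mentions listed in sentence order. Returns candidate DDI pairs."""
    normalized = sentence
    for d in drugs:
        normalized = re.sub(re.escape(d), "DRUG", normalized, flags=re.IGNORECASE)
    for pat in PATTERNS:
        if re.search(pat, normalized, flags=re.IGNORECASE):
            # Naive pairing heuristic: report the first two mentions as interacting.
            return [(drugs[0], drugs[1])] if len(drugs) >= 2 else []
    return []

print(extract_ddi("Ketoconazole increases the levels of midazolam.",
                  ["Ketoconazole", "midazolam"]))
```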

81 citations


Journal Article
TL;DR: A simple, developmentally motivated computational model that learns to comprehend and produce language when exposed to child-directed speech and uses backward transitional probabilities to create an inventory of ‘chunks’ consisting of one or more words.
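
A minimal sketch of the chunking idea mentioned in the TL;DR: backward transitional probabilities are estimated from bigram counts and a chunk boundary is placed wherever the probability falls below a fixed threshold. The actual model uses a running average rather than a fixed cut-off, and the toy corpus and threshold here are illustrative only.

```python
from collections import Counter

def backward_tp(sentences):
    """Estimate backward transitional probability P(w_prev | w) from bigram counts."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = sent.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return lambda prev, w: bigrams[(prev, w)] / unigrams[w] if unigrams[w] else 0.0

def chunk(sentence, btp, threshold=0.5):
    """Insert a chunk boundary wherever the backward TP drops below the threshold."""
    words = sentence.split()
    chunks, current = [], [words[0]]
    for prev, w in zip(words, words[1:]):
        if btp(prev, w) < threshold:
            chunks.append(current)
            current = [w]
        else:
            current.append(w)
    chunks.append(current)
    return chunks

corpus = ["the dog ate the bone", "the dog chased the cat", "the cat ate the fish"]
print(chunk("the dog ate the fish", backward_tp(corpus)))
```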

53 citations


Journal Article
TL;DR: This study is the first to compare the performance of the whole chunking pipeline and to combine different existing chunking systems; OpenNLP scored best in both performance and usability.

52 citations


Book Chapter
26 Nov 2011
TL;DR: Despite being a pioneering effort in handling negation for the sentiment analysis of Urdu text, the experimental results are quite encouraging, and the main contribution of the research is dealing with a morphologically rich, resource-poor language.
Abstract: The paper investigates and proposes a treatment of the effect of phrase-level negation on the sentiment analysis of Urdu text-based reviews. Negation acts as a valence shifter, flipping or switching the inherent sentiment of the subjective terms in opinionated sentences. The presented approach focuses on subjective phrases called SentiUnits, which are formed by subjective terms (adjectives), their modifiers, conjunctions, and negation. The final effect of these phrases is computed according to the given model. The analyzer takes one sentence from the given review, extracts the constituent SentiUnits, computes their overall effect (polarity), and then calculates the final sentence polarity. Using this approach, the effect of negation is handled within these subjective phrases. The main contribution of the research is dealing with a morphologically rich, resource-poor language, and despite being a pioneering effort in handling negation for the sentiment analysis of Urdu text, the experimental results are quite encouraging.
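
A minimal sketch of the phrase-level idea, using English stand-ins rather than Urdu: each SentiUnit-like phrase gets a polarity from a small lexicon, negation inside the phrase flips that polarity, and sentence polarity is the sum over phrases. The lexicon, negation list, and scoring are assumptions, not the paper's resources.

```python
# Toy polarity lexicon and negation markers (the paper's SentiUnits lexicon and
# Urdu negation particles are not reproduced here).
LEXICON = {"good": 1.0, "bad": -1.0, "excellent": 1.5}
NEGATIONS = {"not", "never", "no"}

def sentiunit_polarity(phrase_tokens):
    """Polarity of one subjective phrase; negation acts as a valence shifter."""
    score = sum(LEXICON.get(t, 0.0) for t in phrase_tokens)
    if any(t in NEGATIONS for t in phrase_tokens):
        score = -score
    return score

def sentence_polarity(phrases):
    """Sum the polarities of the constituent subjective phrases."""
    return sum(sentiunit_polarity(p) for p in phrases)

print(sentence_polarity([["not", "good"], ["excellent"]]))  # -1.0 + 1.5 = 0.5
```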

16 citations


Journal Article
TL;DR: This paper proposes to integrate shallow parsing features and heuristic position information into the training process without introducing a domain lexicon, and shows that after adding the proposed features nearly all measures of both the conditional random field model and the contrast model improve, with the conditional random field results outperforming those of the contrast model.

13 citations


13 Apr 2011
TL;DR: This thesis is the first work to address the problem of extracting drug-drug interactions from biomedical texts; it proposes two different approaches and shows that, while the first, based on pattern matching, achieves low performance, the second, based on kernel methods, achieves performance comparable to that obtained on the similar task of extracting protein-protein interactions.
Abstract: A drug-drug interaction occurs when one drug influences the level or activity of another drug. The detection of drug interactions is an important research area in patient safety since these interactions can become very dangerous and increase health care costs. Although there are different databases supporting health care professionals in the detection of drug interactions, this kind of resource is rarely complete. Drug interactions are frequently reported in journals of clinical pharmacology, making the medical literature the most effective source for their detection. However, the increasing volume of the literature overwhelms health care professionals trying to keep an up-to-date collection of all reported drug-drug interactions. The development of automatic methods for collecting, maintaining and interpreting this information is crucial to achieving a real improvement in their early detection. Information Extraction techniques can provide an interesting way to reduce the time health care professionals spend reviewing the literature. Nevertheless, only a few approaches have tackled the extraction of drug-drug interactions. In this thesis, we have conducted a detailed study of various information extraction techniques applied to the biomedical domain. Based on this study, we have proposed two different approaches for the extraction of drug-drug interactions from texts. The first is a hybrid approach that combines shallow parsing and pattern matching to extract relations between drugs from biomedical texts. The second is based on supervised machine learning, in particular kernel methods. In addition, we have created the DrugDDI corpus, the first corpus annotated with drug-drug interactions, which allows us to evaluate and compare both approaches. We think the DrugDDI corpus is an important contribution because it could encourage other research groups to investigate this problem. To the best of our knowledge, the DrugDDI corpus is the only available corpus annotated for drug-drug interactions, and this thesis is the first work to address the problem of extracting drug-drug interactions from biomedical texts. We have also defined three auxiliary processes that provide crucial information to the aforementioned approaches: (1) a text analysis process based on the UMLS MetaMap Transfer tool (MMTx) that provides shallow syntactic and semantic information from texts, (2) a process for drug name recognition and classification, and (3) a process for drug anaphora resolution. Finally, we have developed a pipeline prototype which integrates these auxiliary processes. The pipeline architecture allows us to easily combine these modules with each of the approaches proposed in this thesis: pattern matching or kernels. Several experiments were performed on the DrugDDI corpus. They show that, while the first approach, based on pattern matching, achieves low performance, the kernel-based approach achieves performance comparable to that obtained on the similar task of extracting protein-protein interactions.
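
To make the second, kernel-based approach concrete, the sketch below trains an SVM with a precomputed similarity (kernel) matrix over candidate instances; the token-overlap kernel and the toy instances are stand-ins for the richer kernels and DrugDDI data used in the thesis.

```python
import numpy as np
from sklearn.svm import SVC

def token_overlap_kernel(a, b):
    """Crude similarity between two instances (here, the tokens appearing between
    the two candidate drug mentions); a stand-in for a shallow-linguistic kernel."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / max(1, len(sa | sb))

def gram(X, Y):
    return np.array([[token_overlap_kernel(x, y) for y in Y] for x in X])

# Hypothetical training instances with binary labels: does the sentence assert an
# interaction between the two candidate drugs?
X_train = [["increases", "the", "levels", "of"],
           ["was", "measured", "together", "with"]]
y_train = [1, 0]

clf = SVC(kernel="precomputed").fit(gram(X_train, X_train), y_train)
X_test = [["may", "increase", "the", "levels", "of"]]
print(clf.predict(gram(X_test, X_train)))
```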

13 citations


01 Jan 2011
TL;DR: The problem of shallow parsing of Polish, specifically chunking, is discussed; theoretical issues related to chunking Polish texts are examined and chunk annotation guidelines are proposed.
Abstract: This paper discusses the problem of shallow parsing of Polish, more specifically, chunking. We discuss some theoretical issues related to chunking Polish texts and propose our chunk annotation guidelines. In the second part of the paper we present initial results of using machine learning algorithms to train a working chunker for the proposed chunk types.
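
The chunker training mentioned above typically starts from an IOB encoding of the chunk annotation; the sketch below shows that encoding on an invented Polish example (the chunk types used here are placeholders, not the paper's annotation scheme).

```python
def to_iob(chunks):
    """chunks: list of (chunk_type_or_None, tokens). Returns (token, IOB tag) pairs
    in the format usually fed to sequence-labelling learners."""
    tagged = []
    for ctype, tokens in chunks:
        for i, tok in enumerate(tokens):
            if ctype is None:
                tagged.append((tok, "O"))
            else:
                tagged.append((tok, ("B-" if i == 0 else "I-") + ctype))
    return tagged

# Invented example: "Zielony samochod szybko jedzie" ("The green car drives fast").
sentence = [("NP", ["Zielony", "samochod"]), (None, ["szybko"]), ("VP", ["jedzie"])]
print(to_iob(sentence))
```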

9 citations


Book Chapter
01 Jan 2011
TL;DR: Cross-language evaluation shows that, despite the inherent errors and the challenges posed by the analysis of large amounts of unrestricted text, deep parsing contributes to a significant increase in performance.
Abstract: In this chapter—the core of the book—we present and evaluate our methodology for collocation extraction based on deep syntactic parsing. First, a closer look at previous work which made use of parsed text for collocation extraction will reveal that the aim of fully-fledged syntax-based extraction was far from realized in these efforts due, primarily, to the insufficient robustness, precision, or coverage of the parsers used, as well as to the small number of syntactic configurations taken into account. Our work addresses these deficiencies with a generic extraction procedure that relies on a large-scale multilingual parsing system. After describing the system and extraction method, we focus on the contrastive evaluation of the method against the sliding window method, a standard syntax-free method based on the linear proximity of words. Cross-language evaluation shows that, despite the inherent errors and the challenges posed by the analysis of large amounts of unrestricted text, deep parsing contributes to a significant increase in performance. A detailed qualitative analysis of the results, including a case-study comparison, allows an assessment of the relative strengths and weaknesses of the two methods to be made. Following the qualitative comparison, a brief comparison of the current system with systems based on shallow parsing is presented.
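
For reference, the syntax-free baseline compared against in this chapter, the sliding window method, can be sketched in a few lines: co-occurrences within a fixed window are counted and candidate pairs are ranked by an association score (pointwise mutual information here; the window size and scoring choice are illustrative, not the book's exact setup).

```python
import math
from collections import Counter

def window_collocations(tokens, window=5, min_count=2):
    """Sliding-window extraction: count word pairs co-occurring within a window of
    following tokens and rank them by pointwise mutual information (PMI)."""
    unigrams, pairs = Counter(tokens), Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            pairs[(w, v)] += 1
    n = len(tokens)
    scored = []
    for (w, v), c in pairs.items():
        if c >= min_count:
            pmi = math.log((c / n) / ((unigrams[w] / n) * (unigrams[v] / n)))
            scored.append(((w, v), pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)

tokens = "strong tea and strong coffee but powerful tea is rare".split()
print(window_collocations(tokens, window=3, min_count=1))
```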

9 citations


01 Jan 2011
TL;DR: This article proposes to integrate shallow parsing features and heuristic position information into the training process, without introducing a domain lexicon, to improve the performance of opinion target extraction; the experimental results show that after adding the proposed features, nearly all measures of both the conditional random field model and the contrast model improve.
Abstract: With the rapid development of the world wide web, more and more users express their opinions on the web, and many researchers are paying attention to sentiment analysis. Fine-grained sentiment analysis at the sentence level is very important, and the extraction of opinion targets from opinion sentences is its key issue. To improve the performance of opinion target extraction, this paper proposes to integrate shallow parsing features and heuristic position information into the training process without introducing a domain lexicon. The experimental results show that after adding the proposed features, nearly all measures of both the conditional random field model and the contrast model improve, and the conditional random field results outperform those of the contrast model. Moreover, compared with the best results of the 2008 Chinese opinion analysis evaluation, the F-measures of the conditional random field model are 5% higher than the maximum reported value.
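
A sketch of what per-token features might look like when shallow-parsing (chunk) tags and a heuristic position feature are combined for a CRF-style sequence labeller. The feature names and the notion of distance to the nearest opinion word are assumptions made here for illustration; the resulting dicts could be handed to any CRF toolkit that accepts dictionary features.

```python
def token_features(tokens, pos_tags, chunk_tags, i, opinion_word_index):
    """Features for token i, mixing lexical, shallow-parsing, and position cues."""
    return {
        "word": tokens[i],
        "pos": pos_tags[i],
        "chunk": chunk_tags[i],                          # shallow-parsing feature
        "prev_chunk": chunk_tags[i - 1] if i > 0 else "BOS",
        "dist_to_opinion_word": i - opinion_word_index,  # heuristic position feature
        "before_opinion_word": i < opinion_word_index,
    }

tokens = ["The", "screen", "is", "excellent"]
pos_tags = ["DT", "NN", "VBZ", "JJ"]
chunk_tags = ["B-NP", "I-NP", "B-VP", "B-ADJP"]
print(token_features(tokens, pos_tags, chunk_tags, 1, opinion_word_index=3))
```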

8 citations


Proceedings Article
26 Oct 2011
TL;DR: This work proposes a new method based on shallow parsing with rules; the rules are generated according to syntactic features of English texts, such as verb tense and the usage of modal verbs, to meet users' need to access updated information on developing events quickly and effectively.
Abstract: Traditional text information extraction methods mainly act on static documents and have difficulty reflecting the dynamic evolution of information updates on the web. To address this challenge, this work proposes a new method based on shallow parsing with rules. The rules are generated according to syntactic features of English texts, such as verb tense and the usage of modal verbs. The latest novel information in English news texts is extracted correctly, meeting users' need to access updated information on developing events quickly and effectively. Performance results show the improvement achieved by the proposed scheme.
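
A hedged sketch of the kind of tense and modality rules the abstract describes, applied to POS-tagged (shallow-parsed) sentences; the rule set and tag conventions below are illustrative, not the rules defined in the paper.

```python
MODALS = {"will", "shall", "may", "might", "would"}

def is_update_candidate(tagged_sentence):
    """tagged_sentence: list of (word, Penn-Treebank-style POS tag) pairs.
    Flags sentences whose verb forms suggest new or developing information."""
    for word, tag in tagged_sentence:
        if tag == "MD" and word.lower() in MODALS:
            return True   # modal verb: expected or forthcoming development
        if tag in {"VBZ", "VBP", "VBG"}:
            return True   # present-tense or progressive verb: ongoing event
    return False

print(is_update_candidate([("Rescue", "NN"), ("teams", "NNS"),
                           ("will", "MD"), ("arrive", "VB"), ("tomorrow", "NN")]))
```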

5 citations


Proceedings Article
01 Sep 2011
TL;DR: This paper investigates the contribution of fully unsupervised part-of-speech induction to a common natural language processing task and demonstrates the great potential of POS induction for shallow parsing, which could be applied to resource-scarce languages.
Abstract: Natural language processing tasks often rely on part-of-speech (POS) tagging as a preprocessing step. However, it is not clear how much the absence of a part-of-speech tagger hampers the development of other natural language processing tools. In this paper we investigate the contribution of fully unsupervised part-of-speech induction to a common natural language processing task. We focus on the supervised English shallow parsing task and compare systems relying either on POS induction, on POS tagging, or on lexical features only as a baseline. Our experiments on the English CoNLL'2000 dataset show a significant benefit from POS induction over the baseline, with performance close to that obtained with a traditional POS tagger. The results demonstrate the great potential of POS induction for shallow parsing, which could be applied to resource-scarce languages.
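
One very rough way to obtain induced word classes of the kind this paper substitutes for gold POS tags is to cluster words by their neighbour-count vectors; the sketch below uses k-means for this purpose. The actual induction systems evaluated in the paper are more sophisticated, and the toy corpus and class count here are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def induce_word_classes(sentences, n_classes=5):
    """Cluster words by left/right neighbour counts; the resulting class ids can
    stand in for POS tags as features of a supervised chunker."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    ctx = np.zeros((len(vocab), 2 * len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            if i > 0:
                ctx[index[w], index[s[i - 1]]] += 1                # left neighbour
            if i + 1 < len(s):
                ctx[index[w], len(vocab) + index[s[i + 1]]] += 1   # right neighbour
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(ctx)
    return {w: int(labels[index[w]]) for w in vocab}

sents = [["the", "dog", "barks"], ["the", "cat", "sleeps"], ["a", "dog", "sleeps"]]
print(induce_word_classes(sents, n_classes=3))
```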

Journal Article
Sui Zhifang
TL;DR: In the shallow parsing stage, this paper makes use of word formation to obtain fake head morpheme information of the target verb, which alleviates the problem of data sparseness and improves the performance of the parser, with an F-score of up to 0.93.
Abstract: Semantic role labeling (SRL) is an important way to obtain semantic information. Many existing systems for SRL make use of full syntactic parses, but due to the low performance of existing Chinese parsers, labeling based on full syntactic parses is still not satisfactory. This paper realizes SRL methods based on shallow parsing. In the shallow parsing stage, the paper makes use of word formation to obtain fake head morpheme information, which alleviates the problem of data sparseness and improves the performance of the parser, with an F-score of up to 0.93. In the semantic role labeling stage, the paper applies word formation to obtain morpheme information of the target verb, which describes the structure of the word at fine granularity and provides more information for semantic role labeling. In addition, the paper also proposes a coarse frame feature as an approximation of the sub-categorization information available in full syntactic parsing. The F-score of this semantic role labeling system reaches 0.74, a significant improvement over the best reported SRL performance (0.71) in the literature.
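
A heavily simplified sketch of the word-formation idea: back off from a full Chinese verb to individual characters (morphemes) to fight data sparseness. Which character is treated as the (fake) head morpheme is an assumption made here for illustration; the paper defines this through its own word-formation analysis.

```python
def verb_morpheme_features(verb):
    """Character-level back-off features for a Chinese target verb."""
    return {
        "verb": verb,
        "first_morpheme": verb[0],
        "last_morpheme": verb[-1],   # assumed stand-in for the head morpheme
    }

print(verb_morpheme_features("打扫"))  # "sweep/clean": characters 打 and 扫
```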

01 Jun 2011
TL;DR: The combined deep and shallow parsing approach based on Head-driven Phrase Structure Grammar and the inference process are introduced, and it is shown how background knowledge is integrated into the logical inferences to increase the extent, quality, and accuracy of the content extraction.
Abstract: Written information for military purposes is available in abundance, and documents are written in many languages. The question is how we can automate the content extraction of these documents. One possible approach is based on shallow parsing (information extraction) with application-specific combination of analysis results. One example of this, the ZENON research system, does a partial content analysis of some English, Dari, and Tajik texts. Another principal approach to content extraction is based on a combination of deep and shallow parsing with logical inferences on the analysis results. In the project "Multilingual content analysis with semantic inference on military relevant texts" (mIE) we followed the second approach. In this paper, we present the results of the mIE project. First, we briefly contrast the ZENON project with the mIE project. In the main part of the paper, the mIE project is presented. After the combined deep and shallow parsing approach based on Head-driven Phrase Structure Grammar is explained, the inference process is introduced. Then we show how background knowledge (WordNet, YAGO) is integrated into the logical inferences to increase the extent, quality, and accuracy of the content extraction. The prototype is also presented. The presentation includes briefing charts.
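
As a small illustration of how lexical background knowledge of the WordNet kind can feed an inference step (is X a kind of Y?), the sketch below walks hypernym chains with NLTK. It only illustrates the general idea, not the mIE prototype or its YAGO integration, and it assumes the NLTK WordNet data has been downloaded.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def is_kind_of(word, ancestor):
    """True if any sense of `word` reaches a sense of `ancestor` via hypernyms."""
    targets = set(wn.synsets(ancestor))
    frontier = list(wn.synsets(word))
    seen = set()
    while frontier:
        syn = frontier.pop()
        if syn in targets:
            return True
        if syn not in seen:
            seen.add(syn)
            frontier.extend(syn.hypernyms())
    return False

print(is_kind_of("tank", "vehicle"))  # expected: True (tank as armored vehicle)
```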

Journal Article
TL;DR: An approach for web search result clustering based on a phrase-based clustering algorithm known as Optimized Snippet Flat Clustering (OSFC) is proposed as an alternative to the single ordered result list of search engines.
Abstract: Information retrieval plays a vital role in our daily activities, most prominently in search engines. Retrieving relevant natural language text documents remains a challenge. Typically, search engines show low precision in response to a query, retrieving many useless web pages and missing other important ones. In this paper, we apply shallow parsing and chunking to extract noun phrases. These noun phrases are used as key phrases to rank the documents (typically a list of titles and snippets returned by a certain web search engine). Organizing web search results into clusters facilitates users' quick browsing through search results. Traditional clustering techniques are inadequate since they don't generate clusters with highly readable names. Here, we also propose an approach for web search result clustering based on a phrase-based clustering algorithm known as Optimized Snippet Flat Clustering (OSFC). It is an alternative to the single ordered result list of search engines, presenting a list of clusters to the user. Experimental results verify our method's feasibility and effectiveness.
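
A minimal sketch of the noun-phrase extraction step using NLTK's regular-expression chunker; the chunk grammar is a simple illustration, the clustering itself (OSFC) is not shown, and the sketch assumes the NLTK tokenizer and tagger models are installed.

```python
import nltk  # requires the punkt tokenizer and POS tagger models

GRAMMAR = "NP: {<DT>?<JJ>*<NN.*>+}"   # optional determiner, adjectives, nouns
chunker = nltk.RegexpParser(GRAMMAR)

def noun_phrases(snippet):
    """Shallow-parse a snippet and return its noun phrases as key-phrase candidates."""
    tagged = nltk.pos_tag(nltk.word_tokenize(snippet))
    tree = chunker.parse(tagged)
    return [" ".join(word for word, _ in subtree.leaves())
            for subtree in tree.subtrees() if subtree.label() == "NP"]

print(noun_phrases("Open source search engines retrieve relevant web documents."))
```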

Journal Article
TL;DR: This paper proposes a method for shallow parsing based on CRF and transformation-based error-driven learning that outperforms the single CRF-based approach in shallow parsing.
Abstract: This paper proposes a method for shallow parsing based on CRF and transformation-based error-driven learning. The method is applied to the Penn Chinese Treebank and achieves good chunking performance. First, a CRF model is used to identify chunks, and candidate transformation rules are acquired by error-driven learning. Then, an evaluation function is used to filter the candidate transformation rules. Finally, the transformation rules are used to revise the chunking results of the CRF. The experimental results show that this approach is effective and outperforms the single CRF-based approach in shallow parsing, with precision, recall, and F-values all improved.
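
A sketch of the correction step: transformation rules, selected by error-driven comparison of CRF output against the gold standard, are applied to revise the CRF chunk tags. The rule format (trigger word, old tag, new tag) and the gain function are simplifications made here for illustration, not the paper's evaluation function.

```python
def apply_rules(tokens, tags, rules):
    """Apply (trigger word, old tag, new tag) rules to revise CRF chunk tags."""
    revised = list(tags)
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        for trigger, old, new in rules:
            if tok == trigger and tag == old:
                revised[i] = new
    return revised

def rule_gain(token_seqs, crf_seqs, gold_seqs, rule):
    """Error-driven score: number of tags the rule fixes minus those it breaks."""
    gain = 0
    for toks, crf, gold in zip(token_seqs, crf_seqs, gold_seqs):
        for before, after, g in zip(crf, apply_rules(toks, crf, [rule]), gold):
            if before != after:
                gain += (after == g) - (before == g)
    return gain
```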

Book ChapterDOI
15 Sep 2011
TL;DR: For the first time, the class imbalance problem concerning Modern Greek syntactically annotated data is successfully addressed and the methodology can be adjusted to deal with other languages with only minor modifications.
Abstract: The present work aims to create a shallow parser for Modern Greek subject/object detection, using machine learning techniques. The parser relies on limited resources. Experiments with equivalent input and the same learning techniques were conducted for English, as well, proving that the methodology can be adjusted to deal with other languages with only minor modifications. For the first time, the class imbalance problem concerning Modern Greek syntactically annotated data is successfully addressed.
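
One standard way to address the class imbalance mentioned above is class-weighted learning; the sketch below shows the idea with scikit-learn on placeholder feature vectors. The actual features, labels, and learners used for the Greek data are not reproduced here.

```python
from sklearn.linear_model import LogisticRegression

# Placeholder chunk-level feature vectors and imbalanced labels (few "subject"
# instances among many "other" instances).
X = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 1], [0, 0]]
y = ["subject", "other", "other", "other", "other", "other", "other", "other"]

# class_weight="balanced" reweights errors inversely to class frequency.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
print(clf.predict([[1, 0]]))
```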