scispace - formally typeset
Topic

Shallow parsing

About: Shallow parsing (also called chunking) is a natural language processing technique that identifies phrase-level constituents, such as noun phrases, without building a full parse tree. Over the lifetime, 397 publications have been published within this topic receiving 10211 citations.
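As a rough illustration of what a shallow parser produces, the sketch below chunks a POS-tagged sentence into noun phrases with a single hand-written rule (optional determiner, any adjectives, then one or more nouns). The rule and tag set are simplified assumptions for illustration; real chunkers are learned from data or driven by fuller rule grammars.

```python
def chunk_np(tagged):
    """Greedy noun-phrase chunker over (word, POS) pairs.

    Groups an optional determiner (DT), any adjectives (JJ), and one
    or more nouns (NN*) into an NP chunk; every other token passes
    through unchanged.
    """
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:  # at least one noun: emit an NP chunk
            chunks.append(("NP", [w for w, _ in tagged[i:k]]))
            i = k
        else:  # no noun after the optional DT/JJ run: pass token through
            chunks.append(tagged[i])
            i += 1
    return chunks

sentence = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"),
            ("saw", "VBD"), ("a", "DT"), ("dog", "NN")]
print(chunk_np(sentence))
# → [('NP', ['The', 'quick', 'fox']), ('saw', 'VBD'), ('NP', ['a', 'dog'])]
```

The verb is left untouched while the two noun phrases are grouped, which is exactly the "useful structure without full analysis" trade-off the topic is about.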


Papers
DOI
08 Aug 2010
TL;DR: The paper describes Aelred, a web application that demonstrates language technology in the Google App Engine cloud computing environment; it serves English literary texts with a range of linguistic annotations, including part-of-speech tagging, shallow parsing, and word sense definitions from WordNet.
Abstract: The paper describes Aelred, a web application that demonstrates the use of language technology in the Google App Engine cloud computing environment. Aelred serves up English literary texts with optional concordances for any word and a range of linguistic annotations including part-of-speech tagging, shallow parsing, and word sense definitions from WordNet. Two alternative approaches are described. In the first approach, annotations are created offline and uploaded to the cloud datastore. In the second approach, annotations are created online within the cloud computing framework. In both cases standard HTML is generated with a template engine so that the annotations can be viewed in ordinary web browsers.

2 citations

Book ChapterDOI
25 Aug 2009
TL;DR: This paper introduces a strategy for adapting a rule-based parser of written language to transcribed speech and gives a detailed analysis of the types of errors made by the parser while analyzing a corpus of disfluencies.

Abstract: This paper introduces our strategy for adapting a rule-based parser of written language to transcribed speech. Special attention has been paid to disfluencies (repairs, repetitions and false starts). A Constraint Grammar-based parser was used for shallow syntactic analysis of spoken Estonian. The modification of the grammar and additional methods improved recall from 97.5% to 97.6% and precision from 91.6% to 91.8%. The paper also gives a detailed analysis of the types of errors made by the parser while analyzing the corpus of disfluencies.
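The paper's Constraint Grammar rules are not reproduced in the abstract. Purely as an illustrative sketch of the kind of preprocessing transcribed speech needs, the snippet below collapses immediate word repetitions (one of the disfluency types the paper names) before parsing; this is an assumption about one simple normalization step, not the authors' method.

```python
def remove_repetitions(tokens):
    """Collapse immediate word repetitions, a common disfluency in
    transcribed speech ("I I I think" -> "I think"), so a parser
    built for written text sees a cleaner token stream."""
    out = []
    for tok in tokens:
        if not out or tok.lower() != out[-1].lower():
            out.append(tok)
    return out

print(remove_repetitions("I I think that that is is fine".split()))
# → ['I', 'think', 'that', 'is', 'fine']
```

Repairs and false starts are much harder than plain repetitions, which is presumably why the paper devotes an error analysis to them.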

2 citations

Book ChapterDOI
25 Aug 2009
TL;DR: This work presents an alternative approach to shallow parsing of noun phrases for Slavic languages that follows Abney's original principles, and shows that continuous phrase chunking as well as shallow constituency parsing display evident drawbacks when faced with freer word order languages.

Abstract: Shallow parsing has been proposed as a means of arriving at practically useful structures while avoiding the difficulties of full syntactic analysis. According to Abney's principles, it is preferable to leave an ambiguity pending than to make a likely wrong decision. We show that continuous phrase chunking as well as shallow constituency parsing display evident drawbacks when faced with freer word order languages. These drawbacks may lead to unnecessary data loss as a result of decisions forced by the formalism and therefore diminish the practical value of shallow parsers for Slavic languages. We present an alternative approach to shallow parsing of noun phrases for Slavic languages that follows Abney's original principles. The proposed approach is decomposed into several stages, some of which allow for marking discontinuous phrases.
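The data-loss argument can be seen in the standard BIO chunk encoding, sketched below with hypothetical chunks: BIO tags only describe contiguous spans, so a phrase interrupted by other material (common in freer word order languages) must be split or partially dropped, which is the forced decision the paper objects to.

```python
def to_bio(items):
    """Flatten a chunk sequence into per-token BIO tags.

    `items` mixes ("NP", [words]) chunks and plain tokens. Because
    B-/I- tags can only mark contiguous spans, a discontinuous
    phrase cannot be encoded faithfully in this scheme.
    """
    tags = []
    for item in items:
        if isinstance(item, tuple) and item[0] == "NP":
            words = item[1]
            tags.append((words[0], "B-NP"))
            tags.extend((w, "I-NP") for w in words[1:])
        else:
            tags.append((item, "O"))
    return tags

print(to_bio([("NP", ["the", "old", "book"]), "is", ("NP", ["mine"])]))
# → [('the', 'B-NP'), ('old', 'I-NP'), ('book', 'I-NP'), ('is', 'O'), ('mine', 'B-NP')]
```

A multi-stage scheme that marks phrase membership independently of adjacency, as the paper proposes, avoids this limitation.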

2 citations

Proceedings ArticleDOI
17 Jan 2010
TL;DR: Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.

Abstract: This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.
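The paper's actual patterns are not given in the abstract; the sketch below illustrates the pattern-based idea with two hypothetical regexes. Note how a single OCR error in a trigger word ("bom" for "born") defeats the first pattern entirely, which is the fragility the evaluation points to.

```python
import re

# Hypothetical patterns, not the paper's: "born (on) Month D, YYYY"
# and "DOB: mm/dd/yyyy".
DOB_PATTERNS = [
    re.compile(r"born\s+(?:on\s+)?([A-Z][a-z]+\s+\d{1,2},\s*\d{4})"),
    re.compile(r"(?:DOB|date\s+of\s+birth)[:\s]+(\d{1,2}/\d{1,2}/\d{4})", re.I),
]

def extract_dob(text):
    """Return the first date-of-birth string matched, or None."""
    for pat in DOB_PATTERNS:
        m = pat.search(text)
        if m:
            return m.group(1)
    return None

print(extract_dob("John Smith, born on March 4, 1921, in Ohio"))
# → March 4, 1921
print(extract_dob("John Smith, bom on March 4, 1921"))  # OCR error: no match
# → None
```

High precision comes cheap with tight patterns like these; recall on noisy OCR text is where such a system degrades.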

2 citations

01 Jan 2010
TL;DR: This thesis proposes two approaches to extracting relations between drugs from biomedical texts: a hybrid approach combining shallow parsing and pattern matching, and a supervised machine learning approach based on kernel methods.
Abstract: A drug-drug interaction occurs when one drug influences the level or activity of another drug. The detection of drug interactions is an important research area in patient safety since these interactions can become very dangerous and increase health care costs. Although there are different databases supporting health care professionals in the detection of drug interactions, this kind of resource is rarely complete. Drug interactions are frequently reported in journals of clinical pharmacology, making medical literature the most effective source for the detection of drug interactions. However, the increasing volume of the literature overwhelms health care professionals trying to keep an up-to-date collection of all reported drug-drug interactions. The development of automatic methods for collecting, maintaining and interpreting this information is crucial to achieving a real improvement in their early detection. Information Extraction techniques can provide an interesting way to reduce the time spent by health care professionals on reviewing the literature. Nevertheless, only a few approaches have tackled the extraction of drug-drug interactions. In this thesis, we have conducted a detailed study of various information extraction techniques applied to the biomedical domain. Based on this study, we have proposed two different approximations for the extraction of drug-drug interactions from texts. The first approximation proposes a hybrid approach, which combines shallow parsing and pattern matching to extract relations between drugs from biomedical texts. The second approximation is based on a supervised machine learning approach, in particular, kernel methods. In addition, we have created DrugDDI, the first corpus annotated with drug-drug interactions, which allows us to evaluate and compare both approximations.
We think the DrugDDI corpus is an important contribution because it could encourage other research groups to investigate this problem. To the best of our knowledge, the DrugDDI corpus is the only available corpus annotated for drug-drug interactions, and this thesis is the first work that addresses the problem of extracting drug-drug interactions from biomedical texts. We have also defined three auxiliary processes to provide crucial information, which will be used by the aforementioned approximations. These auxiliary tasks are as follows: (1) a process for text analysis based on the UMLS MetaMap Transfer tool (MMTx) to provide shallow syntactic and semantic information from texts, (2) a process for drug name recognition and classification, and (3) a process for drug anaphora resolution. Finally, we have developed a pipeline prototype which integrates the different auxiliary processes. The pipeline architecture allows us to easily integrate these modules with each of the approaches proposed in this thesis: pattern matching or kernels. Several experiments were performed on the DrugDDI corpus. They show that while the first approximation, based on pattern matching, achieves low performance, the approach based on kernel methods achieves performance comparable to that obtained by approaches to similar tasks, such as the extraction of protein-protein interactions.
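The thesis's kernels are not spelled out in the abstract. At their core, kernel methods for relation extraction replace explicit feature engineering with a similarity function K(x, y) between sentence representations that an SVM-style learner consumes. The sketch below is an assumption for illustration, a linear bag-of-words kernel, not the thesis's actual kernel:

```python
from collections import Counter

def bow_kernel(s1, s2):
    """Linear bag-of-words kernel: K(x, y) = <phi(x), phi(y)>, where
    phi maps a sentence to its word-count vector. Richer kernels for
    relation extraction compare subsequences or parse substructures."""
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    return sum(c1[w] * c2[w] for w in c1)

print(bow_kernel("drug A increases the level of drug B",
                 "drug C reduces the level of drug D"))
# → 7  (shared words: drug×2·2, the, level, of)
```

A kernel-based classifier never needs the explicit vectors, only pairwise K values, which is what lets structured kernels over shallow parses plug into the same learner.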

2 citations


Network Information
Related Topics (5)
- Machine translation: 22.1K papers, 574.4K citations (81% related)
- Natural language: 31.1K papers, 806.8K citations (79% related)
- Language model: 17.5K papers, 545K citations (79% related)
- Parsing: 21.5K papers, 545.4K citations (79% related)
- Query language: 17.2K papers, 496.2K citations (74% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2021    7
2020    12
2019    6
2018    5
2017    11
2016    11