Topic: Shallow parsing

About: Shallow parsing (also called chunking) is a research topic concerned with identifying the non-recursive constituents of a sentence, such as noun and verb phrases, without building a full parse tree. Over its lifetime, 397 publications have been published within this topic, receiving 10,211 citations.
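A minimal illustration of what shallow parsing produces, using NLTK's regular-expression chunker; the toy grammar and example sentence are illustrative assumptions, not taken from any paper listed below.

import nltk

# First use may require: nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger')
sentence = "The quick brown fox jumps over the lazy dog"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)

# Toy chunk grammar: an optional determiner, any number of adjectives,
# and one or more nouns form a noun-phrase (NP) chunk.
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)

# The output is a flat, one-level tree of chunks over tagged tokens,
# with no recursive clause or phrase structure.
print(tree)

The point of the example is the shape of the output: flat chunks rather than a full parse tree, which is what distinguishes shallow parsing from deep parsing.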


Papers
Proceedings ArticleDOI
01 Jun 2016
TL;DR: The problem of shallow parsing of Hindi-English code-mixed social media text (CSMT) is addressed by developing a language identifier, a normalizer, a part-of-speech tagger, and a shallow parser.

39 citations
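A hedged sketch of the kind of pipeline the entry above describes for code-mixed social media text: language identification, normalization, POS tagging, and shallow parsing applied in sequence. Every function below is a hypothetical placeholder; the authors' actual models are not reproduced here.

# Hypothetical pipeline for Hindi-English code-mixed social media text (CSMT).
# Each stage is a stand-in; a real system would plug trained models in here.

def identify_language(token):
    # Placeholder: a real system might use a character n-gram classifier;
    # here a tiny word list stands in for Hindi detection.
    hindi_words = {"yaar", "bahut", "accha", "nahi", "kya"}
    return "hi" if token.lower() in hindi_words else "en"

def normalize(token, lang):
    # Placeholder: map noisy social-media spellings to canonical forms.
    return token.lower()

def pos_tag(tokens_with_lang):
    # Placeholder: a POS tagger trained on code-mixed data would go here.
    return [(tok, "NOUN") for tok, _lang in tokens_with_lang]

def shallow_parse(tagged):
    # Placeholder: group tagged tokens into flat chunks (e.g. NP/VP).
    return [("CHUNK", tagged)]

def process(sentence):
    tokens = sentence.split()
    with_lang = [(t, identify_language(t)) for t in tokens]
    normalized = [(normalize(t, lang), lang) for t, lang in with_lang]
    tagged = pos_tag(normalized)
    return shallow_parse(tagged)

print(process("yaar this movie was bahut accha"))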

Proceedings ArticleDOI
07 Jul 2003
TL;DR: A finite state approach that integrates a phrasal verb expert lexicon between shallow parsing and deep parsing to handle morpho-syntactic interaction is presented.
Abstract: Phrasal Verbs are an important feature of the English language. Properly identifying them provides the basis for an English parser to decode the related structures. Phrasal verbs have been a challenge to Natural Language Processing (NLP) because they sit at the borderline between lexicon and syntax. Traditional NLP frameworks that separate the lexicon module from the parser make it difficult to handle this problem properly. This paper presents a finite state approach that integrates a phrasal verb expert lexicon between shallow parsing and deep parsing to handle morpho-syntactic interaction. With precision/recall combined performance benchmarked consistently at 95.8%-97.5%, the Phrasal Verb identification problem has basically been solved with the presented method.

38 citations
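The abstract above places a phrasal-verb expert lexicon between shallow and deep parsing. A toy sketch of the identification step for separable phrasal verbs (e.g. "look ... up") is given below; the tiny lexicon, the bounded gap, and the matching logic are illustrative assumptions, far from the paper's finite-state expert lexicon in coverage.

# Toy finite-state-style matcher for separable phrasal verbs.
# PHRASAL_VERBS is an illustrative three-entry lexicon only.
PHRASAL_VERBS = {("look", "up"), ("turn", "off"), ("give", "up")}
MAX_GAP = 3  # allow a short object phrase between verb and particle

def find_phrasal_verbs(tokens):
    matches = []
    for i, tok in enumerate(tokens):
        for verb, particle in PHRASAL_VERBS:
            if tok.lower().startswith(verb):
                # scan a bounded window to the right for the particle
                for j in range(i + 1, min(i + 1 + MAX_GAP + 1, len(tokens))):
                    if tokens[j].lower() == particle:
                        matches.append((verb, particle, i, j))
                        break
    return matches

print(find_phrasal_verbs("She looked the number up yesterday".split()))
# -> [('look', 'up', 1, 4)]

The bounded window is a crude stand-in for the morpho-syntactic interaction the paper handles with finite-state machinery: a short object phrase may separate the verb from its particle.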

01 Jan 2001
TL;DR: In this article, the authors present a system called Acromed which finds acronym-meaning pairs as part of a set of information extraction tools designed for processing and extracting data from abstracts in the Medline database.
Abstract: Acronyms are widely used in biomedical and other technical texts. Understanding their meaning constitutes an important problem in the automatic extraction and mining of information from text. Moreover, an even harder problem is sense disambiguation of acronyms; that is, where a single acronym, termed a polynym, has a multiplicity of meanings, a common occurrence in the biomedical literature. In such cases, it is necessary to identify the correct corresponding sense for the polynym, which is often not directly specified in the text. Here we present a system called Acromed which finds acronym-meaning pairs as part of a set of information extraction tools designed for processing and extracting data from abstracts in the Medline database. Our strategy for finding acronym-meaning pairs differs from previous automated acronym extraction methods by incorporating shallow parsing of the text into the acronym recognition algorithm. The performance of our system has been tested with a highly diverse set of Medline texts, giving the highest precision and recall results in the literature thus far. We then present Polyfind, an algorithm for disambiguating polynyms, which uses a vector space model. Our disambiguation tests produced 97.62% accuracy in one test (on acronyms) and 86.6% accuracy in another (on aliases).

38 citations
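A rough sketch of vector-space sense disambiguation in the spirit of the Polyfind step described above: each candidate sense of a polynym is represented by a bag-of-words vector, and the sense most similar (by cosine) to the local context of the acronym occurrence wins. The tokenization, the toy sense inventory, and the example are illustrative assumptions, not the paper's actual model.

import math
from collections import Counter

def cosine(a, b):
    # cosine similarity between two bag-of-words Counters
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def disambiguate(context, sense_contexts):
    # context: text surrounding the acronym occurrence
    # sense_contexts: {sense name: representative text for that sense}
    ctx_vec = Counter(context.lower().split())
    scores = {sense: cosine(ctx_vec, Counter(text.lower().split()))
              for sense, text in sense_contexts.items()}
    return max(scores, key=scores.get), scores

# Toy sense inventory for the polynym "PCA" (illustrative only).
senses = {
    "principal component analysis": "variance eigenvectors dimensionality reduction matrix",
    "patient controlled analgesia": "pain pump opioid postoperative patient dose",
}
best, scores = disambiguate("postoperative pain managed with a PCA pump", senses)
print(best, scores)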

Journal Article
TL;DR: Three data-driven publicly available part-of-speech taggers are applied to shallow parsing of Swedish texts, and special attention is directed to the taggers' sensitivity to different types of linguistic information included in learning, as well as their sensitivity to the size and the various types of training data sets.
Abstract: Three data-driven publicly available part-of-speech taggers are applied to shallow parsing of Swedish texts. The phrase structure is represented by nine types of phrases in a hierarchical structure containing labels for every constituent type the token belongs to in the parse tree. The encoding is based on the concatenation of the phrase tags on the path from lowest to higher nodes. Various linguistic features are used in learning; the taggers are trained on the basis of lexical information only, part-of-speech only, and a combination of both, to predict the phrase structure of the tokens with or without part-of-speech. Special attention is directed to the taggers' sensitivity to different types of linguistic information included in learning, as well as the taggers' sensitivity to the size and the various types of training data sets. The method can be easily transferred to other languages.

38 citations
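The encoding described above, concatenating phrase tags on the path from the lowest enclosing node upward, can be sketched as follows with an NLTK tree; the label separator, the toy parse, and the choice to stop below the sentence root are assumptions for illustration, not the paper's exact scheme.

from nltk import Tree

def encode_tokens(tree):
    # For every leaf (token), concatenate the phrase labels on the path
    # from its lowest enclosing phrase up toward the root, lowest first.
    encoded = []
    for pos in tree.treepositions('leaves'):
        labels = []
        # walk upward by truncating the leaf's tree position step by step
        for depth in range(len(pos) - 1, 0, -1):
            labels.append(tree[pos[:depth]].label())
        encoded.append((tree[pos], "|".join(labels)))
    return encoded

# Toy parse with an NP nested inside a VP (illustrative structure).
t = Tree('S', [Tree('NP', ['She']),
               Tree('VP', ['reads', Tree('NP', ['the', 'report'])])])
for token, code in encode_tokens(t):
    print(token, code)
# e.g. 'the' -> 'NP|VP': lowest phrase first, then the higher node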

Journal ArticleDOI
TL;DR: The use of linguistic analysis-based relations improves the accuracy of literature-based discovery (LBD) without overly damaging coverage, as evaluated both by replicating existing discoveries and by the "time slicing" approach.

38 citations


Network Information

Related Topics (5)
Machine translation: 22.1K papers, 574.4K citations, 81% related
Natural language: 31.1K papers, 806.8K citations, 79% related
Language model: 17.5K papers, 545K citations, 79% related
Parsing: 21.5K papers, 545.4K citations, 79% related
Query language: 17.2K papers, 496.2K citations, 74% related
Performance Metrics

No. of papers in the topic in previous years:
Year  Papers
2021  7
2020  12
2019  6
2018  5
2017  11
2016  11