
Shallow parsing

About: Shallow parsing is a research topic. Over the lifetime of the topic, 397 publications have been published, receiving 10,211 citations.


Papers
Proceedings ArticleDOI
06 Aug 2009
TL;DR: This paper evaluates SRL methods that take partial parses as inputs and implements SRL systems which cast SRL as the classification of syntactic chunks with IOB2 representation for semantic roles (i.e. semantic chunks).
Abstract: Most existing systems for Chinese Semantic Role Labeling (SRL) make use of full syntactic parses. In this paper, we evaluate SRL methods that take partial parses as inputs. We first extend the study on Chinese shallow parsing presented in (Chen et al., 2006) by raising a set of additional features. On the basis of our shallow parser, we implement SRL systems which cast SRL as the classification of syntactic chunks with IOB2 representation for semantic roles (i.e. semantic chunks). Two labeling strategies are presented: 1) directly tagging semantic chunks in one stage, and 2) identifying argument boundaries as a chunking task and labeling their semantic types as a classification task. For both methods, we present encouraging results, achieving significant improvements over the best reported SRL performance in the literature. Additionally, we put forward a rule-based algorithm to automatically acquire Chinese verb formation, which is empirically shown to enhance SRL.

35 citations
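
As a concrete illustration of the IOB2 representation the paper builds on: B-<role> opens a semantic chunk, I-<role> continues it, and O marks tokens outside any chunk. Below is a minimal Python sketch of converting between argument spans and IOB2 tags; the sentence and the PropBank-style role labels (A0, A1, AM-TMP) are invented for illustration, not taken from the paper.

    # Minimal sketch of the IOB2 encoding for semantic chunks.
    # B-<role> opens a chunk, I-<role> continues it, O is outside any chunk.

    def spans_to_iob2(n_tokens, spans):
        """spans: list of (start, end_exclusive, role) argument spans."""
        tags = ["O"] * n_tokens
        for start, end, role in spans:
            tags[start] = "B-" + role
            for i in range(start + 1, end):
                tags[i] = "I-" + role
        return tags

    def iob2_to_spans(tags):
        """Recover (start, end_exclusive, role) spans from IOB2 tags."""
        spans, start, role = [], None, None
        for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
            if start is not None and tag != "I-" + role:
                spans.append((start, i, role))
                start = None
            if tag.startswith("B-"):
                start, role = i, tag[2:]
        return spans

    tokens = ["The", "committee", "approved", "the", "budget", "yesterday"]
    spans = [(0, 2, "A0"), (3, 5, "A1"), (5, 6, "AM-TMP")]
    tags = spans_to_iob2(len(tokens), spans)
    print(list(zip(tokens, tags)))
    assert iob2_to_spans(tags) == spans

The round trip between spans and tags is what lets a chunk classifier be reused for role labeling: the one-stage strategy tags semantic chunks directly, while the two-stage strategy first chunks argument boundaries and then classifies each recovered span.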

Journal ArticleDOI
TL;DR: A grammatically motivated sentiment classification model, applied to a morphologically rich language (Urdu), achieves state-of-the-art performance in sentiment analysis of Urdu text.
Abstract: This paper presents a grammatically motivated sentiment classification model applied to a morphologically rich language: Urdu. The morphological complexity and flexible grammatical rules of this language require an improved or altogether different approach. We emphasize the identification of SentiUnits, rather than subjective words, in the given text. SentiUnits are the sentiment-carrier expressions, which reveal the inherent sentiment of a sentence for a specific target. The targets are the noun phrases about which an opinion is expressed. The system extracts SentiUnits and target expressions through shallow-parsing-based chunking. A dependency parsing algorithm creates associations between these extracted expressions. For our system, we develop a sentiment-annotated lexicon of Urdu words. Each entry of the lexicon is marked with its orientation (positive or negative) and an intensity (force of orientation) score. For the evaluation of the system, two corpora of reviews, from the domains of movies and electronic appliances, are collected. The experimental results show that we achieve state-of-the-art performance in sentiment analysis of Urdu text.

34 citations
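
To make the lexicon design concrete, here is a minimal, hypothetical sketch of lexicon-based chunk scoring in the spirit of the SentiUnits model. The entries and chunks are invented English placeholders; the actual system operates on chunked Urdu text with a sentiment-annotated Urdu lexicon.

    # Each lexicon entry carries an orientation (+1/-1) and an intensity score,
    # mirroring the paper's (orientation, intensity) annotation. All values
    # and words below are invented for illustration.
    LEXICON = {
        "excellent": (+1, 0.9),
        "boring":    (-1, 0.6),
        "dull":      (-1, 0.5),
    }
    NEGATIONS = {"not", "never"}

    def score_chunk(chunk_tokens):
        """Sum orientation * intensity over a chunk; negation flips the sign."""
        sign, score = 1, 0.0
        for tok in chunk_tokens:
            if tok in NEGATIONS:
                sign = -sign
            elif tok in LEXICON:
                orientation, intensity = LEXICON[tok]
                score += orientation * intensity
        return sign * score

    # One (target, SentiUnit) pair, as would be extracted by chunking:
    target, senti_unit = ["the", "movie"], ["not", "boring"]
    polarity = score_chunk(senti_unit)
    print(target, "->", "positive" if polarity > 0 else "negative", polarity)

Here "not boring" correctly scores positive because the negation flips the orientation of "boring"; associating the scored SentiUnit with its target is the job of the dependency-parsing step described in the abstract.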

Journal Article
TL;DR: In this article, a corpus of newspaper articles about national current affairs by different journalists from the Belgian newspaper De Standaard was used to predict authorship of unseen documents using machine learning methods (TiMBL and the WEKA software package).
Abstract: Current advances in shallow parsing and machine learning allow us to use results from these fields in a methodology for Authorship Attribution. We report on experiments with a corpus that consists of newspaper articles about national current affairs by different journalists from the Belgian newspaper De Standaard. Because the documents are in a similar genre, register, and range of topics, token-based (e.g., sentence length) and lexical features (e.g., vocabulary richness) can be kept roughly constant over the different authors. This allows us to focus on the use of syntax-based features as possible predictors of an author’s style, as well as on those token-based features that are predictive of author style rather than of topic or register. These style characteristics are not under the author’s conscious control and are therefore good clues for Authorship Attribution. Machine Learning methods (TiMBL and the WEKA software package) are used to select informative combinations of syntactic, token-based, and lexical features and to predict the authorship of unseen documents. The combination of these features can be considered an implicit profile that characterizes the style of an author.

34 citations
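
The paper's classifiers are TiMBL (memory-based learning, essentially k-nearest neighbours) and WEKA. As a rough stand-in, the sketch below uses scikit-learn's k-NN over POS-tag trigrams, which captures the key idea of syntax-based, topic-independent style features; the tiny corpus of tag sequences is invented for illustration.

    # Authorship attribution over POS-tag trigrams with 1-nearest-neighbour,
    # a stand-in for the memory-based learning used in the paper.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neighbors import KNeighborsClassifier

    # Documents as space-separated POS tags (output of a tagger or shallow
    # parser), abstracting away from topic-bearing words.
    docs = [
        "DT NN VBD DT JJ NN IN DT NN",       # author A
        "DT JJ NN VBD IN DT NN NN",          # author A
        "PRP VBZ RB JJ CC PRP VBD NN",       # author B
        "PRP RB VBD DT NN CC PRP VBZ JJ",    # author B
    ]
    authors = ["A", "A", "B", "B"]

    # POS-tag trigrams as syntax-based style features.
    vec = CountVectorizer(analyzer="word", ngram_range=(3, 3))
    X = vec.fit_transform(docs)

    clf = KNeighborsClassifier(n_neighbors=1).fit(X, authors)
    unseen = vec.transform(["DT NN VBD DT JJ NN CC DT NN"])
    print(clf.predict(unseen))   # predicted author of the unseen document

In a realistic setup the feature set would mix syntactic, token-based, and lexical features, with feature selection over their combinations as described in the abstract.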

Proceedings ArticleDOI
19 Jun 2000
TL;DR: A case study based on part of NASA's specification of the Node Control Software of the International Space Station is described, and the authors apply to it their method of checking properties on models obtained by shallow parsing of natural language requirements.
Abstract: The authors report on their experiences of using lightweight formal methods for the partial validation of natural language (NL) requirements documents. They describe a case study based on part of NASA's specification of the Node Control Software of the International Space Station, and apply to it their method of checking properties on models obtained by shallow parsing of natural language requirements. These experiences support the position that it is feasible and useful to perform automated analysis of requirements expressed in natural language. Indeed, the authors identified a number of errors in their case study that were also independently discovered and corrected by NASA's IV&V Facility in a subsequent version of the same document. The paper describes the techniques used and the errors found, and reflects on the lessons learned.

33 citations
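
As a toy illustration of the general idea (not the authors' actual pipeline), the sketch below uses a shallow pattern to extract a crude condition/actor/action model from invented requirement sentences, then checks one simple consistency property against that model.

    # Shallow-parse requirement sentences into a crude model, then check a
    # property on it. Sentences and the property are invented for illustration.
    import re
    from collections import defaultdict

    requirements = [
        "If the node receives a shutdown command, the controller shall stop the pumps.",
        "If the node receives a shutdown command, the controller shall restart the pumps.",
        "If the sensor reports overheating, the controller shall stop the pumps.",
    ]

    # Pattern-based "shallow parse": condition / actor / action triples.
    RULE = re.compile(r"If (?P<cond>.+), the (?P<actor>\w+) shall (?P<action>.+)\.")
    model = [m.groupdict() for m in map(RULE.match, requirements) if m]

    # Property: no two requirements with the same condition may prescribe
    # different actions (a simple consistency check on the extracted model).
    actions_by_condition = defaultdict(set)
    for entry in model:
        actions_by_condition[entry["cond"]].add(entry["action"])
    for cond, actions in actions_by_condition.items():
        if len(actions) > 1:
            print("conflicting requirements for:", cond, "->", sorted(actions))

Even this crude check flags the deliberately conflicting pair above; the paper's point is that such lightweight, automated analyses surface real errors without requiring a full formal specification up front.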

Proceedings Article
02 Jun 2010
TL;DR: This work uses a classification method to aid human annotation of output parses and shows that knowledge about multiword expressions leads to an increase of between 7.5% and 9.5% in shallow parsing accuracy.
Abstract: There is significant evidence in the literature that integrating knowledge about multiword expressions can improve shallow parsing accuracy. We present an experimental study to quantify this improvement, focusing on compound nominals, proper names and adjective-noun constructions. The evaluation set of multiword expressions is derived from WordNet and the textual data are downloaded from the web. We use a classification method to aid human annotation of output parses. This method allows us to conduct experiments on a large dataset of unannotated data. Experiments show that knowledge about multiword expressions leads to an increase of between 7.5% and 9.5% in accuracy of shallow parsing in sentences containing these multiword expressions.

32 citations
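
A natural way to exploit such knowledge is to merge known multiword expressions into single tokens before chunking, so the chunker cannot split them. The sketch below shows this preprocessing step with an invented MWE list (the paper derives its evaluation set from WordNet) and makes no claim to be the paper's own integration method.

    # Greedy longest-match merging of known multiword expressions into single
    # tokens, a common preprocessing step before shallow parsing.
    MWES = {("new", "york", "city"), ("hot", "dog"), ("kick", "the", "bucket")}
    MAX_LEN = max(len(m) for m in MWES)

    def merge_mwes(tokens):
        """Merge known MWEs into single underscore-joined tokens."""
        out, i = [], 0
        while i < len(tokens):
            for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
                window = tuple(t.lower() for t in tokens[i:i + n])
                if window in MWES:
                    out.append("_".join(tokens[i:i + n]))
                    i += n
                    break
            else:                      # no MWE starts at this position
                out.append(tokens[i])
                i += 1
        return out

    print(merge_mwes("He bought a hot dog in New York City".split()))
    # ['He', 'bought', 'a', 'hot_dog', 'in', 'New_York_City']

After merging, a chunker sees "hot_dog" and "New_York_City" as atomic units, which is one way the compound nominals and proper names studied in the paper can stop being chunk-boundary errors.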


Network Information
Related Topics (5)
Machine translation: 22.1K papers, 574.4K citations (81% related)
Natural language: 31.1K papers, 806.8K citations (79% related)
Language model: 17.5K papers, 545K citations (79% related)
Parsing: 21.5K papers, 545.4K citations (79% related)
Query language: 17.2K papers, 496.2K citations (74% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2021  7
2020  12
2019  6
2018  5
2017  11
2016  11