Topic

Shallow parsing

About: Shallow parsing is a research topic. To date, 397 publications on this topic have received 10,211 citations.


Papers
Proceedings Article
01 Nov 2015
TL;DR: This paper develops a text chunker for Malayalam using a Memory-Based Learning (MBL) approach and reports an accuracy of 97.14%.
Abstract: Text chunking consists of dividing a text into syntactically correlated groups of words. Given the words and their morphosyntactic classes, a chunker decides which words can be grouped into chunks. Malayalam is a free word order language with relatively unrestricted phrase structures, which makes chunking quite challenging. This paper aims to develop a text chunker for Malayalam using a Memory-Based Learning (MBL) approach. Memory-Based Learning is a machine learning methodology based on the idea that directly reusing examples through analogical reasoning is better suited to language processing problems than applying rules extracted from those examples. The chunker was trained using the Memory-Based Tagger (MBT) tool with words and their POS tags as features. The chunker demonstrated an accuracy of 97.14%.

1 citation
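
The memory-based idea in the abstract above (store the training examples and classify new tokens by analogy to the closest ones) can be illustrated with a small nearest-neighbour chunker. This is only a sketch under stated assumptions: the toy English data, the feature window, the overlap metric, and all function names are hypothetical, and it does not reproduce the MBT tool or the paper's Malayalam features.

```python
from collections import Counter

# Hypothetical toy training data: (word, POS tag, IOB chunk tag) triples.
TRAIN = [
    [("the", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("barked", "VBD", "B-VP")],
    [("a", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("slept", "VBD", "B-VP")],
    [("she", "PRP", "B-NP"), ("saw", "VBD", "B-VP"),
     ("the", "DT", "B-NP"), ("bird", "NN", "I-NP")],
]

def features(sent, i):
    """Word and POS of the focus token plus the POS of its two neighbours."""
    prev_pos = sent[i - 1][1] if i > 0 else "<s>"
    next_pos = sent[i + 1][1] if i + 1 < len(sent) else "</s>"
    return (sent[i][0], sent[i][1], prev_pos, next_pos)

def build_memory(train):
    """'Training' is just storing every example: (feature tuple, chunk tag)."""
    return [(features(s, i), s[i][2]) for s in train for i in range(len(s))]

def overlap(a, b):
    """Similarity = number of feature positions with identical values."""
    return sum(x == y for x, y in zip(a, b))

def classify(memory, feats, k=3):
    """Majority vote over the k most similar stored examples."""
    nearest = sorted(memory, key=lambda ex: overlap(ex[0], feats), reverse=True)[:k]
    return Counter(tag for _, tag in nearest).most_common(1)[0][0]

def chunk(memory, tagged_sentence, k=3):
    """Assign an IOB chunk tag to each (word, POS) pair of a sentence."""
    sent = [(w, p, None) for w, p in tagged_sentence]
    return [classify(memory, features(sent, i), k) for i in range(len(sent))]

memory = build_memory(TRAIN)
print(chunk(memory, [("the", "DT"), ("cat", "NN"), ("barked", "VBD")]))
# ['B-NP', 'I-NP', 'B-VP']
```

No rules are extracted at any point; the stored examples themselves act as the model, which is the contrast with rule-based approaches that the abstract draws.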

Book Chapter
16 Feb 2003
TL;DR: A new approach to the automatic semantic indexing of digital photographs based on the extraction of logic relations from their textual descriptions using an ontology for the domain of application is presented.
Abstract: In this paper we present a new approach to the automatic semantic indexing of digital photographs based on the extraction of logic relations from their textual descriptions. The method is based on shallow parsing and propositional analysis of the descriptions using an ontology for the domain of application. We describe the semantic representation formalism, the ontology, and the algorithms involved in the automatic derivation of semantic indexes from texts linked to images. The method has been integrated into the Scene of the Crime Information System, a crime management system for storing, indexing and retrieval of crime information.

1 citation

Proceedings Article
Kamal Sarkar
19 Dec 2014
TL;DR: This paper presents a new approach for automatically extracting key phrases from a Bengali document. Candidate key phrases are identified using lexical information and case markers, and the best candidates are then selected by a ranking method that combines statistical and linguistic features.
Abstract: This paper presents a new approach for automatically extracting key phrases from a Bengali document. The proposed approach has two main steps: (1) shallow-parsing-based identification of candidate key phrases using lexical information and case markers, and (2) selection of the best candidates using a ranking method that combines statistical and linguistic features. The feature set includes term frequency, position of the phrase's first occurrence, named entity information and lexical information. The proposed system has been tested on a collection of Bengali news documents. The experimental results show that it performs better than the existing approaches to which it is compared.

1 citation
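
The second step of the key phrase abstract above, ranking candidates by combined features, might look roughly like the following sketch. The abstract does not give the scoring function, so the weighted sum, the weights `w_tf` and `w_pos`, and the function name are assumptions; only term frequency and first-occurrence position are modelled, and the named entity and lexical features are left out.

```python
def rank_candidates(tokens, candidates, w_tf=1.0, w_pos=1.0):
    """Rank candidate phrases by term frequency and earliness of first occurrence.

    The linear combination and its weights are illustrative assumptions,
    not the scoring function used in the paper.
    """
    n = len(tokens)
    scored = []
    for phrase in candidates:
        words = phrase.split()
        # All token positions where the phrase starts.
        starts = [i for i in range(n - len(words) + 1)
                  if tokens[i:i + len(words)] == words]
        if not starts:
            continue  # candidate does not occur in the document
        tf = len(starts) / n              # normalised term frequency
        earliness = 1.0 - starts[0] / n   # 1.0 if the phrase opens the document
        scored.append((w_tf * tf + w_pos * earliness, phrase))
    return [phrase for _, phrase in sorted(scored, key=lambda t: -t[0])]

doc = ("shallow parsing finds chunks and shallow parsing is fast "
       "while full parsing is slow").split()
print(rank_candidates(doc, ["full parsing", "shallow parsing", "missing phrase"]))
# ['shallow parsing', 'full parsing']
```

Candidate identification (step 1) is assumed to have happened already; here the candidates are simply passed in as strings.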

Proceedings Article
01 Jan 2010
TL;DR: The paper mainly tests the suitability of PNEPs for shallow parsing, which analyzes the main components of sentences rather than complete sentences.
Abstract: PNEPs (Parsing Networks of Evolutionary Processors) extend NEPs with context-free (instead of substituting) rules, leftmost derivation, bad terminals check and indexes to rebuild the derivation tree. It is possible to build a PNEP from any context-free grammar without additional constraints, able to generate all the different derivations for ambiguous grammars with a temporal performance bounded by the depth of the derivation tree. One of the main difficulties encountered by parsing techniques when building complete parse trees for natural languages is the spatial and temporal performance of the analysis. Shallow parsing tries to overcome these difficulties. The goal of shallow parsing is to analyze the main components of sentences (for example, noun groups, verb groups, etc.) rather than complete sentences. The current paper mainly focuses on testing the suitability of PNEPs for shallow parsing.

1 citation
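
The goal stated in the abstract above, finding noun groups and verb groups without building a full parse tree, can be sketched with a simple left-to-right pass over POS-tagged tokens. This is not a PNEP and does not model evolutionary processors; the two patterns (`NG = DT? JJ* NN+`, `VG = MD? VB+`), the Penn-style tags, and the function name are assumptions chosen for illustration.

```python
def shallow_parse(tagged):
    """Greedily group (word, POS) pairs into noun groups (NG) and verb groups (VG).

    Assumed patterns: NG = DT? JJ* NN+ and VG = MD? VB+.
    Tokens that fit neither pattern are emitted as single 'O' items.
    """
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        # Try a noun group: optional determiner, adjectives, one or more nouns.
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < n and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:
            chunks.append(("NG", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # Try a verb group: optional modal, one or more verbs.
        j = i
        if tagged[j][1] == "MD":
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("VB"):
            k += 1
        if k > j:
            chunks.append(("VG", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        chunks.append(("O", [tagged[i][0]]))
        i += 1
    return chunks

sentence = [("the", "DT"), ("old", "JJ"), ("dog", "NN"),
            ("can", "MD"), ("bark", "VB"), ("loudly", "RB")]
print(shallow_parse(sentence))
# [('NG', ['the', 'old', 'dog']), ('VG', ['can', 'bark']), ('O', ['loudly'])]
```

The pass runs in time linear in the sentence length and produces a flat sequence of groups, which is the trade-off against complete parse trees that the abstract describes.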

Proceedings Article
29 Oct 2007
TL;DR: By introducing the kernel principle, SVMs can carry out training in a high-dimensional space at a small computational cost that is independent of the dimensionality.
Abstract: To be able to represent the whole hierarchical phrase structure, 10 types of Chinese chunks are defined. The paper presents a method of Chinese shallow parsing based on Support Vector Machines (SVMs). Conventional recognition techniques based on machine learning have difficulty in selecting useful features as well as in finding appropriate combinations of the selected features. SVMs can automatically focus on useful features and robustly handle a large feature set to develop models that maximize their generalizability. In addition, it is well known that SVMs achieve high generalization even in very high-dimensional feature spaces. Furthermore, by introducing the kernel principle, SVMs can carry out training in a high-dimensional space at a small computational cost that is independent of the dimensionality. The experiments produced promising results.

1 citation
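
The kernel claim in the SVM abstract above can be checked numerically on a small case. The sketch below assumes a degree-2 polynomial kernel, which the abstract does not specify, and uses hypothetical binary feature vectors; it shows that the kernel value computed in the original 4-dimensional space equals the dot product in the explicitly expanded 15-dimensional space.

```python
import math
from itertools import combinations

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel (x . y + 1)^2, computed in the input space."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** 2

def phi(x):
    """Explicit feature map whose dot product equals poly2_kernel."""
    feats = [1.0]
    feats += [math.sqrt(2) * xi for xi in x]                # linear terms
    feats += [xi * xi for xi in x]                          # squared terms
    feats += [math.sqrt(2) * x[i] * x[j]                    # pairwise products
              for i, j in combinations(range(len(x)), 2)]
    return feats

# Hypothetical binary feature vectors, e.g. indicator features of two tokens.
x = [1, 0, 1, 1]
y = [1, 1, 0, 1]

k_implicit = poly2_kernel(x, y)                             # 4 multiplications
k_explicit = sum(a * b for a, b in zip(phi(x), phi(y)))     # 15 multiplications

print(k_implicit)                                # 9
print(len(phi(x)))                               # 15
print(math.isclose(k_implicit, k_explicit))      # True
```

The explicit map grows quadratically with the number of input features, while the kernel evaluation stays linear in it, so feature combinations are taken into account without ever materializing the expanded space.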


Network Information
Related Topics (5)
Machine translation: 22.1K papers, 574.4K citations, 81% related
Natural language: 31.1K papers, 806.8K citations, 79% related
Language model: 17.5K papers, 545K citations, 79% related
Parsing: 21.5K papers, 545.4K citations, 79% related
Query language: 17.2K papers, 496.2K citations, 74% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2021    7
2020    12
2019    6
2018    5
2017    11
2016    11