Topic

Shallow parsing

About: Shallow parsing is a research topic. Over its lifetime, 397 publications have appeared on this topic, receiving 10,211 citations.


Papers
Book Chapter
01 Jan 2005
TL;DR: Inspired by the LinGO Redwoods approach, this work builds a multilevel treebanking tool that incorporates a deep parser and grammar for Norwegian, tightly linking the treebank to grammar development so as to achieve a sound embedding in grammatical theory and yield more useful results for applications.
Abstract: Current trends in language technology require treebanks that do not stop at the level of constituent structure, but include deeper and richer levels of analysis, including appropriate meaning structures. Capturing sufficient detail at different levels of linguistic description is too complex a task to be practically achievable by manual annotation or shallow parsing; rather it requires sophisticated tools that help secure the consistency of parallel but different structures. In conventional treebanks, grammatical functions and semantic roles are often simply attached to the syntactic constituent structure. The Penn Proposition Bank [12, 20] is basically constructed by labeling verbs as predicates and assigning appropriate semantic (thematic) roles to syntactic constituents that are in grammatical relations to the verbs. Though useful in its own right, this approach is nevertheless limited to verbs and is constrained by implicit isomorphism between the syntactic and semantic structures. In contrast, we are constructing a multilevel treebanking tool that incorporates a deep parser and grammar for Norwegian. Inspired by the LinGO Redwoods approach [19], we are tightly linking our treebank to grammar development so as to achieve a sound embedding in grammatical theory and yield more useful results for applications.

15 citations

Journal Article
TL;DR: This work presents a novel natural language interface that combines shallow-parsing-based algorithms with some intelligent techniques to train the system; experimental results show that the approach can analyze a wide range of questions with high accuracy and produce reasonable textual responses.
Abstract: This paper deals with a natural language interface, which accepts natural language questions as inputs and generates textual responses. In natural language processing, the keyword-matching paradigm generates answers; however, these answers are frequently affected by certain language-dependent phenomena such as semantic symmetry and ambiguous modification. Available techniques described in the literature deal with these problems using in-depth parsing. In this paper, we present rules to tackle these linguistic phenomena using shallow parsing and discuss the advantages of a novel natural language interface comprising shallow-parsing-based algorithms in conjunction with some intelligent techniques to train the system. Experimental results show that this approach can analyze a wide range of questions with high accuracy and produce reasonable textual responses.

15 citations

Proceedings Article
01 Jul 2006
TL;DR: A system that automatically constructs ontologies by extracting knowledge from dictionary definition sentences using Robust Minimal Recursion Semantics (RMRS) is outlined and how this system was designed to handle multiple lexicons and languages is discussed.
Abstract: In this paper, we outline the development of a system that automatically constructs ontologies by extracting knowledge from dictionary definition sentences using Robust Minimal Recursion Semantics (RMRS). Combining deep and shallow parsing resources through the common formalism of RMRS allows us to extract ontological relations in greater quantity and quality than is possible with any of the methods independently. Using this method, we construct ontologies from two different Japanese lexicons and one English lexicon. We then link them to existing, handcrafted ontologies, aligning them at the word-sense level. This alignment provides a representative evaluation of the quality of the relations being extracted. We present the results of this ontology construction and discuss how our system was designed to handle multiple lexicons and languages.

14 citations

Proceedings Article
01 Jan 2018
TL;DR: This paper presents a study of various models (Naive Bayes, Random Forest Classifier, Conditional Random Field (CRF), and Hidden Markov Model (HMM)) for language identification in English-Telugu code-mixed data.
Abstract: In a multilingual or sociolingual configuration, Intra-sentential Code Switching (ICS) or Code Mixing (CM) is frequently observed nowadays. Most people in the world know more than one language. CM usage is especially apparent on social media platforms. Moreover, ICS is particularly significant in the context of technology, health, and law, where conveying upcoming developments is difficult in one's native language. In applications like dialog systems, machine translation, semantic parsing, and shallow parsing, CM and Code Switching pose serious challenges. For any further advancement in code-mixed data, the necessary step is Language Identification. In this paper, we present a study of various models (Naive Bayes Classifier, Random Forest Classifier, Conditional Random Field (CRF), and Hidden Markov Model (HMM)) for Language Identification in English-Telugu Code Mixed Data. Considering the paucity of resources in code-mixed languages, we proposed the CRF model and HMM model for word-level language identification. Our best performing system is CRF-based with an F1-score of 0.91.

14 citations
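
To make the word-level language identification task in the abstract above more concrete, here is a minimal sketch of a Naive Bayes classifier, one of the model families the paper compares. Everything specific in it is an assumption for illustration: the class name `NaiveBayesLangID`, the character-bigram features, the add-one smoothing, and the tiny hand-picked word list. The abstract does not describe the paper's features or data, and its best system is CRF-based, not this one.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(word, n=2):
    """Return character n-grams of a lowercased word padded with ^ and $."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangID:
    """Hypothetical word-level language identifier (multinomial Naive Bayes)."""

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.word_counts = Counter()              # label -> number of training words
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, labeled_words):
        for word, label in labeled_words:
            self.word_counts[label] += 1
            for gram in char_ngrams(word, self.n):
                self.ngram_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, word):
        total_words = sum(self.word_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.word_counts:
            # Log prior plus add-one-smoothed log likelihood of each n-gram.
            score = math.log(self.word_counts[label] / total_words)
            denom = sum(self.ngram_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(word, self.n):
                score += math.log((self.ngram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training words chosen by hand for illustration (not the paper's data).
TOY_DATA = [
    ("hello", "en"), ("today", "en"), ("office", "en"), ("meeting", "en"),
    ("nenu", "te"), ("meeru", "te"), ("unnaru", "te"), ("ledu", "te"),
]

model = NaiveBayesLangID().fit(TOY_DATA)
print([(w, model.predict(w)) for w in "nenu today office ki vellanu".split()])
```

With so little training data the predictions are not meaningful; the sketch only shows the shape of the task, where each token in a code-mixed sentence receives its own language label.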

Proceedings Article
01 Jan 2005
TL;DR: The G-DEE system represents an extension of current visual interfaces for guidelines encoding, in that it supports automatic text processing functions which identify linguistic markers of document structure, such as recommendations, thereby decreasing the complexity of operations required by the user.
Abstract: In this paper, we present the G-DEE system, a document engineering environment aimed at clinical guidelines. This system represents an extension of current visual interfaces for guidelines encoding, in that it supports automatic text processing functions which identify linguistic markers of document structure, such as recommendations, thereby decreasing the complexity of operations required by the user. Such markers are identified by shallow parsing of free text and are automatically marked up as an early step of document structuring. From this first representation, it is possible to identify elements of guidelines contents, such as decision variables, and produce elements of GEM encoding, using rules defined as XSL style sheets. We tested our automatic structuring system on a set of sentences extracted from French clinical guidelines. As a result, 97% of the occurrences of deontic operators and their scopes were correctly marked up. G-DEE can be used for various purposes, from research into guidelines structure to assisting the encoding of guidelines into a GEM format or into decision rules.

14 citations
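
As a rough illustration of the marker-identification step described in the G-DEE abstract above, the following minimal sketch marks up deontic operators in free text using regular expressions. The marker list, the `<deontic>` tag name, and the function `mark_deontic` are hypothetical and chosen for illustration only. The abstract does not give G-DEE's actual rules or markup, the system was tested on French guidelines whereas English markers are used here, and this sketch does not attempt to identify operator scopes.

```python
import re

# Hypothetical English deontic markers; G-DEE's real lexicon is not given in the abstract.
DEONTIC_MARKERS = ["must not", "must", "should not", "should", "is recommended", "may"]

# Try longer markers first so that "must not" is preferred over "must".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(m) for m in sorted(DEONTIC_MARKERS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def mark_deontic(sentence: str) -> str:
    """Wrap each deontic marker in a (hypothetical) <deontic> tag."""
    return _PATTERN.sub(lambda m: f"<deontic>{m.group(1)}</deontic>", sentence)


print(mark_deontic("Patients should not receive aspirin before surgery."))
```

A rule-based pass like this only finds surface markers; turning the marked-up text into GEM elements, as the abstract describes, would need further structuring rules that are not sketched here.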


Network Information
Related Topics (5)
Machine translation: 22.1K papers, 574.4K citations, 81% related
Natural language: 31.1K papers, 806.8K citations, 79% related
Language model: 17.5K papers, 545K citations, 79% related
Parsing: 21.5K papers, 545.4K citations, 79% related
Query language: 17.2K papers, 496.2K citations, 74% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    7
2020    12
2019    6
2018    5
2017    11
2016    11