
Showing papers on "Shallow parsing" published in 2018


Proceedings Article
01 Jan 2018
TL;DR: This paper presents a study of various models (Naive Bayes, Random Forest, Conditional Random Field (CRF), and Hidden Markov Model (HMM)) for language identification in English-Telugu code-mixed data.
Abstract: In multilingual or sociolingual settings, Intra-sentential Code Switching (ICS), or Code Mixing (CM), is frequently observed. Most people in the world know more than one language, and CM usage is especially apparent on social media platforms. ICS is particularly significant in the context of technology, health, and law, where conveying new developments is difficult in one's native language. CM and Code Switching pose serious challenges in applications such as dialog systems, machine translation, semantic parsing, and shallow parsing. Language identification is a necessary first step for any further work on code-mixed data. In this paper, we present a study of various models (Naive Bayes Classifier, Random Forest Classifier, Conditional Random Field (CRF), and Hidden Markov Model (HMM)) for language identification in English-Telugu code-mixed data. Considering the paucity of resources for code-mixed languages, we propose the CRF and HMM models for word-level language identification. Our best-performing system is CRF-based, with an F1-score of 0.91.

14 citations
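
Word-level language identification, as described in the abstract above, can be illustrated with the simplest of the four model families it names. The following is a minimal sketch of a Naive Bayes classifier over character bigrams, not the authors' implementation and not their best-performing CRF system. The class name `NaiveBayesLangID`, the helper `char_ngrams`, the labels `en`/`te`, and the tiny word lists are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(word, n=2):
    """Return character n-grams of a word, padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangID:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                      # n-gram length
        self.alpha = alpha              # add-alpha smoothing constant
        self.ngram_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            self.label_counts[label] += 1
            for gram in char_ngrams(word, self.n):
                self.ngram_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, word):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = (sum(self.ngram_counts[label].values())
                     + self.alpha * len(self.vocab))
            for gram in char_ngrams(word, self.n):
                score += math.log(
                    (self.ngram_counts[label][gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: English words and romanized Telugu words.
words = ["the", "movie", "is", "good", "nenu", "ledu", "bagundi", "enti"]
labels = ["en", "en", "en", "en", "te", "te", "te", "te"]
model = NaiveBayesLangID().fit(words, labels)

# Tag each token of a code-mixed sentence independently.
sentence = ["movie", "bagundi"]
tags = [model.predict(token) for token in sentence]
```

Unlike this sketch, a CRF or HMM tags each word using its neighbours as context, which is one plausible reason a sequence model would do better on code-mixed sentences.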


Posted Content
TL;DR: This paper presents a statistical POS tagger for Kannada using different machine learning and neural network models, and explores the use of character and word embeddings together for Kannada POS tagging.
Abstract: POS tagging serves as a preliminary task for many NLP applications. Kannada is a relatively resource-poor Indian language with a very limited number of quality NLP tools available. An accurate and reliable POS tagger is essential for many NLP tasks such as shallow parsing, dependency parsing, sentiment analysis, and named entity recognition. We present a statistical POS tagger for Kannada using different machine learning and neural network models. Our Kannada POS tagger outperforms the state-of-the-art Kannada POS tagger by 6%. Our contribution in this paper is threefold: building a generic POS tagger, comparing the performance of different modeling techniques, and exploring the use of character and word embeddings together for Kannada POS tagging.

12 citations


Journal ArticleDOI
TL;DR: An architecture, MwTExt, is presented for automatic extraction of multi-word terms (MWTs) from multiword expressions within un-annotated English documents, with an average precision of 97%.
Abstract: Multiword expressions are an omnipresent element of natural language, and their construal as a linguistic resource is important in various applications. This paper presents an architecture, MwTExt, for automatic extraction of multi-word terms (MWTs) from such expressions within un-annotated English documents. Natural Language Processing techniques such as shallow parsing and syntactic structure analysis are used to extract MWTs, with a specific focus on lexical patterns such as (Noun Preposition Noun), (Noun Preposition Noun + Noun), and (Noun Preposition Noun Preposition Noun). The extracted MWTs can be further used to form compound concepts within an ontology. The lexical descriptions of MWTs are encoded in Web Ontology Language (OWL/XML). MwTExt has been tested on Computer Science domain texts, and the results are compared with those obtained by Text2Onto, an ontology learning tool, and by term extractors such as TermRaider and TerMine. The results indicate that MwTExt performs better at extracting accurate lexicalized MWTs, with an average precision of 97%.

5 citations
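
The three lexical patterns listed in the abstract above can be matched against a part-of-speech-tagged token sequence. The following is a minimal sketch under stated assumptions, not the MwTExt architecture itself: it assumes the input is already tagged, uses a hypothetical coarse tagset (`N` for noun, `P` for preposition, anything else ignored), and reads "(Noun Preposition Noun + Noun)" as the tag sequence N P N N. The function name `extract_mwts` and the sample sentence are illustrative.

```python
# Patterns are tried longest first so that the longest match wins
# at each starting position.
PATTERNS = [
    ("N", "P", "N", "P", "N"),   # Noun Preposition Noun Preposition Noun
    ("N", "P", "N", "N"),        # Noun Preposition Noun + Noun
    ("N", "P", "N"),             # Noun Preposition Noun
]


def extract_mwts(tagged):
    """Return multi-word term candidates from a list of (word, tag) pairs."""
    tags = [tag for _, tag in tagged]
    results = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            k = len(pattern)
            if tuple(tags[i:i + k]) == pattern:
                results.append(" ".join(word for word, _ in tagged[i:i + k]))
                i += k          # skip past the matched span
                break
        else:
            i += 1              # no pattern starts here; move on
    return results


# Hypothetical pre-tagged sentence (tags assigned by hand for illustration).
sample = [
    ("the", "D"), ("theory", "N"), ("of", "P"), ("computation", "N"),
    ("explains", "V"), ("models", "N"), ("of", "P"), ("program", "N"),
    ("execution", "N"),
]
terms = extract_mwts(sample)
# terms == ["theory of computation", "models of program execution"]
```

In practice the tags would come from a POS tagger or shallow parser, and the candidates would still need filtering before being treated as domain terms.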


Patent
05 Jan 2018
TL;DR: A natural language parsing method and system are presented. The method segments an input sentence into words, tags each word with its part of speech, counts how often dependency relations form between pairs of words and between their parts of speech, builds a weighted directed tree over the words, computes a minimum spanning tree with Prim's algorithm, and outputs that tree in a formatted manner.
Abstract: The invention provides a natural language parsing method and a natural language parsing system. The method comprises the following steps: performing word segmentation on an input text sentence and extracting words; performing part-of-speech tagging on each word to acquire the part of speech of each word; counting the frequency of forming a dependency relationship between each two words, counting the frequency of forming the dependency relationship between the part of speech of each word and the part of speech of other words, and counting the frequency of forming the dependency relationship between the parts of speech of each two words; generating dependency parsing edges between the words in the input text sentence, and generating a directed tree by using the maximum-weight edge as the unique edge; computing a minimum spanning tree in the directed tree by using Prim's minimum spanning tree algorithm; and outputting the minimum spanning tree in a formatted manner. According to the method and the system, a shallow parsing mode is introduced to acquire each word of the input text and its part of speech. The algorithm is concise and data processing is fast; the method and the system can serve as key technologies for parsing long sentences in deeper research.

1 citation
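
The spanning-tree step named in the patent abstract above can be sketched with Prim's algorithm on a small word graph. This is an illustration under explicit assumptions, not the patented method: the graph is treated as undirected (the standard setting for Prim's algorithm), the pair frequencies are invented, and the edge cost is assumed to be the negated frequency so that the minimum spanning tree keeps the most frequent word pairs. The function name `prim_mst` and all data are hypothetical.

```python
import heapq


def prim_mst(nodes, edges):
    """Prim's algorithm on an undirected graph.

    nodes: list of node names; the first one is the start node.
    edges: dict mapping (u, v) pairs to a numeric cost.
    Returns a list of (parent, child, cost) tree edges.
    """
    adjacency = {node: [] for node in nodes}
    for (u, v), cost in edges.items():
        adjacency[u].append((cost, u, v))
        adjacency[v].append((cost, v, u))

    visited = {nodes[0]}
    heap = list(adjacency[nodes[0]])
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(nodes):
        cost, u, v = heapq.heappop(heap)
        if v in visited:
            continue                      # edge leads back into the tree
        visited.add(v)
        tree.append((u, v, cost))
        for edge in adjacency[v]:
            if edge[2] not in visited:
                heapq.heappush(heap, edge)
    return tree


# Hypothetical dependency frequencies between word pairs of one sentence.
frequency = {
    ("reads", "she"): 9,
    ("reads", "books"): 8,
    ("books", "long"): 7,
    ("reads", "long"): 3,
    ("she", "books"): 2,
    ("she", "long"): 1,
}
# Assumption: cost = -frequency, so frequent pairs are cheap to connect.
costs = {pair: -count for pair, count in frequency.items()}
tree = prim_mst(["reads", "she", "long", "books"], costs)
# tree == [("reads", "she", -9), ("reads", "books", -8), ("books", "long", -7)]
```

A true directed dependency tree would normally call for a directed spanning-tree algorithm; the undirected version here only shows the Prim step that the abstract mentions.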


Proceedings ArticleDOI
01 May 2018
TL;DR: This paper proposes an SVM and template-based approach to Tibetan person knowledge extraction, and designs a hierarchical SVM classifier to perform the entity knowledge extraction.
Abstract: Entity knowledge extraction is the foundation of Tibetan knowledge graph construction, which provides support for Tibetan question answering systems, information retrieval, information extraction, and other research, and promotes national unity and social stability. This paper proposes an SVM and template-based approach to Tibetan person knowledge extraction. After constructing a training corpus, we build the templates based on shallow parsing analysis of Tibetan syntactic and semantic features and verbs. Using the training corpus, we design a hierarchical SVM classifier to perform the entity knowledge extraction. Finally, the experimental results show that the method improves Tibetan person knowledge extraction.