Showing papers on "Shallow parsing" published in 2018

Open Access

Proceedings Article

Word Level Language Identification in English Telugu Code Mixed Data

Sunil Gundapu¹, Radhika Mamidi¹

International Institute of Information Technology, Hyderabad¹

01 Jan 2018

TL;DR: This paper presents a study of various models for language identification in English-Telugu code-mixed data: Naive Bayes, Random Forest Classifier, Conditional Random Field (CRF), and Hidden Markov Model (HMM).

Abstract: In multilingual or sociolingual settings, Intra-sentential Code Switching (ICS), or Code Mixing (CM), is frequently observed nowadays. Most people in the world know more than one language. CM usage is especially apparent on social media platforms. Moreover, ICS is particularly significant in the context of technology, health, and law, where conveying upcoming developments is difficult in one's native language. In applications such as dialog systems, machine translation, semantic parsing, and shallow parsing, CM and Code Switching pose serious challenges. Language Identification is the necessary first step for any further advancement in processing code-mixed data. In this paper, we present a study of various models for Language Identification in English-Telugu code-mixed data: Naive Bayes Classifier, Random Forest Classifier, Conditional Random Field (CRF), and Hidden Markov Model (HMM). Considering the paucity of resources in code-mixed languages, we proposed the CRF model and HMM model for word-level language identification. Our best-performing system is CRF-based, with an F1-score of 0.91.

14 citations

Posted Content

Building a Kannada POS Tagger Using Machine Learning and Neural Network Models.

Ketan Kumar Todi, Pruthwik Mishra, Dipti Misra Sharma

09 Aug 2018 - arXiv: Computation and Language

TL;DR: This paper presents a statistical POS tagger for Kannada using different machine learning and neural network models, and explores the use of character and word embeddings together for Kannada POS tagging.

Abstract: POS tagging serves as a preliminary task for many NLP applications. Kannada is a relatively resource-poor Indian language with a very limited number of quality NLP tools available for use. An accurate and reliable POS tagger is essential for many NLP tasks such as shallow parsing, dependency parsing, sentiment analysis, and named entity recognition. We present a statistical POS tagger for Kannada using different machine learning and neural network models. Our Kannada POS tagger outperforms the state-of-the-art Kannada POS tagger by 6%. Our contribution in this paper is threefold: building a generic POS tagger, comparing the performance of different modeling techniques, and exploring the use of character and word embeddings together for Kannada POS tagging.

12 citations

Journal Article

MwTExt: automatic extraction of multi-word terms to generate compound concepts within ontology

Pratik Thanawala¹, Jyoti Pareek²

Ahmedabad University¹, Gujarat University²

21 Feb 2018 - International Journal of Information Technology

TL;DR: An architecture, MwTExt, is presented for automatic extraction of multi-word terms (MWTs) from multiword expressions within un-annotated English documents, with an average precision of 97%.

Abstract: Multiword expressions are an omnipresent element of natural language, whose construal as a linguistic resource has significant importance in various applications. This paper presents an architecture, MwTExt, for automatic extraction of multi-word terms (MWTs) from such expressions within un-annotated English documents. Natural Language Processing techniques such as shallow parsing and syntactic structure analysis are used to extract MWTs, with a specific focus on lexical patterns such as (Noun Preposition Noun), (Noun Preposition Noun + Noun) and (Noun Preposition Noun Preposition Noun). The extracted MWTs can be further used to form compound concepts within an ontology. The lexical descriptions of MWTs are encoded in Web Ontology Language OWL/XML. MwTExt has been tested on Computer Science domain texts, and the results obtained are compared with those obtained by Text2Onto, an ontology learning tool, and by term extractors such as TermRaider and TerMine. The results indicate that MwTExt performs better at extracting accurate lexicalized MWTs, with an average precision of 97%.
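The three lexical patterns named in this abstract can be illustrated with a small sketch. It assumes the text has already been POS-tagged and uses hypothetical coarse tags ("N" for noun, "P" for preposition); the function name `extract_mwts` and the longest-match-first policy are illustrative choices, not details taken from MwTExt itself.

```python
from typing import List, Tuple

# Hypothetical coarse tags: "N" = noun, "P" = preposition.
# The three patterns from the abstract, longest first, so that the
# longest pattern matching at a given position is preferred.
PATTERNS = [
    ("N", "P", "N", "P", "N"),  # Noun Preposition Noun Preposition Noun
    ("N", "P", "N", "N"),       # Noun Preposition Noun + Noun
    ("N", "P", "N"),            # Noun Preposition Noun
]

def extract_mwts(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return candidate multi-word terms whose tag sequence matches a pattern."""
    tags = [tag for _, tag in tagged]
    terms = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            n = len(pattern)
            if tuple(tags[i:i + n]) == pattern:
                terms.append(" ".join(word for word, _ in tagged[i:i + n]))
                i += n  # continue scanning after the matched span
                break
        else:
            i += 1  # no pattern starts at this position
    return terms

# Illustrative input only; the tags are assigned by hand.
sentence = [("theory", "N"), ("of", "P"), ("computation", "N"),
            ("is", "V"), ("hard", "A")]
print(extract_mwts(sentence))
```

A real pipeline would obtain the tags from a POS tagger or chunker and would likely filter the candidates further; this sketch shows only the pattern-matching step.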

5 citations

Patent

Natural language parsing method and system

Chen Hao

05 Jan 2018

TL;DR: A natural language parsing method and system are presented. The method performs word segmentation on an input text sentence, tags each word with its part of speech, counts the frequency with which dependency relationships form between pairs of words and between their parts of speech, and outputs the resulting minimum spanning tree in a formatted manner.

Abstract: The invention provides a natural language parsing method and a natural language parsing system. The method comprises the following steps: performing word segmentation on an input text sentence and extracting words; performing part-of-speech tagging on each word to acquire the part of speech of each word; counting the frequency of forming a dependency relationship between each two words, counting the frequency of forming the dependency relationship between the part of speech of each word and the part of speech of other words, and counting the frequency of forming the dependency relationship between the parts of speech of each two words; generating a dependency parsing edge between the words in the input text sentence, and generating a directed tree by using the maximum weight as the unique edge; computing a minimum spanning tree in the directed tree by using Prim's minimum spanning tree algorithm; and outputting the minimum spanning tree in a formatted manner. In the method and the system, a shallow parsing mode is introduced to acquire each word of the input text and its part of speech. The algorithm is concise and data processing is fast; the method and the system can serve as key technologies for long-sentence parsing in deeper research.
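The spanning-tree step mentioned in this abstract can be illustrated with a generic sketch of Prim's algorithm. This is not the patented method: the abstract does not say how frequencies are turned into edge weights or how direction is handled, so the sketch assumes an undirected, connected graph over word indices with hypothetical costs, where a lower cost stands for a more frequently observed dependency. The function name `prim_mst` is illustrative.

```python
import heapq
from typing import Dict, List, Tuple

def prim_mst(n: int, edges: List[Tuple[int, int, float]]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on an undirected, connected graph with nodes 0..n-1.

    Returns the tree edges as (from_node, to_node, cost) in the order added.
    """
    adj: Dict[int, List[Tuple[float, int]]] = {i: [] for i in range(n)}
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))

    visited = {0}  # grow the tree from node 0
    heap = [(w, 0, v) for w, v in adj[0]]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < n:
        w, u, v = heapq.heappop(heap)  # cheapest edge leaving the tree
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, w))
        for w2, nxt in adj[v]:
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return tree

# Hypothetical costs for the words ["she", "reads", "books"] (indices 0, 1, 2).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 3.0)]
print(prim_mst(3, edges))
```

With these assumed costs the tree links "she" to "reads" and "reads" to "books", leaving out the costlier direct edge between "she" and "books".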

1 citation

Proceedings Article

Template construction and Tibetan knowledge extraction

Yuan Sun¹, Zhen Zhu¹

Minzu University of China¹

01 May 2018

TL;DR: This paper proposes an SVM and template-based approach to Tibetan person knowledge extraction, and designs a hierarchical SVM classifier to realize the entity knowledge extraction.

Abstract: Entity knowledge extraction is the foundation of Tibetan knowledge graph construction, which provides support for Tibetan question answering systems, information retrieval, information extraction and other research, and promotes national unity and social stability. This paper proposes an SVM and template-based approach to Tibetan person knowledge extraction. After constructing the training corpus, we build the templates based on a shallow parsing analysis of Tibetan syntactic features, semantic features and verbs. Using the training corpus, we design a hierarchical SVM classifier to realize the entity knowledge extraction. Finally, experimental results show that the method improves Tibetan person knowledge extraction.
