
Showing papers on "Shallow parsing published in 2017"


Journal ArticleDOI
TL;DR: The authors argue that L2 speakers are more susceptible to retrieval interference when successful comprehension requires access to information from memory, and they claim that a primary source of L1/L2 differences lies in the ability to retrieve information constructed during sentence processing from memory.
Abstract: A growing body of research has investigated bilingual sentence processing. How to account for differences in native (L1) and non-native (L2) processing is controversial. Some explain L1/L2 differences in terms of different parsing mechanisms, and the hypothesis that L2 learners adopt ‘shallow’ parsing has received considerable attention. Others assume L1/L2 processing is similar, and explain L1/L2 differences in terms of capacity-based limitations being exceeded during L2 processing. More generally, the role that working memory plays in language acquisition and processing has garnered increasing interest. Based on research investigating L2 sentence processing, I claim that a primary source of L1/L2 differences lies in the ability to retrieve information constructed during sentence processing from memory. In contrast to describing L1/L2 differences in terms of shallow parsing or capacity limitations, I argue that L2 speakers are more susceptible to retrieval interference when successful comprehension requires access to information from memory.

129 citations


Posted Content
TL;DR: This article proposed a neural sequence chunking model that treats each chunk as a complete unit for labeling and achieved state-of-the-art performance on both text chunking and slot filling tasks.
Abstract: Many natural language understanding (NLU) tasks, such as shallow parsing (i.e., text chunking) and semantic slot filling, require the assignment of representative labels to the meaningful chunks in a sentence. Most current deep neural network (DNN) based methods treat these tasks as a sequence labeling problem, in which a word, rather than a chunk, is the basic unit for labeling. The chunks are then inferred from the standard IOB (Inside-Outside-Beginning) labels. In this paper, we take an alternative approach by investigating the use of DNNs for sequence chunking, and propose three neural models in which each chunk is treated as a complete unit for labeling. Experimental results show that the proposed neural sequence chunking models achieve state-of-the-art performance on both the text chunking and slot filling tasks.
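For readers unfamiliar with the IOB scheme the abstract contrasts with, the sketch below shows how per-word IOB labels are decoded into chunks in the standard sequence-labeling setup; the sentence and labels are illustrative and not taken from the paper's data.

```python
# A minimal sketch of decoding chunks from per-word IOB labels, the
# word-level labeling scheme the paper's chunk-level models move away from.
# The example sentence and tags are invented for illustration.

def iob_to_chunks(tokens, labels):
    """Collect (chunk_type, chunk_text) spans from IOB-tagged tokens."""
    chunks, current = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):                      # a new chunk begins
            current = (label[2:], [token])
            chunks.append(current)
        elif label.startswith("I-") and current and current[0] == label[2:]:
            current[1].append(token)                    # continue the open chunk
        else:                                           # "O" or inconsistent tag
            current = None
    return [(tag, " ".join(words)) for tag, words in chunks]

tokens = ["He", "reckons", "the", "current", "account", "deficit"]
labels = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"]
print(iob_to_chunks(tokens, labels))
# [('NP', 'He'), ('VP', 'reckons'), ('NP', 'the current account deficit')]
```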

57 citations


Proceedings Article
12 Feb 2017
TL;DR: This paper investigates the use of DNNs for sequence chunking and proposes three neural models in which each chunk is treated as a complete unit for labeling, achieving state-of-the-art performance on both the text chunking and slot filling tasks.
Abstract: Many natural language understanding (NLU) tasks, such as shallow parsing (i.e., text chunking) and semantic slot filling, require the assignment of representative labels to the meaningful chunks in a sentence. Most current deep neural network (DNN) based methods treat these tasks as a sequence labeling problem, in which a word, rather than a chunk, is the basic unit for labeling. The chunks are then inferred from the standard IOB (Inside-Outside-Beginning) labels. In this paper, we take an alternative approach by investigating the use of DNNs for sequence chunking, and propose three neural models in which each chunk is treated as a complete unit for labeling. Experimental results show that the proposed neural sequence chunking models achieve state-of-the-art performance on both the text chunking and slot filling tasks.

50 citations


Proceedings ArticleDOI
01 Apr 2017
TL;DR: This system is considered a good trial of the interaction between rule-based and statistical approaches, where the rules can help the statistics in detecting the right diacritization and vice versa.

Abstract: This paper sheds light on a system that diacritizes Arabic texts automatically (SHAKKIL). In this system, the diacritization problem is handled at two levels: morphological and syntactic processing. The adopted morphological disambiguation algorithm depends on four layers: a uni-morphological form layer, a rule-based morphological disambiguation layer, a statistical disambiguation layer, and an Out-Of-Vocabulary (OOV) layer. The adopted syntactic disambiguation algorithm detects the case-ending diacritics using a rule-based approach that simulates the shallow parsing technique. This is achieved using an annotated corpus for extracting the Arabic linguistic rules, building the language models, and testing the system output. The system is considered a good trial of the interaction between rule-based and statistical approaches, where the rules can help the statistics in detecting the right diacritization and vice versa. At this point, the morphological Word Error Rate (WER) is 4.56%, the morphological Diacritic Error Rate (DER) is 1.88%, and the syntactic WER is 9.36%. The best WER is 14.78%, compared with the best published results of 11.68% (Abandah, 2015), 12.90% (Rashwan et al., 2015), and 13.70% (Metwally, Rashwan, & Atiya, 2016).
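As a point of reference for the WER figures quoted above, the sketch below shows the standard word-level edit-distance computation behind a WER score; the transliterated example strings are invented, and the paper's own scoring details may differ.

```python
# A minimal sketch of the Word Error Rate (WER) metric: word-level edit
# distance between hypothesis and reference, divided by the reference length.
# The example strings are invented placeholders, not data from the paper.

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# In diacritization scoring, a word counts as an error if any diacritic is wrong.
print(word_error_rate("ktb alwld aldrs", "ktb alwld aldrsu"))  # 1/3 ≈ 0.33
```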

43 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this study, shallow parsing is applied to Turkish sentences, which are used to train and test the performance of various learning algorithms with various features specified for shallow parsing in Turkish.
Abstract: In this study, shallow parsing is applied to Turkish sentences. These sentences are used to train and test the performance of various learning algorithms with various features specified for shallow parsing in Turkish.

5 citations


Proceedings ArticleDOI
01 May 2017
TL;DR: An algorithm that incorporates language-model modules such as synonym replacement, root word extraction, and shallow parsing; when applied to English-to-Hindi translations, it gives better evaluation results than algorithms that do not incorporate all these modules.
Abstract: Machine Translation, sometimes referred to by the acronym MT, is one of the important fields of computational linguistics; it investigates the use of computer software to translate text or speech from one natural language to another. At its most basic level, MT performs simple substitution of atomic words in one natural language for words in another. Around the world, numerous systems are available for assessing the translations produced by various translation systems. Within India, a large number of such evaluation systems are also available, and much research is still under way to develop a better evaluation system that can match the results produced by human evaluators. The main challenge for Indian researchers is that evaluation systems that give excellent results for foreign languages (such as German, French, Chinese, etc.) do not give comparable results for Indian languages (Hindi, Tamil, Telugu, Punjabi, etc.). Consequently, these evaluation systems cannot be applied as-is to evaluate machine translations of Indian languages. Indian languages require a novel approach because of the relatively unrestricted order of words within a word group. In this paper, we present an algorithm (incorporating different language-model modules such as synonym replacement, root word extraction, and shallow parsing) which, when applied to English-to-Hindi translations, gives better evaluation results than algorithms that do not incorporate all these modules. Our study is limited to the English-Hindi language pair, and testing is performed on a corpus from the agriculture domain.
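To make the role of the synonym-replacement and root-extraction modules concrete, here is a hypothetical sketch of a unigram-matching evaluation score that credits matches made through either module; the synonym table, the toy stemmer, and the English example sentences are placeholders, not the paper's actual resources or language pair.

```python
# A hypothetical sketch: a unigram precision score that also counts a match
# when hypothesis and reference words share a root or are listed as synonyms.
# All resources below are toy placeholders for illustration only.

SYNONYMS = {"fast": {"quick", "swift"}}     # hypothetical synonym table

def root(word):
    """Toy root extractor: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def matches(hyp_word, ref_word):
    return (hyp_word == ref_word
            or root(hyp_word) == root(ref_word)
            or ref_word in SYNONYMS.get(hyp_word, set()))

def unigram_score(hypothesis, reference):
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    hits = sum(any(matches(h, r) for r in ref) for h in hyp)
    return hits / len(hyp)

print(unigram_score("the fast dogs ran", "the quick dog runs"))  # 0.75
```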

1 citation


Patent
25 Jan 2017
TL;DR: In this article, a method for extracting product feature information from online shopping user comments was proposed, comprising the following steps: 1) performing shallow parsing on user comments and recognizing a plurality of blocks from the comments; 2) performing blocking analysis on the blocks; 3) extracting nominal information; 4) searching for a frequent item set; 5) filtering non-product features from the frequent item set.
Abstract: The invention relates to a method for extracting product feature information from online shopping user comments. The method comprises the following steps: 1) performing shallow parsing on the user comments and recognizing a plurality of blocks from the comments; 2) performing blocking analysis on the blocks; 3) extracting nominal information; 4) searching for a frequent item set; 5) filtering non-product features from the frequent item set. In the proposed method, on the basis of fully considering that a noun block may be a product feature, the blocking analysis is performed using CRF shallow parsing; the FP-growth algorithm is adopted to increase efficiency; and a combined TF-IDF and TextRank filtering method is adopted for high-accuracy filtering. The method is suitable for analyzing user comment texts in different fields, has high general applicability and efficiency, and can meet practical application requirements.
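As a simplified illustration of the data flow described above: the patent uses a CRF shallow parser, FP-growth, and TF-IDF/TextRank filtering, whereas the sketch below hand-specifies the extracted noun chunks and finds frequent candidate features by plain support counting, just to make the frequent-item-set step concrete.

```python
# A simplified, hypothetical sketch of frequent-feature mining over noun
# chunks extracted from reviews. The chunks are given by hand here; a real
# pipeline would obtain them from CRF shallow parsing and use FP-growth.

from collections import Counter
from itertools import combinations

# Noun chunks that a shallow parser might extract from four review comments
review_chunks = [
    ["battery life", "screen"],
    ["battery life", "price"],
    ["screen", "battery life"],
    ["price", "delivery"],
]

MIN_SUPPORT = 2  # a candidate must occur in at least 2 reviews to be "frequent"

# Support counts for single candidate features and for feature pairs
single = Counter(chunk for review in review_chunks for chunk in set(review))
pairs = Counter(pair for review in review_chunks
                for pair in combinations(sorted(set(review)), 2))

frequent_features = {f for f, n in single.items() if n >= MIN_SUPPORT}
frequent_pairs = {p for p, n in pairs.items() if n >= MIN_SUPPORT}

print(frequent_features)   # {'battery life', 'screen', 'price'}
print(frequent_pairs)      # {('battery life', 'screen')}
```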

1 citation


17 Nov 2017
TL;DR: The purpose of this thesis work is to propose an automated approach to the detection and resolution of three types of syntactic ambiguity (analytical, coordination, and PP attachment) using AmbiGO, the prototype web application developed for this thesis, which is freely available on the web.

Abstract: Technical documents are mostly written in natural languages, and they are highly ambiguity-prone because ambiguity is an inevitable feature of natural languages. Many researchers have urged that technical documents be free from ambiguity, to avoid the unwanted and, in some cases, disastrous consequences that ambiguity and misunderstanding can have in a technical context. The need for tools that assist writers with ambiguity detection and resolution therefore seems indispensable. The purpose of this thesis is to propose an automated approach to the detection and resolution of syntactic ambiguity. AmbiGO is the prototype web application developed for this thesis; it is freely available on the web. The hope is that a developed version of AmbiGO will assist users with ambiguity detection and resolution. Currently AmbiGO is capable of detecting and resolving three types of syntactic ambiguity, namely analytical, coordination, and PP attachment. AmbiGO uses syntactic parsing to detect ambiguity patterns and retrieves frequency counts from Google for each possible reading as a surrogate for semantic analysis. Such semantic analysis through Google frequency counts has significantly improved the precision of the tool's output in all three ambiguity detection functions. AmbiGO is available at this URL: http://omidemon.pythonanywhere.com/
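The frequency-count idea can be illustrated with a small, hypothetical sketch: for a PP-attachment ambiguity, form a phrase for each reading and prefer the one with the higher frequency. The `get_frequency` lookup below stands in for the Google count retrieval AmbiGO performs; the counts and sentence are invented.

```python
# A hypothetical sketch of frequency-based PP-attachment disambiguation.
# get_frequency is a placeholder for a web/corpus frequency lookup, not a
# real API call; the toy counts are invented for illustration.

def get_frequency(phrase):
    """Placeholder for a web or corpus frequency lookup."""
    toy_counts = {"saw with a telescope": 120, "man with a telescope": 45}
    return toy_counts.get(phrase, 0)

def resolve_pp_attachment(verb, obj_noun, prep_phrase):
    # Reading 1: the PP modifies the verb; Reading 2: it modifies the object noun
    verb_reading = f"{verb} {prep_phrase}"
    noun_reading = f"{obj_noun} {prep_phrase}"
    v, n = get_frequency(verb_reading), get_frequency(noun_reading)
    return "verb attachment" if v >= n else "noun attachment"

# "I saw the man with a telescope"
print(resolve_pp_attachment("saw", "man", "with a telescope"))  # verb attachment
```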

1 citation


Proceedings ArticleDOI
01 Feb 2017
TL;DR: The paper considers the problem of automatic processing of natural-language Chinese texts and proposes to identify a sentence model by its function words, while the limited dictionary is compensated for by automatically building a subject-area thesaurus and a dictionary of common words through statistical processing of a document corpus.
Abstract: The paper considers the problem of automatic processing of natural-language Chinese texts. One of the pressing tasks in this area is automatic fact acquisition from text documents in response to a query, because existing automatic translators are of little use for this task. The goal of the work is direct extraction of facts from text in the original language, without translation. The suggested approach consists of syntactic analysis of sentences followed by matching of the parts of speech found against a formalized query in the form of subject-object-predicate. A distinctive feature of the proposed syntactic analysis algorithm is the absence of a word-segmentation phase for the sequence of characters that makes up a sentence. The bottleneck in this task is the dictionary, because correct interpretation of a phrase can be impossible when a word is absent from it. To address this problem, we propose to identify a sentence model by its function words, while the limited dictionary is compensated for by automatically building a subject-area thesaurus and a dictionary of common words through statistical processing of a document corpus. The suggested approach was tested on a small topic area with a limited dictionary, where it demonstrated its robustness. An analysis of the temporal characteristics of the developed algorithm was carried out as well. As the proposed algorithm uses naive inference, the parsing speed on real tasks could be unacceptably low, and this should become a subject for further research.
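A very rough sketch of the function-word idea, under the assumption that known function words are used as anchors splitting the character sequence into candidate subject/predicate/object spans without word segmentation; the function-word list and example sentence are toy placeholders, not the paper's actual resources.

```python
# A hypothetical sketch: locate function words in an unsegmented Chinese
# character sequence and use them to split the sentence into candidate
# spans, which could then be matched against a subject-object-predicate query.

import re

FUNCTION_WORDS = ["是", "在", "的"]   # toy list of Chinese function words

def split_by_function_words(sentence):
    pattern = "(" + "|".join(map(re.escape, FUNCTION_WORDS)) + ")"
    return [part for part in re.split(pattern, sentence) if part]

# "北京是中国的首都" -> Beijing / is / China / 's / capital
print(split_by_function_words("北京是中国的首都"))
# ['北京', '是', '中国', '的', '首都']
```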

1 citation


Book ChapterDOI
20 Sep 2017
TL;DR: The paper gives a technical description of CoSyCo, a corpus of syntactic co-occurrences, which provides information on syntactically connected words in the Russian language.
Abstract: The paper gives a technical description of CoSyCo, a corpus of syntactic co-occurrences, which provides information on syntactically connected words in the Russian language. The paper includes an overview of the corpora collected for the creation of CoSyCo and the number of combinations collected. We also provide a short evaluation of the gathered information.

Journal ArticleDOI
TL;DR: This paper develops algorithms such as shallow parsing and a modified Lesk algorithm to resolve issues in Word Sense Disambiguation, performs translation from Hindi to English, and compares the results with Google Translate.
Abstract: This paper develops algorithms such as shallow parsing and a modified Lesk algorithm to resolve issues in Word Sense Disambiguation and to perform correct translation from the Hindi language to the English language. The shallow parsing method is based on a Hidden Markov Model. We evaluate the system on 1,657 Hindi tokens with 990 phrases for part-of-speech tagging and chunking of Hindi input sentences. The part-of-speech tagger achieves accuracy 92.09%, precision 84.76%, recall 89.29%, and F-score 86.97%; the chunker achieves accuracy 93.96%, precision 89.33%, recall 91.31%, and F-score 90.315%. The evaluation is performed by building a confusion matrix in which the output of the part-of-speech tagger and chunker is compared with gold-standard data provided by IIIT Hyderabad at the 2015 summer school. The second problem addressed is Word Sense Disambiguation, for which we enhance the modified Lesk algorithm with an overlap-based method that measures shared information among three words in a given context. We compare the system's output with Google Translate by giving both systems a Hindi sentence containing a polysemous word. The results show that our system resolves the word sense and produces a correct translation where Google Translate fails.
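The overlap idea behind the (modified) Lesk algorithm mentioned above can be illustrated with a minimal sketch: choose the sense whose gloss shares the most words with the surrounding context. The sense inventory below is a toy English stand-in for the Hindi resources the paper actually uses.

```python
# A minimal sketch of simplified Lesk word sense disambiguation: pick the
# sense whose gloss has the largest word overlap with the context.
# The sense glosses and example sentence are toy data for illustration.

def simplified_lesk(context_words, sense_glosses):
    """Return the sense whose gloss overlaps most with the context."""
    context = set(word.lower() for word in context_words)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

senses_of_bank = {
    "bank_river": "sloping land beside a body of water or river",
    "bank_finance": "financial institution that accepts deposits and lends money",
}
context = "he sat on the bank of the river watching the water".split()
print(simplified_lesk(context, senses_of_bank))   # bank_river
```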