
Showing papers on "Chunking (computing)" published in 2011


Journal ArticleDOI
TL;DR: A unified neural network architecture and learning algorithm is proposed that can be applied to various natural language processing tasks, including part-of-speech tagging, chunking, named entity recognition, and more.
Abstract: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity rec...

224 citations


Journal ArticleDOI
TL;DR: This study is the first to compare the performance of the whole chunking pipeline and to combine different existing chunking systems; OpenNLP scored best in both performance and usability.

52 citations


Journal Article
TL;DR: A new Chinese chunking method is proposed based on conditional random fields and semantic classes; the conditional random fields can incorporate various types of features and overcome the label bias problem, and the method achieves impressive accuracy.
Abstract: To improve the accuracy of Chinese chunking and utilize the semantic information of words, a new Chinese chunking method is proposed based on conditional random fields and semantic classes. Through an analysis of the Chinese chunking task and its sequential characteristics, conditional random fields that could incorporate various types of features were applied to overcome the label bias problem. Semantic features were utilized to improve the chunking performance. Experimental results show that the algorithm achieves an impressive accuracy of 92.77% in terms of the F-score. A further experiment indicates the effects of feature template selection and training data scale on chunking performance.

17 citations
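
Sequence-labeling chunkers such as the CRF-based one above typically emit one tag per token, and chunks are then read off the tag sequence. The abstract does not state which tag scheme was used; the sketch below assumes the common BIO convention (`B-X` begins a chunk of type X, `I-X` continues it, `O` is outside any chunk). The function name `bio_to_spans` is hypothetical.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (label, start, end) spans, end exclusive.

    Assumes the BIO convention; an I-X tag that does not continue an open
    chunk of type X is treated as starting a new chunk.
    """
    spans = []
    start = None
    label = None
    for i, tag in enumerate(tags):
        # Does this tag continue the chunk that is currently open?
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside and label is not None:
            spans.append((label, start, i))  # close the open chunk
            start = label = None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]  # open a new chunk
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans
```

Chunk-level F-scores are usually computed over spans like these rather than over individual tags, so a chunk counts as correct only if both its boundaries and its label match.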


01 Jan 2011
TL;DR: A tool that integrates the National Library of Medicine's MetaMap software with GATE, an open-source text analytics framework, to chunk cardiovascular disease guideline text into default segments, XML element content, sentences and phrases, which were sequentially submitted to MetaMap for annotation.
Abstract: We developed a tool that integrates the National Library of Medicine's MetaMap software with GATE, an open-source text analytics framework. The tool allows non-ASCII encoded documents of numerous formats to be annotated with UMLS concepts. We created a GATE pipeline to chunk cardiovascular disease guideline text into default segments (blank-line delimited), XML element content, sentences and phrases, which were sequentially submitted to MetaMap for annotation. XML element, sentence and phrase chunking allowed term extraction and mapping to be completed in around 1/3 of the time taken with default chunking, although with slight loss of accuracy (F1.0s=0.94-0.99). However, phrase chunking allows more complex input to be processed in real time, which is not possible with the other approaches. We discuss the results in relation to use of MetaMap's --term processing option for generating pre- and post-coordinated mappings from composite phrases.

14 citations
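
The two coarsest chunk levels in the abstract above, blank-line-delimited segments and sentences, can be illustrated with a minimal sketch. It uses plain regular expressions and is not the GATE pipeline or MetaMap; the function names and the naive sentence-boundary rule (split after `.`, `!` or `?` followed by whitespace) are assumptions made for illustration.

```python
import re


def blank_line_segments(text):
    """Split text into segments separated by one or more blank lines."""
    return [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]


def naive_sentences(segment):
    """Split a segment after sentence-final punctuation followed by whitespace.

    This is a deliberately naive rule; it will mis-split abbreviations.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", segment.strip()) if s]
```

Smaller chunks mean each submission to a downstream annotator is shorter, which is consistent with the speedup the abstract reports, at the cost of losing context that spans chunk boundaries.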


Proceedings ArticleDOI
Zhi Tang1, Youjip Won1
21 Jun 2011
TL;DR: This work presents a multithreaded file chunking prototype system that exploits the hardware organization of a CPU-GPGPU heterogeneous computer and determines which device should chunk a file, in order to accelerate the content-based file chunking operation of deduplication.
Abstract: The rapid development of the Graphics Processing Unit (GPU) has led to the popularity of general-purpose usage of the GPU (GPGPU). Most modern computers now have a CPU-GPGPU heterogeneous architecture in which the CPU is used as the host processor. In this work, we present a multithreaded file chunking prototype system that exploits the hardware organization of the CPU-GPGPU heterogeneous computer and determines which device should be used to chunk a file, in order to accelerate the content-based file chunking operation of deduplication. We built rules for the system to choose which device should chunk a file, and also found the optimal choice of other related parameters of both the CPU and GPGPU subsystems, such as segment size and block dimension. The prototype was implemented and tested. Results using a GTX460 (336 cores) and an Intel i5 (four cores) show that the system can increase chunking speed by 63% compared with using the GPGPU alone and by 80% compared with using the CPU alone.

13 citations
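
For readers unfamiliar with content-based file chunking for deduplication, here is a minimal single-threaded sketch. It is not the authors' CPU-GPGPU system: it assumes a simple polynomial rolling hash over a sliding window and cuts a chunk wherever the low bits of the hash are zero. All parameter values (`window`, `mask`, `min_size`, `max_size`) and the function name are hypothetical.

```python
def content_defined_chunks(data, window=16, mask=0x3FF,
                           min_size=256, max_size=4096):
    """Split bytes into chunks at content-dependent boundaries.

    A boundary is declared when the rolling hash of the last `window`
    bytes has its masked bits equal to zero (subject to min/max size).
    """
    base = 257
    mod = (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks = []
    start = 0  # index where the current chunk begins
    h = 0      # rolling hash, reset at every cut
    for i, byte in enumerate(data):
        pos = i - start
        if pos >= window:
            # Drop the oldest byte's contribution before adding the new one.
            h = (h - data[i - window] * top) % mod
        h = (h * base + byte) % mod
        size = pos + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because boundaries depend on content rather than on fixed offsets, inserting bytes near the start of a file tends to shift only nearby boundaries, so later chunks can still match previously stored ones. The per-byte hashing loop is the part that a parallel implementation would distribute across devices.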


01 Jan 2011
TL;DR: This paper discusses shallow parsing of Polish, specifically chunking, addresses theoretical issues related to chunking of Polish texts, and proposes chunk annotation guidelines.
Abstract: This paper discusses the problem of shallow parsing of Polish, specifically chunking. We discuss some theoretical issues related to chunking of Polish texts and propose our chunk annotation guidelines. In the second part of the paper we present initial results of using Machine Learning algorithms to train a working chunker for the proposed chunk types.

9 citations


Journal ArticleDOI
TL;DR: Overall, NLP technologies such as chunking can bring performance, alternative methodologies and solutions at times where the highest academic approaches are not enough.
Abstract: Neuro-Linguistic Programming (NLP) can bring new perspectives and new results to any endeavour involving personal (i.e. internal) and interpersonal communication. The organisation of information to achieve results is at the core of NLP and also a frequent goal for interpersonal conflict managers such as arbiters, mediators and negotiators. This article sheds light on one particular NLP tool, namely chunking. Chunking is a direct application of the NLP Meta-model, a communications model used to find and challenge linguistic distortions in the client's language. Chunking deals with information size and direction. Information can be chunked up or down in size and can be moved laterally to find alternative examples of a concept at the same level of information. In a conflict resolution or mediation setting, chunking up can be a guide to reach an initial agreement level, a compromise between the parties. Chunking down, on the other hand can be used to deal with specific problems and find a leverage point to make a breakthrough. Overall, NLP technologies such as chunking can bring performance, alternative methodologies and solutions at times where the highest academic approaches are not enough.

4 citations


Proceedings ArticleDOI
12 Dec 2011
TL;DR: A chunking method for Malayalam sentences based on a morpheme-based augmented transition network that works with good accuracy with the set of chunk rules proposed and has good potential for use as a full-fledged parser for the Malayalam language.
Abstract: Various methods have been proposed for chunking sentences in agglutinative languages. For Malayalam, a South Indian language, the chunking methods proposed are mainly statistical. This paper describes a chunking method for Malayalam sentences based on a morpheme-based augmented transition network. For the trial set of sentences, the system works with good accuracy with the set of chunk rules proposed. The chunking system has good potential for use as a full-fledged parser for the Malayalam language.

4 citations


08 Jul 2011
TL;DR: The quality of translation of chunks obtained by marker-based chunking in English and French in both directions is inspected, and it is shown that more than three quarters of the chunks can be translated by the one-step analogy-based translation method.
Abstract: An example-based machine translation (EBMT) system 16) based on analogies requires numerous analogies between linguistic units to work properly. Consequently, long sentences cannot be handled directly in such a framework. In this paper, we inspect the quality of translation of chunks obtained by marker-based chunking in English and French in both directions. Our results show that more than three quarters of the chunks can be translated by the one-step analogy-based translation method, and that slightly fewer than half of the chunks get a perfect translation when compared with references.
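
The abstract above does not spell out how marker-based chunking works. A minimal sketch under a common reading of the marker hypothesis follows: closed-class "marker" words (determiners, prepositions, conjunctions) open a new chunk, provided the current chunk already contains at least one non-marker word. The marker list and the function name are hypothetical and far smaller than any real marker inventory.

```python
# Hypothetical, deliberately tiny English marker set for illustration only.
MARKERS = {"the", "a", "an", "of", "in", "on", "to", "and", "with", "for"}


def marker_chunks(tokens, markers=MARKERS):
    """Group tokens into chunks that begin at marker words.

    A marker only starts a new chunk if the chunk being built already
    holds a non-marker (content) word, so runs of markers stay together.
    """
    chunks = []
    current = []
    has_content = False
    for tok in tokens:
        is_marker = tok.lower() in markers
        if is_marker and has_content:
            chunks.append(current)
            current = []
            has_content = False
        current.append(tok)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(current)
    return chunks
```

For example, `marker_chunks("the cat sat on the mat".split())` groups the tokens as `[the cat sat]` and `[on the mat]`, giving shorter units than the full sentence.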

Dissertation
01 Aug 2011

12 Apr 2011
TL;DR: This project implements EG for a Natural Language Processing structured prediction task of phrasal chunking and compares the performance of EG with other discriminative learning algorithms that have state of the art results on this task.
Abstract: Exponentiated Gradient (EG) updates were originally introduced in (Kivinen and Warmuth, 1997) in the context of online learning algorithms. EG updates were shown by (Collins et al., 2008) to provide fast batch and online algorithms for learning a max-margin classifier. They show that EG can converge quickly due to multiplicative updates, and that EG updates can be factored into tractable components for structured prediction tasks where the number of output labels is exponential in the size of the input. In this project, we implement EG for a Natural Language Processing structured prediction task of phrasal chunking (finding noun phrases, and other phrases in text) and we compare the performance of EG with other discriminative learning algorithms that have state of the art results on this task.
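
The multiplicative update at the heart of EG can be sketched in a few lines. This is the basic form over a probability vector (each weight is scaled by the exponential of its negative gradient, then the vector is renormalized), not the factored structured-prediction version the abstract attributes to Collins et al.; the learning rate `eta` and the function name are illustrative assumptions.

```python
import math


def eg_update(weights, gradient, eta=0.1):
    """One exponentiated-gradient step on a probability vector.

    Each weight is multiplied by exp(-eta * gradient_i) and the result
    is renormalized so the weights again sum to 1.
    """
    unnormalized = [w * math.exp(-eta * g) for w, g in zip(weights, gradient)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]
```

Because the update is multiplicative, weights stay positive and on the simplex without a separate projection step, in contrast to an additive gradient step.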

Proceedings ArticleDOI
01 Dec 2011
TL;DR: A parallel programming framework applicable to different algorithms is demonstrated, in contrast to conventional single-algorithm speedup at a particular point in time; it fosters application-dependent speedup over uniprocessor applications for a given workload, even on small Ethernet/IP-based networks.
Abstract: In this multicore era, there is a huge influx of symmetric multiprocessor computers based on shared-memory architecture and high-end server platforms. It appears that no adequate framework exists to realize the complete potential of the hardware. In this paper, a parallel programming framework is demonstrated that is applicable to different algorithms, in contrast to conventional single-algorithm speedup at a particular point in time. The framework fosters application-dependent speedup over uniprocessor applications for a given workload, even on small Ethernet/IP-based networks. The functional programming paradigm can implicitly parallelize programs on multicore computers and scale them across distributed networks using message queues. The map-reduce framework is also based on the functional programming paradigm: programs can be written in summation form, specifying a map function that generates intermediate key-value pairs and a reduce function that merges the key-value pairs. With this method a substantial increase in speed is obtained. However, the framework by itself will not substantially increase the speed of execution, as other parameters such as the chunking of data affect the performance metrics. Graphical methods are used and explained to show the optimum amount of chunking to be used for execution.
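
The map and reduce roles described above, and the place where chunk size enters, can be sketched as a word count. This is an illustration, not the paper's framework: it uses a standard-library thread pool in place of message queues, and the names `chunked`, `map_chunk`, `merge` and `word_count` as well as the default `chunk_size` and `workers` values are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunked(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]


def map_chunk(lines):
    """Map step: count words in one chunk (intermediate key-value pairs)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge(total, partial):
    """Reduce step: merge one partial count into the running total."""
    total.update(partial)
    return total


def word_count(lines, chunk_size=2, workers=2):
    """Run the map step over chunks in a worker pool, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunked(lines, chunk_size)))
    return reduce(merge, partials, Counter())
```

The result does not depend on `chunk_size`, but the cost does: very small chunks pay scheduling overhead per chunk, while very large chunks leave workers idle, which is the trade-off the abstract says must be tuned. In CPython, a thread pool will not speed up this CPU-bound map step; a process pool or a distributed queue would be needed for real gains.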