Showing papers on "Chunking (computing)" published in 2011

Open Access

Journal Article•DOI•

Natural Language Processing (Almost) from Scratch

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa

01 Nov 2011-Journal of Machine Learning Research

TL;DR: A unified neural network architecture and learning algorithm is proposed that can be applied to various natural language processing tasks, including part-of-speech tagging, chunking, named entity recognition, and more.

Abstract: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity rec...

224 citations

Journal Article•DOI•

Comparing and combining chunkers of biomedical text

Ning Kang¹, Erik M. van Mulligen¹, Jan A. Kors¹•Institutions (1)

Erasmus University Medical Center¹

01 Apr 2011-Journal of Biomedical Informatics

TL;DR: This study is the first to compare the performance of the whole chunking pipeline and to combine different existing chunking systems; OpenNLP scored best in both performance and usability.

52 citations

Journal Article•

Chinese chunking method based on conditional random fields and semantic classes

Xue Yi-bo¹•Institutions (1)

Harbin University of Science and Technology¹

01 Jan 2011-Journal of the Harbin Institute of Technology

TL;DR: A new Chinese chunking method based on conditional random fields and semantic classes is proposed; the conditional random fields incorporate various types of features to overcome the label bias problem, and the method achieves high accuracy in terms of F-score.

Abstract: To improve the accuracy of Chinese chunking and utilize the semantic information of words, a new Chinese chunking method is proposed based on conditional random fields and semantic classes. Through analysis of the Chinese chunking task and its sequential characteristics, conditional random fields, which can incorporate various types of features, were applied to overcome the label bias problem. Semantic features were utilized to improve chunking performance. Experimental results show that the algorithm achieves an F-score of 92.77%. A further experiment indicates the effects of feature template selection and training data scale on chunking performance.

17 citations

A tool for enhancing MetaMap performance when annotating clinical guideline documents with UMLS concepts

Phil Gooch, Abdul V. Roudsari

01 Jan 2011

TL;DR: A tool that integrates the National Library of Medicine's MetaMap software with GATE, an open-source text analytics framework, to chunk cardiovascular disease guideline text into default segments, XML element content, sentences and phrases, which were sequentially submitted to MetaMap for annotation.

Abstract: We developed a tool that integrates the National Library of Medicine's MetaMap software with GATE, an open-source text analytics framework. The tool allows non-ASCII encoded documents of numerous formats to be annotated with UMLS concepts. We created a GATE pipeline to chunk cardiovascular disease guideline text into default segments (blank-line delimited), XML element content, sentences and phrases, which were sequentially submitted to MetaMap for annotation. XML element, sentence and phrase chunking allowed term extraction and mapping to be completed in around 1/3 of the time taken with default chunking, although with slight loss of accuracy (F1.0s=0.94-0.99). However, phrase chunking allows more complex input to be processed in real time, which is not possible with the other approaches. We discuss the results in relation to use of MetaMap's --term processing option for generating pre- and post-coordinated mappings from composite phrases.
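
To make the chunking granularities in this abstract concrete, here is a minimal Python sketch of two of them: blank-line delimited segments and sentences. It is not the authors' GATE pipeline and does not call MetaMap; the function names and the naive sentence rule are assumptions made only for illustration.

```python
import re

def blank_line_segments(text):
    """Split text into blank-line delimited segments (the 'default' granularity)."""
    return [seg.strip() for seg in re.split(r"\n\s*\n", text) if seg.strip()]

def naive_sentences(segment):
    """Rough sentence splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", segment) if s.strip()]

# Hypothetical guideline-like text, invented for this example.
text = "Aspirin is recommended. Monitor blood pressure.\n\nRefer high-risk patients."

segments = blank_line_segments(text)
sentences = [s for seg in segments for s in naive_sentences(seg)]
```

Smaller units such as sentences give an annotator shorter inputs per call, which is consistent with the speedup the abstract reports, though the real trade-off depends on the annotator itself.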

14 citations

Proceedings Article•DOI•

Multithread Content Based File Chunking System in CPU-GPGPU Heterogeneous Architecture

Zhi Tang¹, Youjip Won¹•Institutions (1)

Hanyang University¹

21 Jun 2011

TL;DR: This work presents a multithreaded file chunking prototype system that exploits the hardware organization of a CPU-GPGPU heterogeneous computer and determines which device should chunk a file, in order to accelerate the content-based file chunking operation of deduplication.

Abstract: The fast development of the Graphics Processing Unit (GPU) has led to the popularity of general-purpose usage of GPUs (GPGPU). Most modern computers now have a CPU-GPGPU heterogeneous architecture in which the CPU is used as the host processor. In this work, we present a multithreaded file chunking prototype system that exploits the hardware organization of the CPU-GPGPU heterogeneous computer and determines which device should be used to chunk a file, in order to accelerate the content-based file chunking operation of deduplication. We built rules for the system to choose which device should chunk a file, and we also found the optimal choice of other related parameters of both the CPU and GPGPU subsystems, such as segment size and block dimension. The prototype was implemented and tested. Results using a GTX460 (336 cores) and an Intel i5 (four cores) show that the system can increase chunking speed by 63% compared to using the GPGPU alone and by 80% compared to using the CPU alone.
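
For readers unfamiliar with content-based file chunking, the following is a minimal single-threaded Python sketch of the general idea: a rolling hash over a sliding window decides chunk boundaries, so boundaries depend on content rather than on fixed offsets. It is not the authors' CPU-GPGPU system; the polynomial hash, the window size, the mask, and the size limits are all assumptions chosen for illustration.

```python
def content_defined_chunks(data, window=16, mask=0x3F, min_size=32, max_size=1024):
    """Cut a chunk where the rolling hash of the last `window` bytes has zero low bits."""
    base = 257
    mod = 1 << 32
    top = pow(base, window - 1, mod)  # weight of the byte that leaves the window
    chunks = []
    start = 0  # index where the current chunk begins
    h = 0      # rolling hash of the current window
    for i, b in enumerate(data):
        n = i - start  # bytes already in the current chunk
        if n >= window:
            # Drop the oldest byte of the window before adding the new one.
            h = (h - data[i - window] * top) % mod
        h = (h * base + b) % mod
        size = n + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks

# Deterministic sample input, invented for this example.
data = bytes((i * 31 + 7) % 251 for i in range(5000))
chunks = content_defined_chunks(data)
```

In a deduplication setting each chunk would then be fingerprinted and compared against stored fingerprints; that step, and the device-selection rules the abstract describes, are outside this sketch.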

13 citations

Chunking of Polish: guidelines, discussion and experiments with Machine Learning

Marek Maziarz, Adam Radziszewski, Jan Wieczorek

01 Jan 2011

TL;DR: The problem of shallow parsing of Polish, specifically chunking, is discussed, along with some theoretical issues related to chunking Polish texts, and chunk annotation guidelines are proposed.

Abstract: This paper discusses the problem of shallow parsing of Polish, specifically chunking. We discuss some theoretical issues related to chunking of Polish texts and propose our chunk annotation guidelines. In the second part of the paper we present initial results of using Machine Learning algorithms to train a working chunker for the proposed chunk types.

9 citations

Journal Article•DOI•

The Neuro-Linguistic Programming Approach to Conflict Resolution, Negotiation and Change

Eduard Vinyamata

16 May 2011-Journal of Conflictology

TL;DR: Overall, NLP technologies such as chunking can bring performance, alternative methodologies and solutions at times where the highest academic approaches are not enough.

Abstract: Neuro-Linguistic Programming (NLP) can bring new perspectives and new results to any endeavour involving personal (i.e. internal) and interpersonal communication. The organisation of information to achieve results is at the core of NLP and also a frequent goal for interpersonal conflict managers such as arbiters, mediators and negotiators. This article sheds light on one particular NLP tool, namely chunking. Chunking is a direct application of the NLP Meta-model, a communications model used to find and challenge linguistic distortions in the client's language. Chunking deals with information size and direction. Information can be chunked up or down in size and can be moved laterally to find alternative examples of a concept at the same level of information. In a conflict resolution or mediation setting, chunking up can be a guide to reach an initial agreement level, a compromise between the parties. Chunking down, on the other hand, can be used to deal with specific problems and find a leverage point to make a breakthrough. Overall, NLP technologies such as chunking can bring performance, alternative methodologies and solutions at times where the highest academic approaches are not enough.

4 citations

Proceedings Article•DOI•

Shallow parser for Malayalam language using finite state cascades

Latha R Nair¹, S. David Peter¹•Institutions (1)

Cochin University of Science and Technology¹

12 Dec 2011

TL;DR: A chunking method for Malayalam sentences based on a morpheme-based augmented transition network is described; it works with good accuracy with the proposed set of chunk rules and has good potential for use as a full-fledged parser for the Malayalam language.

Abstract: Various methods have been proposed for chunking sentences in agglutinative languages. For Malayalam, a South Indian language, the chunking methods proposed are mainly statistical. This paper describes a chunking method for Malayalam sentences based on a morpheme-based augmented transition network. For the trial set of sentences, the system works with good accuracy with the proposed set of chunk rules. The chunking system has good potential for use as a full-fledged parser for the Malayalam language.

4 citations

Posted Content•

E-Filing: Mastering the Tech-Rhetoric

Gerald Lebovits¹ ² ³•Institutions (3)

New York University¹, Columbia University², Fordham University³

01 May 2011-Social Science Research Network

1 citation

Marker-based Chunking for Analogy-based Translation of Chunks.

Kota Takeya, Yves Lepage

19 Sep 2011

Evaluation of Analogy-based Translation of Chunks obtained by Marker-based Chunking

Kota Takeya, Yves Lepage

08 Jul 2011

TL;DR: The quality of translation of chunks obtained by marker-based chunking in English and French, in both directions, is inspected, and it is shown that more than three quarters of the chunks can be translated by the one-step analogy-based translation method.

Abstract: An example-based machine translation (EBMT) system based on analogies requires numerous analogies between linguistic units to work properly. Consequently, long sentences cannot be handled directly in such a framework. In this paper, we inspect the quality of translation of chunks obtained by marker-based chunking in English and French in both directions. Our results show that more than three quarters of the chunks can be translated by the one-step analogy-based translation method, and that slightly fewer than half of the chunks get a perfect translation when compared with references.

Dissertation•

Unsupervised partial parsing

Elias Ponvert

01 Aug 2011

Experiments on phrasal chunking in NLP using exponentiated gradient for structured prediction

Porus Jimmy Patell

12 Apr 2011

TL;DR: This project implements EG for a Natural Language Processing structured prediction task of phrasal chunking and compares the performance of EG with other discriminative learning algorithms that have state of the art results on this task.

Abstract: Exponentiated Gradient (EG) updates were originally introduced in (Kivinen and Warmuth, 1997) in the context of online learning algorithms. EG updates were shown by (Collins et al., 2008) to provide fast batch and online algorithms for learning a max-margin classifier. They show that EG can converge quickly due to multiplicative updates, and that EG updates can be factored into tractable components for structured prediction tasks where the number of output labels is exponential in the size of the input. In this project, we implement EG for a Natural Language Processing structured prediction task of phrasal chunking (finding noun phrases, and other phrases in text) and we compare the performance of EG with other discriminative learning algorithms that have state of the art results on this task.
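
The multiplicative update this abstract refers to can be illustrated with a minimal Python sketch of one generic exponentiated gradient step on a probability vector. This is the basic unstructured form only, not the factored structured-prediction variant the project implements; the function name, learning rate, and gradient values are assumptions for illustration.

```python
import math

def eg_update(weights, grads, lr=0.5):
    """One EG step: scale each weight by exp(-lr * gradient), then renormalize."""
    scaled = [w * math.exp(-lr * g) for w, g in zip(weights, grads)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical starting distribution and gradient, invented for this example.
w = [0.25, 0.25, 0.25, 0.25]
g = [1.0, 0.0, 0.0, -1.0]

w_new = eg_update(w, g)
```

Because the update is multiplicative and followed by renormalization, the weights stay positive and sum to one, and components with a negative gradient gain mass at the expense of those with a positive gradient.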

Proceedings Article•DOI•

Map reduction framework for parallel data mining: Multicore to distributed network systems

Rahul Ramakrishna¹, M V Bhaskara Rao•Institutions (1)

Sir M. Visvesvaraya Institute of Technology¹

01 Dec 2011

TL;DR: A parallel programming framework is demonstrated that applies to different algorithms, in contrast to conventional single-algorithm speedups, and that fosters application-dependent speedup over uniprocessor applications for a given workload, even on small Ethernet/IP-based networks.

Abstract: In this multicore era, there is a huge influx of symmetric multiprocessor computers based on shared-memory architecture and high-end server platforms. No adequate framework appears to exist that exploits the full potential of this hardware. In this paper, a parallel programming framework is demonstrated that applies to different algorithms, in contrast to the conventional approach of speeding up a single algorithm at a time. The framework fosters application-dependent speedup over uniprocessor applications for a given workload, even on small Ethernet/IP-based networks. The functional programming paradigm can implicitly parallelize programs on multicore computers and scale them across distributed networks using message queues. The map reduction framework is also based on the functional programming paradigm: programs are written in summation form, specifying a map function that generates intermediate key-value pairs and a reduce function that merges them. With this method a substantial increase in speed is obtained. However, the framework by itself will not substantially increase the speed of execution, as other parameters such as the chunking of data affect the performance metrics. Graphical methods are used and explained to show the optimum amount of chunking for execution.
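
The map and reduce roles described in this abstract, and the place where chunk size enters, can be sketched in a few lines of Python. This is not the authors' framework: the word-count task, the function names, and the use of a thread pool are assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def chunked(items, chunk_size):
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]

def map_chunk(words):
    """Map step: emit intermediate (key, value) pairs for one chunk."""
    return [(w, 1) for w in words]

def reduce_pairs(pairs):
    """Reduce step: merge the values of pairs that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def map_reduce(items, chunk_size, workers=4):
    """Run the map step over chunks in a worker pool, then reduce all pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_chunk, chunked(items, chunk_size))
        pairs = [p for part in mapped for p in part]
    return reduce_pairs(pairs)

# Hypothetical input, invented for this example.
words = "a b a c b a".split()
counts = map_reduce(words, chunk_size=2)
```

The `chunk_size` parameter is the knob the abstract points to: very small chunks add scheduling overhead per task, while very large chunks leave workers idle, so the best value has to be measured for a given workload. Threads are used here only to show the structure; a process pool or a distributed queue would be needed for real CPU-bound speedup in CPython.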
