
Showing papers on "Chunking (computing) published in 2016"


Journal ArticleDOI
Abstract: © 2016 Gobet, Lloyd-Kelly and Lane. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).

36 citations


Journal ArticleDOI
TL;DR: In this article, the structural properties by which language minimizes dependency distance were investigated; it is shown that chunking may significantly reduce the mean dependency distance of linear sequences, suggesting that language may have evolved a mechanism of dynamic chunking to reduce complexity for the sake of efficient communication.
Abstract: Natural language is a complex adaptive system with multiple levels. The hierarchical structure may have much to do with the complexity of language. Dependency distance has been invoked to explain various linguistic patterns regarding syntactic complexity. However, little attention has been paid to how the structural properties of language help to minimize dependency distance. This article computationally simulates several chunked artificial languages and shows, through comparison with Mandarin Chinese, that chunking may significantly reduce the mean dependency distance of linear sequences. These results suggest that language may have evolved a mechanism of dynamic chunking to reduce complexity for the sake of efficient communication. © 2016 Wiley Periodicals, Inc. Complexity 21: 33–41, 2016
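
Mean dependency distance, the quantity these simulations target, is commonly computed as the average absolute difference between the linear position of each dependent and that of its head, with the root excluded. A minimal Python sketch with an invented five-word sentence (the head assignments are illustrative only):

```python
def mean_dependency_distance(heads):
    """Mean dependency distance (MDD) of one sentence.

    heads[i] is the 1-based position of the head of the word at
    position i + 1; 0 marks the root, which is excluded from the mean.
    """
    distances = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances)

# Hypothetical 5-word sentence whose second word is the root.
print(mean_dependency_distance([2, 0, 2, 5, 3]))  # -> 1.25
```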

27 citations


Proceedings Article
12 Feb 2016
TL;DR: A joint model that performs segmentation, POS-tagging and chunking simultaneously, and employs a semi-supervised method to derive chunk cluster features from large-scale automatically-chunked data to address the sparsity of full chunk features.
Abstract: Chinese chunking has traditionally been solved by assuming gold standard word segmentation. We find that the accuracies drop drastically when automatic segmentation is used. Inspired by the fact that chunking knowledge can potentially improve segmentation, we explore a joint model that performs segmentation, POS-tagging and chunking simultaneously. In addition, to address the sparsity of full chunk features, we employ a semi-supervised method to derive chunk cluster features from large-scale automatically-chunked data. Results show the effectiveness of the joint model with semi-supervised features.

25 citations


Patent
09 Mar 2016
TL;DR: In this article, small objects are efficiently stored with erasure codes by combining a small object with other small objects and/or large objects to form a single large object for chunking, and providing early notification of permanent storage to the sources of the objects to prevent small objects from becoming stale while waiting for additional objects to be combined.
Abstract: Small objects are efficiently stored with erasure codes by combining a small object with other small objects and/or large objects to form a single large object for chunking, and by providing early notification of permanent storage to the sources of the objects to prevent small objects from becoming stale while waiting for additional objects to be combined.
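
The mechanism described can be illustrated with a toy packing step: accumulate small objects until a size threshold is reached, then hand the combined blob off for chunking and erasure coding. All names, the threshold and the callback below are invented for illustration and are not the patented design:

```python
class SmallObjectPacker:
    """Pack small objects into one large object before chunking/erasure coding."""

    def __init__(self, target_size=4 * 1024 * 1024, flush=lambda blob: None):
        self.target_size = target_size
        self.flush = flush          # called with the combined blob to chunk/encode
        self.pending = []           # (object_id, payload) awaiting packing

    def put(self, object_id, payload):
        self.pending.append((object_id, payload))
        # An early acknowledgement to the source would be sent here,
        # before the object is actually packed and erasure coded.
        if sum(len(p) for _, p in self.pending) >= self.target_size:
            self._pack()

    def _pack(self):
        blob = b"".join(payload for _, payload in self.pending)
        self.flush(blob)            # hand the large object off for chunking
        self.pending.clear()

packer = SmallObjectPacker(target_size=10,
                           flush=lambda blob: print(len(blob), "bytes packed"))
packer.put("a", b"hello")
packer.put("b", b"world!")          # crosses the threshold, triggers packing
```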

23 citations


Journal ArticleDOI
TL;DR: It is found that participants who adopted a consistent chunking strategy during symbolic sequence learning showed a greater improvement in their performance and a larger decrease in cognitive workload over time, indicating that chunking is a cost-saving strategy that enhances the effectiveness of symbolic sequence learning.
Abstract: Chunking, namely the grouping of sequence elements in clusters, is ubiquitous during sequence processing, but its impact on performance remains debated. Here, we found that participants who adopted a consistent chunking strategy during symbolic sequence learning showed a greater improvement in their performance and a larger decrease in cognitive workload over time. Stronger reliance on chunking was also associated with higher scores in a working-memory (WM) updating task, suggesting the contribution of WM gating mechanisms to sequence chunking. Altogether, these results indicate that chunking is a cost-saving strategy that enhances the effectiveness of symbolic sequence learning.

19 citations


Proceedings ArticleDOI
01 Jan 2016
TL;DR: A technique to extract keywords from educational video transcripts from MOOCs is discussed, based on a regular-expression grammar-rule approach that identifies the noun chunks in the text of the transcript.
Abstract: Keyword extraction is an important task when working with text data. Extracted keywords help the reader judge the important parts of a text without going through the whole text. In this paper a technique to extract keywords from educational video transcripts from MOOCs is discussed. The technique is based on a regular-expression grammar-rule approach that identifies the noun chunks in the transcript text. Extracted keywords help in finding the specifically important parts of the educational material.
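
The core of the method is a regular-expression grammar over POS tags that marks noun chunks. A minimal sketch of that idea with NLTK's RegexpParser; the grammar rule and example sentence are illustrative, not the authors' exact rule (requires the NLTK 'punkt' and 'averaged_perceptron_tagger' data):

```python
from nltk import RegexpParser, pos_tag, word_tokenize

# Illustrative rule: a noun chunk is an optional determiner,
# any number of adjectives, then one or more nouns.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

sentence = "The lecture introduces supervised learning and decision trees."
tree = chunker.parse(pos_tag(word_tokenize(sentence)))

# Keep the noun chunks as candidate keywords.
keywords = [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees(lambda t: t.label() == "NP")]
print(keywords)
```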

15 citations



Journal ArticleDOI
TL;DR: Christiansen & Chater propose that language comprehenders must immediately compress perceptual data by "chunking" them into higher-level categories; effective language understanding, however, requires maintaining perceptual information long enough to integrate it with downstream cues.
Abstract: Christiansen & Chater (C&C) propose that language comprehenders must immediately compress perceptual data by "chunking" them into higher-level categories. Effective language understanding, however, requires maintaining perceptual information long enough to integrate it with downstream cues. Indeed, recent results suggest comprehenders do this. Although cognitive systems are undoubtedly limited, frameworks that do not take into account the tasks that these systems evolved to solve risk missing important insights.

10 citations


Book ChapterDOI
22 Aug 2016
TL;DR: An intelligent data management framework that can facilitate the development of highly scalable and mobile healthcare applications for remote monitoring of patients is presented; this is achieved through a global log data abstraction that seamlessly leverages the storage and processing capabilities of edge devices and the cloud.
Abstract: We present an intelligent data management framework that can facilitate the development of highly scalable and mobile healthcare applications for remote monitoring of patients. This is achieved through the use of a global log data abstraction that leverages the storage and processing capabilities of the edge devices and the cloud in a seamless manner. In existing log-based storage systems, data is read as fixed-size chunks from the cloud to enhance performance. However, in healthcare applications, where the data access patterns of end users differ widely, this approach leads to unnecessary storage and cost overheads. To overcome these, we propose dynamic log chunking. The experimental results, comparing existing fixed chunking against the H-Plane model, show 13%–19% savings in network bandwidth as well as cost when fetching data from the cloud.
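
A toy sketch of what adapting the chunk size to a reader's recent access pattern might look like; the policy, window length and size bounds below are assumptions for illustration, not the H-Plane design:

```python
class DynamicChunker:
    """Pick the next chunk size from a reader's recent access pattern."""

    def __init__(self, min_size=4_096, max_size=1_048_576):
        self.min_size = min_size
        self.max_size = max_size
        self.recent_reads = []          # bytes actually consumed per request

    def record_read(self, bytes_consumed):
        # Keep only the last ten requests as the access-pattern window.
        self.recent_reads = (self.recent_reads + [bytes_consumed])[-10:]

    def next_chunk_size(self):
        if not self.recent_reads:
            return self.min_size        # no history: fetch conservatively
        # Fetch roughly what this reader has been consuming lately,
        # clamped so we neither thrash nor over-fetch.
        avg = sum(self.recent_reads) / len(self.recent_reads)
        return int(min(self.max_size, max(self.min_size, avg)))

chunker = DynamicChunker()
for consumed in [8_000, 12_000, 9_500]:    # hypothetical reader behaviour
    chunker.record_read(consumed)
print(chunker.next_chunk_size())           # ~9833 bytes instead of a fixed chunk
```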

9 citations


Journal ArticleDOI
TL;DR: This work builds a bio-plausible hierarchical chunking of sequential memory (HCSM) model and uncovers that a chunking mechanism reduces the requirements on synaptic plasticity, since it allows synapses with a narrow dynamic range and low precision to be used for a memory task.
Abstract: Chunking refers to a phenomenon whereby individuals group items together when performing a memory task to improve the performance of sequential memory. In this work, we build a bio-plausible hierarchical chunking of sequential memory (HCSM) model to explain why such improvement happens. We address this issue by linking hierarchical chunking with synaptic plasticity and neuromorphic engineering. We uncover that a chunking mechanism reduces the requirements on synaptic plasticity, since it allows synapses with a narrow dynamic range and low precision to be used for a memory task. We validate a hardware version of the model through simulation, based on measured memristor behavior with narrow dynamic range in neuromorphic circuits, which reveals how chunking works and what role it plays in encoding sequential memory. Our work deepens the understanding of sequential memory and enables it to be incorporated into the investigation of brain-inspired computing on neuromorphic architectures.

9 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: The authors' system (DTSim) uses a Conditional Random Fields based chunker and applies rules blended with semantic similarity methods in order to predict chunk alignments, alignment types and similarity scores.
Abstract: In this paper we describe our system (DTSim) submitted to SemEval-2016 Task 2: Interpretable Semantic Textual Similarity (iSTS). We participated in both the gold chunks category (texts chunked by human experts and provided by the task organizers) and the system chunks category (participants had to automatically chunk the input texts). We developed a Conditional Random Fields based chunker and applied rules blended with semantic similarity methods in order to predict chunk alignments, alignment types and similarity scores. Our system obtained an F1 score of up to 0.648 in predicting the chunk alignment types and scores together and was one of the top-performing systems overall.
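
A minimal sketch of training a CRF chunker over per-token feature dictionaries, using the sklearn-crfsuite package; the toy features and training sentence are illustrative and not necessarily what DTSim used:

```python
# Requires: pip install sklearn-crfsuite
import sklearn_crfsuite

def token_features(tokens, pos_tags, i):
    """A deliberately tiny feature set: word form plus local POS context."""
    return {
        "word.lower": tokens[i].lower(),
        "pos": pos_tags[i],
        "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
        "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }

# One toy training sentence with IOB chunk labels.
tokens = ["A", "man", "plays", "the", "guitar"]
pos = ["DT", "NN", "VBZ", "DT", "NN"]
labels = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP"]

X = [[token_features(tokens, pos, i) for i in range(len(tokens))]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])  # predicted chunk labels for the toy sentence
```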

Proceedings ArticleDOI
01 Aug 2016
TL;DR: The computational formulation of chunking in the C-ULM is described, followed by results of simulation studies examining the impact of chunking versus no chunking on agent learning and agent effectiveness.
Abstract: Chunking has emerged as a basic property of human cognition. Computationally, chunking has been proposed as a process for compressing information; it has also been identified in neural processes in the brain and used in models of these processes. Our purpose in this paper is to expand understanding of how chunking impacts both learning and performance using the Computational-Unified Learning Model (C-ULM), a multi-agent computational model. Chunks in C-ULM long-term memory result from the updating of concept connection weights via statistical learning. Concept connection weight values move toward the accurate weight value needed for a task. A confusion interval reflecting certainty in the weight value is shortened each time a concept is attended in working memory and each time a task is solved, and is lengthened when a chunk is not retrieved over a number of cycles and each time a task solution attempt fails. The dynamic tension between these updating mechanisms allows chunks to come to represent the history of relative frequency of co-occurrence for the concept connections present in the environment, thereby encoding the statistical regularities of the environment in the long-term memory chunk network. In this paper, the computational formulation of chunking in the C-ULM is described, followed by results of simulation studies examining the impact of chunking versus no chunking on agent learning and agent effectiveness. Then, conclusions and implications of the work both for understanding human learning and for applications within cognitive informatics, artificial intelligence, and cognitive computing are discussed.
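
As a reading aid only, the weight-plus-confusion-interval mechanism sketched above might look roughly as follows; every name, rate and rule here is an assumption made for illustration and is not the C-ULM implementation:

```python
class ChunkLink:
    """Toy concept-connection weight with a confusion interval."""

    def __init__(self, weight=0.5, interval=1.0):
        self.weight = weight        # current connection weight
        self.interval = interval    # uncertainty about that weight

    def attend_or_solve(self, target_weight, rate=0.1):
        # Attention in working memory / a solved task: move the weight
        # toward the task's accurate value and shrink the interval.
        self.weight += rate * (target_weight - self.weight)
        self.interval *= 0.9

    def miss_or_fail(self):
        # No retrieval for a while, or a failed solution attempt:
        # grow the interval again (certainty decays).
        self.interval = min(1.0, self.interval * 1.1)

link = ChunkLink()
for _ in range(5):
    link.attend_or_solve(target_weight=0.8)
link.miss_or_fail()
print(round(link.weight, 3), round(link.interval, 3))
```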

Proceedings ArticleDOI
27 Jun 2016
TL;DR: It is shown that the gains brought by algorithms that focus aggressively on DER often come at a significant cost in terms of throughput, which argues for future optimizations that take throughput into account and for making balanced tradeoffs between DER and throughput.
Abstract: Data deduplication techniques are often used by cloud storage systems to reduce network bandwidth and storage requirements. As a consequence, the current research literature tends to focus most of its algorithmic efforts on improving the Duplicate Elimination Ratio (DER), which reflects the compression achieved using a given algorithm. Yet, the importance of this indicator tends to be overestimated, while another key indicator, namely throughput, tends to be underestimated. To substantiate this claim, we reimplement a selection of popular Content-Defined Chunking (CDC) algorithms and perform a detailed performance analysis. On this basis, we show that the gains brought by algorithms that focus aggressively on DER often come at a significant cost in terms of throughput. As a consequence, we advocate for future optimizations that take throughput into account and for making balanced tradeoffs between DER and throughput.
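
Content-defined chunking places boundaries where a rolling hash of the recent bytes satisfies a condition, so boundaries survive insertions and deletions. A minimal sketch in the spirit of gear-hash CDC (parameters are invented; this is not a reimplementation of any of the benchmarked algorithms):

```python
import random

random.seed(0)
GEAR = [random.getrandbits(32) for _ in range(256)]  # fixed per-byte hash table

def cdc_chunks(data, mask=0x1FFF, min_size=2048, max_size=65536):
    """Yield (start, end) offsets of content-defined chunks of `data`.

    A chunk boundary is declared where the low bits of a rolling,
    gear-style hash are all zero, subject to minimum and maximum sizes.
    """
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            yield start, i + 1
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data)

data = bytes(random.getrandbits(8) for _ in range(200_000))
sizes = [end - begin for begin, end in cdc_chunks(data)]
print(len(sizes), sum(sizes) == len(data))  # chunk count and a coverage check
```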

Proceedings ArticleDOI
01 Aug 2016
TL;DR: A new Unlimited Content-Defined Chunking (UCDC) algorithm is designed, comprising file chunking, file comparing and file merging, and its effectiveness is evaluated with a simulation example that produces a description of the file.
Abstract: Nowadays, data-centric systems play an increasingly important role in blog sharing, content delivery, news broadcasting, file synchronization, and so on. Because of the amount of data generated within such systems, data backup and archiving have become challenging tasks. A main method for addressing the problem is chunking-based deduplication, which eliminates redundant data and reduces the total storage space. In this paper, we summarize several ways of file differencing and then design a new Unlimited Content-Defined Chunking (UCDC) algorithm, which comprises file chunking, file comparing and file merging. We evaluate the effectiveness of UCDC with a simulation example that produces a description of the file.

Book ChapterDOI
03 Apr 2016
TL;DR: A less studied side of phrase chunking is investigated, i.e. voting between different currently available taggers, the checking of invalid sequences, and how the state-of-the-art method can be adapted to morphologically rich, agglutinative languages.
Abstract: The CoNLL-2000 dataset is the de facto standard dataset for measuring chunkers on the task of chunking base noun phrases (NP) and arbitrary phrases. The state-of-the-art tagging method utilises TnT, an HMM-based part-of-speech (POS) tagger, with simple majority voting over different representations and fine-grained classes created by lexicalising tags. In this paper the state-of-the-art English phrase chunking method was deeply investigated, re-implemented and evaluated with several modifications. We also investigated a less studied side of phrase chunking, i.e. voting between different currently available taggers, the checking of invalid sequences, and how the state-of-the-art method can be adapted to morphologically rich, agglutinative languages.
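
Two of the ingredients examined here, voting between taggers and checking invalid label sequences, are easy to sketch for IOB-encoded chunks; the repair rule below is one common convention, not necessarily the one used in the paper:

```python
from collections import Counter

def majority_vote(predictions):
    """Per-token majority vote over several taggers' IOB sequences."""
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*predictions)]

def repair_iob(tags):
    """Fix invalid sequences: an I- tag that does not continue a chunk becomes B-."""
    fixed, prev = [], "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed

tagger_outputs = [
    ["B-NP", "I-NP", "O",    "I-NP"],
    ["B-NP", "I-NP", "B-VP", "I-NP"],
    ["B-NP", "O",    "O",    "I-NP"],
]
voted = majority_vote(tagger_outputs)
print(voted)              # ['B-NP', 'I-NP', 'O', 'I-NP']
print(repair_iob(voted))  # ['B-NP', 'I-NP', 'O', 'B-NP']
```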

Patent
16 Jun 2016
TL;DR: In this paper, the authors present a system and methods for providing distinct conversations within a file activity feed for display on a user interface of a client computing device, where a file created with an application may be rendered on the user interface.
Abstract: Aspects of the present disclosure relate to systems and methods for providing distinct conversations within a file activity feed for display on a user interface of a client computing device. A file created with an application may be rendered on the user interface. The file may include at least a chat pane comprising a plurality of chat messages and a file activity feed including one or more activities associated with the file. It may be determined when a distinct conversation begins and ends within the chat pane. The distinct conversation may include at least some of the plurality of chat messages. In response to determining when the distinct conversation begins and ends, the distinct conversation may be recorded as a distinct conversation activity associated with the file. The distinct conversation activity may be displayed within the file activity feed.

Patent
29 Apr 2016
TL;DR: In this article, the query is chunked or broken down into a sequence of smaller chunked queries and the chunked results of those smaller queries are streamed back to the requestor.
Abstract: Instead of processing a query as-is, the query is chunked or broken down into a sequence of smaller chunked queries and the chunked results of those smaller queries are streamed back to the requestor. Chunking the query and streaming the chunked results can substantially decrease the user's time to value when running a query by returning some immediate results for display which are refined and eventually converge on the full results as each chunked query runs.
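
The pattern is: run a sequence of smaller queries and stream each partial result as soon as it is ready. A toy sketch of that pattern over a date-range query; everything here, including the per-day split, is an invented illustration of the idea rather than the patented method:

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List

def run_query_chunked(start: date, end: date,
                      run_one_day: Callable[[date], List[dict]]) -> Iterator[List[dict]]:
    """Split a [start, end) date-range query into per-day chunked queries
    and stream each chunk's rows back as soon as that chunk completes."""
    day = start
    while day < end:
        yield run_one_day(day)     # partial results appear immediately
        day += timedelta(days=1)

# Hypothetical per-day backend call.
def fake_backend(day: date) -> List[dict]:
    return [{"day": day.isoformat(), "rows": 42}]

for partial in run_query_chunked(date(2016, 1, 1), date(2016, 1, 4), fake_backend):
    print(partial)   # results refine toward the full answer chunk by chunk
```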

Patent
06 Jul 2016
TL;DR: In this article, a Vietnamese chunking method based on conditional random fields and transformation-based learning is proposed to improve chunking performance for Vietnamese sentences; it can support work such as phrase trees, semantic analysis, machine translation and the like.
Abstract: The invention relates to a Vietnamese chunking method based on conditional random fields and transformation-based learning, and belongs to the technical field of natural language processing. The method comprises the following steps: first, preprocessing Vietnamese corpora to obtain sentence-level Vietnamese chunking training corpora; extracting the sentence-level training corpora from a database and performing chunking modeling on them to obtain a Vietnamese chunking conditional random field model; obtaining a set of transformation rules; and performing chunk labelling on the sentence-level Vietnamese test corpora to be chunked, using the established conditional random field model and the obtained transformation rules, to obtain the Vietnamese chunk labelling result. The method realizes effective chunking analysis of Vietnamese sentences and paves the way for work such as phrase trees, semantic analysis, machine translation and the like; compared with an existing Vietnamese chunking tool, the method is markedly improved in accuracy, recall and F value.

Book ChapterDOI
21 Sep 2016
TL;DR: The realization of the proposed model offers a new view of the task of automatic spelling correction, and allows the elimination of alternatives generated by the system according to a morphological dictionary.
Abstract: The present paper concerns syntactic support for spelling correction in Russian and English. The syntactic model used in the research is chunking over dependency trees, since chunking has great potential for the goal of our study. In particular, it does not require a complete description of the syntactic model. Moreover, it allows the elimination of alternatives generated by the system according to a morphological dictionary. Thus, the realization of the proposed model offers a new view of the task of automatic spelling correction.

Proceedings ArticleDOI
08 Sep 2016
TL;DR: Presented at Interspeech 2016, San Francisco, United States of America, 9–12 September 2016.

01 Jun 2016
TL;DR: This document specifies a chunking protocol for dividing a user payload into CCNx Content Objects, including the naming convention to use for the chunked payload and a field added to a Content Object to represent the last chunk of an object.
Abstract: This document specifies a chunking protocol for dividing a user payload into CCNx Content Objects. This includes a specification of the naming convention to use for the chunked payload and of a field added to a Content Object to represent the last chunk of an object.
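
A hedged sketch of the general scheme: each Content Object carries a chunk-number name component, and a field records the last chunk number so the consumer knows when the object is complete. The names, sizes and dictionary layout below are illustrative, not normative values from the specification:

```python
def chunk_payload(name, payload, chunk_size=1200):
    """Split `payload` into named chunks, marking the last chunk number."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    last = len(chunks) - 1
    objects = []
    for number, body in enumerate(chunks):
        obj = {"name": f"{name}/chunk={number}", "payload": body}
        if number == last:
            obj["end_chunk_number"] = last   # field identifying the final chunk
        objects.append(obj)
    return objects

objects = chunk_payload("ccnx:/example/video", b"x" * 3000)
print([o["name"] for o in objects])
print(objects[-1]["end_chunk_number"])   # 2
```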

Journal Article
TL;DR: The proposed approach to the Source Retrieval task of an External Plagiarism Detection System includes chunking of documents based on paragraphs, along with part-of-speech tagging and an efficient download filtering method, and exhibited improved efficiency in PAN 2015, conducted by the PAN CLEF evaluation lab.
Abstract: Source retrieval is an important task in an external plagiarism detection system: it involves identifying a set of candidate source documents for a given suspicious document. It is crucial not to lose any actual source document while reducing the size of the candidate source document set. This paper describes an approach to the source retrieval task of an external plagiarism detection system. The approach includes chunking of documents based on paragraphs, along with part-of-speech tagging and an efficient download filtering method. The proposed system is evaluated against the PAN 2011-12, PAN 2012-13 and PAN 2014-15 test data sets, and the results are analysed and compared using standard PAN measures: recall, precision, F-measure, and the average numbers of queries and downloads. The proposed approach exhibited improved efficiency in PAN 2015, conducted by the PAN CLEF evaluation lab, acquiring the highest values for F-measure and precision along with the fewest downloads. The results are further improved by incorporating efficient query and download filtering mechanisms into the proposed system. The effect of the enhanced system is also discussed and analysed in this paper.
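
The paragraph-based chunking used for query formulation can be sketched as follows: split the suspicious document into paragraph chunks and build one query per chunk from its most frequent content words. The paper additionally uses part-of-speech tagging; this sketch substitutes a plain stop-word filter, and the stop-word list and query length are arbitrary:

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def paragraph_queries(document, words_per_query=5):
    """One search query per paragraph chunk of the suspicious document."""
    queries = []
    for paragraph in document.split("\n\n"):
        words = [w.lower().strip(".,;:") for w in paragraph.split()]
        content = [w for w in words if w and w not in STOPWORDS]
        top = [w for w, _ in Counter(content).most_common(words_per_query)]
        if top:
            queries.append(" ".join(top))
    return queries

doc = ("Plagiarism detection compares documents.\n\n"
       "Source retrieval finds candidate source documents for a suspicious document.")
print(paragraph_queries(doc))
```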

Proceedings Article
01 Dec 2016
TL;DR: A method to use chunkers to develop a cross-lingual parser for Bengali which results in an improvement of unlabelled attachment score (UAS) from 65.1 (baseline parser) to 78.2.
Abstract: While statistical methods have been very effective in developing NLP tools, the use of linguistic tools and understanding of language structure can make these tools better. Cross-lingual parser construction has been used to develop parsers for languages with no annotated treebank. Delexicalized parsers that use only POS tags can be transferred to a new target language. But the success of a delexicalized transfer parser depends on the syntactic closeness between the source and target languages. The understanding of the linguistic similarities and differences between the languages can be used to improve the parser. In this paper, we use a method based on cross-lingual model transfer to transfer a Hindi parser to Bengali. The technique does not need any parallel corpora but makes use of chunkers of these languages. We observe that while the two languages share broad similarities, Bengali and Hindi phrases do not have identical construction. We can improve the transfer based parser if the parser is transferred at the chunk level. Based on this we present a method to use chunkers to develop a cross-lingual parser for Bengali which results in an improvement of unlabelled attachment score (UAS) from 65.1 (baseline parser) to 78.2.

Book ChapterDOI
01 Jan 2016
TL;DR: This paper presents an experimental study of various chunking algorithms, since chunking plays a very important role in data redundancy elimination systems.
Abstract: Data deduplication, also known as data redundancy elimination, is a technique for saving storage space. Data deduplication systems are highly successful in backup storage environments, where a large number of redundancies may exist. These redundancies can be eliminated by finding and comparing fingerprints. The comparison of fingerprints may be done at the file level, or the files may be split into chunks and the comparison done at the chunk level. File-level deduplication yields poorer results than chunk-level deduplication, since it computes a hash over the entire file and eliminates only duplicate files. This paper focuses on an experimental study of various chunking algorithms, since chunking plays a very important role in data redundancy elimination systems.

Proceedings ArticleDOI
01 Jan 2016
TL;DR: Based on a comparison over different parameters, variable-size chunking algorithms are the deduplication techniques best suited to the backup operation.
Abstract: Data is the most imperative asset of any organization, whether for productive use or to make more profit. The rapid growth of varied data is a serious issue to handle and process: data is being generated at such a high rate that it needs to be stored in databases without duplication. Deduplication is a technique that removes duplicated data from databases and supports backup of the data. In data deduplication, numerous algorithms are available that detect and eliminate redundant data and store a single unique copy of the data contents. Various chunking techniques are used for backup operations and to perform deduplication of the data, such as fixed-size chunking, whole-file chunking and content-defined chunking. Backup operations are performed using these chunking techniques, and the techniques are compared with each other to determine the one best suited to the backup job. In this paper we present a performance evaluation of various deduplication techniques over a matrix of performance parameters: deduplication ratio, deduplication time, hashing time, chunking time and throughput. The analysis results provide guidelines for adopting the best deduplication technique to clear away duplicate data. After comparing these chunking techniques on the different parameters, it is concluded that variable-size chunking algorithms are the deduplication techniques best suited to the backup operation.
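
One of the parameters compared above, the deduplication ratio, can be illustrated by fingerprinting fixed-size chunks with SHA-256 and measuring how much of the raw data still has to be stored (papers often report the inverse, raw size over stored size). The chunk size and synthetic data are arbitrary choices for illustration:

```python
import hashlib

def dedup_ratio_fixed(data: bytes, chunk_size: int = 4096) -> float:
    """Fraction of raw bytes that must still be stored after fixed-size chunking."""
    seen = set()
    stored = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in seen:        # only unique chunks are stored
            seen.add(fingerprint)
            stored += len(chunk)
    return stored / len(data)

# A backup stream with heavy repetition deduplicates well.
data = (b"A" * 4096 + b"B" * 4096) * 100
print(dedup_ratio_fixed(data))   # 0.01: only two unique 4 KiB chunks are kept
```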

Proceedings ArticleDOI
05 Jul 2016
TL;DR: The application of this dynamic log chunking mechanism, based on reader access patterns and domain-specific data characteristics, to remote patient monitoring in bandwidth-starved rural areas is shown to result in bandwidth and cost savings of 14% without affecting prefetch performance.
Abstract: Time series data from sensor devices are increasingly stored in log data structures across the cloud and mobile devices. Currently, log data is accessed as chunks of fixed size, which enhances performance by prefetching of data. However, in applications such as remote monitoring of patients using mobile devices, the data requirements of end users vary significantly depending upon their roles. The fixed chunking approach would lead to unnecessary data downloads due to the dynamic variability of data access. Also, requests are more often than not based on fixed time chunks that do not necessarily translate to a fixed data size. To overcome this challenge, we present a dynamic log chunking mechanism based on reader access patterns and domain-specific data characteristics. The application of this method in the area of remote patient monitoring in bandwidth-starved rural areas is shown to result in bandwidth and cost savings of 14% without affecting the prefetch performance.


29 Sep 2016
TL;DR: This paper compares and discusses notions such as chunking, intuition, emergent grammar and ad-hoc constructions, based on chosen texts from the respective fields of study.
Abstract: Located at the intersection of applied linguistics and more formal language theory, this paper draws a parallel between concepts applied to grasp ELF and increasingly influential usage-based approaches to grammar. More precisely, I compare and discuss notions such as chunking, intuition, emergent grammar and ad-hoc constructions. The discussion is based on chosen texts from the respective fields of study: basically, Sinclair and Mauranen’s book on Linear Unit Grammar and the work of Joan Bybee.