
Showing papers on "Chunking (computing)" published in 2013


Patent
11 Jan 2013
TL;DR: In this patent, a method and system for providing unified local storage support for file and cloud access is described, which comprises writing a chunk on a storage server and replicating the chunk to other selected storage servers when necessary.
Abstract: A method and system for providing unified local storage support for file and cloud access is disclosed. The method comprises writing a chunk on a storage server, and replicating the chunk to other selected storage servers when necessary. The method and system further comprise writing a version manifest on the storage server; replicating the version manifest to other selected storage servers when necessary. Object puts or appends are implemented by first chunking the object, determining if the chunks are new, transferring the chunks if required, followed by creation of a new version manifest referencing the chunks. Finally, the method and system include providing concurrent file-oriented read and write access consistent with the stored version manifests and chunks.
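The put/append flow described above (chunk the object, determine whether each chunk is new, transfer only the new chunks, then create a version manifest referencing them) can be illustrated with a minimal sketch. This is not the patented implementation: the fixed chunk size, SHA-256 content addressing, and the in-memory chunk_store / manifest_store dictionaries are illustrative assumptions.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # illustrative 1 MiB fixed-size chunks

def put_object(name, data, chunk_store, manifest_store):
    """Sketch of a chunked object put: store only chunks not seen before,
    then record a version manifest that references them in order."""
    manifest = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        if chunk_id not in chunk_store:      # "transfer" the chunk only if it is new
            chunk_store[chunk_id] = chunk
        manifest.append(chunk_id)
    version = len(manifest_store.get(name, []))
    manifest_store.setdefault(name, []).append(manifest)
    return version

def get_object(name, version, chunk_store, manifest_store):
    """Reassemble an object version from its manifest."""
    return b"".join(chunk_store[cid] for cid in manifest_store[name][version])

chunk_store, manifest_store = {}, {}
v0 = put_object("report.bin", b"A" * 3_000_000, chunk_store, manifest_store)
assert get_object("report.bin", v0, chunk_store, manifest_store) == b"A" * 3_000_000
```

Replication of chunks and manifests to other storage servers, and the concurrent file-oriented access the patent claims, are outside the scope of this sketch.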

70 citations


Journal ArticleDOI
TL;DR: This study investigates high-frequency organizing chunks in ELF corpora using the Linear Unit Grammar (LUG) framework and finds that the lower-frequency organizing chunks showed a higher rate of approximation and a larger number of unique forms, while the higher-frequency chunks were primarily attested in conventional forms in both written and spoken ELF.
Abstract: An ongoing discussion in ELF research is the ability of ELF speakers to store and retrieve holistic chunks of language, facilitating efficient and fluent production of speech. These questions involve the frequency effects of formulaic chunks of language and their varying degrees of entrenchment for ELF users. In addition, the variable forms in which these chunks may be attested can be treated as approximations of conventional chunks, while serving identical functions. This study addresses these issues by investigating high-frequency organizing chunks in ELF corpora using the Linear Unit Grammar (LUG) framework (Sinclair and Mauranen 2006). Drawing data from the ELFA corpus of spoken academic ELF, the study also considers organizing chunks in written academic ELF from the nascent WrELFA corpus. With ENL comparison data taken from the Michigan Corpus of Academic Spoken English (MICASE), findings are presented on the forms and frequencies of textual and interactive organizing chunks in ELF, with implications for the reality of frequency effects and their connection to distributions of approximated chunks. The lower-frequency organizing chunks showed a higher rate of approximation and a larger number of unique forms, while the higher-frequency chunks were primarily attested in conventional forms in both written and spoken ELF.

56 citations


Journal ArticleDOI
TL;DR: A general procedure for the analysis of naturalistic driving data, called chunking, is presented that can support many such analyses by increasing their robustness and sensitivity and create a solid basis for further data analyses.

36 citations


Journal ArticleDOI
TL;DR: This paper studies the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs, applied to two datasets of RNA with known secondary structures as well as a family of viral genome RNAs whose structures have not been predicted before.
Abstract: Ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Their secondary structures are crucial for the RNA functionality, and the prediction of the secondary structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure using the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs that apply to two datasets of RNA with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, as well as a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment. On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However, as the lengths of the RNA sequences increase, the fine-grained mapping can surpass the coarse-grained mapping in performance. By using our MapReduce framework together with statistical analysis on the accuracy retention results, we observe how the inversion-based chunking methods can outperform predictions using the whole sequence. Our chunk-based approach also enables us to predict secondary structures for very long RNA sequences, which is not feasible with traditional methods alone.
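As a rough illustration of the chunk-predict-reconstruct pipeline the abstract describes, the sketch below cuts a sequence into consecutive chunks, "predicts" each chunk independently, and concatenates the results. The chunk length, the trivial placeholder predictor, and the plain concatenation step are assumptions; the paper's actual pipeline uses real thermodynamic predictors (e.g., NUPACK), several chunking variants, and a Hadoop MapReduce reconstruction stage.

```python
def chunk_sequence(seq, chunk_len=120):
    """Cut a long RNA sequence into consecutive chunks (simplified; the paper
    also evaluates centered and inversion-based chunking schemes)."""
    return [seq[i:i + chunk_len] for i in range(0, len(seq), chunk_len)]

def predict_chunk(chunk):
    """Placeholder for an external predictor such as NUPACK or RNAfold.
    Here it simply returns an unpaired structure of the same length."""
    return "." * len(chunk)

def predict_by_chunking(seq, chunk_len=120):
    """Predict each chunk independently (the parallelizable 'map' step) and
    reconstruct the full dot-bracket string by concatenation."""
    return "".join(predict_chunk(c) for c in chunk_sequence(seq, chunk_len))

structure = predict_by_chunking("AUGC" * 500)
assert len(structure) == 2000
```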

21 citations


Journal ArticleDOI
Ider Lkhagvasuren, Jungmin So, Jeong-Gun Lee, Chuck Yoo, Young Woong Ko
TL;DR: An algorithm and data structure are presented for a deduplication method that efficiently eliminates identical data between files residing on different machines, at a high deduplication rate and in a short time.
Abstract: This paper presents an algorithm and data structure for a deduplication method that efficiently eliminates identical data between files residing on different machines, at a high deduplication rate and in a short time. The algorithm quickly predicts identical regions between source and destination files, verifies them, and transfers only those blocks that prove to be unique. The key to fast, scalable duplicate detection is that data are expressed as fixed-size block chunks which are distributed into an “Index-table” according to the chunks’ boundary values on both sides. The “Index-table” is a fixed-size table structure in which a chunk’s boundary byte values serve as its cell’s row and column numbers. Experimental results show that the proposed solution improves data deduplication performance and substantially reduces the required storage capacity.
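A minimal sketch of the boundary-value index idea described above: chunks are placed in a 256x256 table addressed by their first and last byte values, so a cheap boundary lookup can rule out most non-duplicates before any full comparison. The 4KB block size, SHA-1 hashing, and set-valued cells are illustrative assumptions, not details from the paper.

```python
import hashlib

BLOCK = 4096  # illustrative fixed chunk size; the paper's size may differ

def build_index_table(data):
    """Sketch of the 'Index-table': a 256x256 grid addressed by a chunk's
    first and last byte values, each cell holding the hashes of the chunks
    that landed there."""
    table = [[set() for _ in range(256)] for _ in range(256)]
    for off in range(0, len(data), BLOCK):
        chunk = data[off:off + BLOCK]
        if chunk:
            table[chunk[0]][chunk[-1]].add(hashlib.sha1(chunk).hexdigest())
    return table

def likely_duplicate(chunk, table):
    """Fast check: only chunks whose boundary bytes hit a non-empty cell need
    a full hash comparison at all."""
    cell = table[chunk[0]][chunk[-1]]
    return bool(cell) and hashlib.sha1(chunk).hexdigest() in cell

src = bytes(range(256)) * 64
table = build_index_table(src)
assert likely_duplicate(src[:BLOCK], table)
```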

10 citations


Book ChapterDOI
20 Oct 2013
TL;DR: Three distinct approaches are tried for chunking transcribed oral data with labeling tools learnt from a corpus of written texts, aiming to reach the best possible results with the least possible manual correction or re-learning effort.
Abstract: In this paper, we try three distinct approaches to chunk transcribed oral data with labeling tools learnt from a corpus of written texts. The purpose is to reach the best possible results with the least possible manual correction or re-learning effort.

8 citations


Proceedings ArticleDOI
08 Sep 2013
TL;DR: This paper addresses the issues of how incrementally chunking learned action rules of increasing length and complexity can assist in solving problems of ever greater complexity by employing a micro-world with simple objects and simplified physical behaviors.
Abstract: In this paper we address the issues of how incrementally chunking learned action rules of increasing length and complexity can assist in solving problems of ever greater complexity. To this end, we employ a micro-world with simple objects and simplified physical behaviors. The agent first learns some basic elemental rules capturing the fundamental physical behaviors of the agent itself, the objects and their interactions. Then, some moderately complex problems such as going from a start state to a goal state that do not require too many steps are given to the system and the system uses a standard search process (e.g., A*) to find solutions which do not require too much search time because the problems are relatively simple. The solutions are then remembered as "chunked" rules of taking a sequence of actions to achieve a certain goal. Later, when a more complex problem - one that requires many steps to solve - is encountered, the chunked rules discovered earlier can be used to greatly reduce the search space by providing chunked sub-steps. Problem solving for complex problems without the chunking process would be impossible, as the search space would be combinatorially large.
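A toy sketch of the chunking idea described above, under heavy simplifying assumptions: a one-dimensional world, breadth-first search instead of A*, and macro-actions reduced to their net effect. The only point it illustrates is that a solution found for an easy problem can be reused as a single "chunked" action that shortens the search for harder problems.

```python
from collections import deque

# Primitive actions in a 1-D toy world; the state is just an integer position.
PRIMITIVES = {"left": -1, "right": +1}

def search(start, goal, macros=None):
    """Breadth-first search over primitive actions plus any 'chunked'
    macro-actions (action sequences remembered from earlier, easier problems)."""
    actions = dict(PRIMITIVES)
    for name, seq in (macros or {}).items():
        actions[name] = sum(PRIMITIVES[a] for a in seq)   # net effect of the chunk
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for name, delta in actions.items():
            nxt = state + delta
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None

# A simple problem is solved first and its solution is remembered as a chunk...
learned = {"right_x3": search(0, 3)}            # ['right', 'right', 'right']
# ...which then shortens the plans found for a harder problem.
print(search(0, 9, macros=learned))             # e.g. ['right_x3', 'right_x3', 'right_x3']
```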

4 citations




Journal ArticleDOI
TL;DR: This work proposes a novel Improved Frequency Based Chunking (IFBC) algorithm for data de-duplication based on the FBC algorithm, and shows that IFBC significantly improves performance compared with FBC.
Abstract: Inspired by the idea of hierarchical substring caching, we propose a novel Improved Frequency Based Chunking (IFBC) algorithm for data de-duplication, based on the FBC algorithm proposed in "Frequency Based Chunking for Data De-Duplication". We then conducted extensive experiments, which show that the IFBC algorithm substantially improves performance compared with the FBC algorithm.
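The abstract gives no algorithmic detail of FBC or IFBC, so the sketch below only illustrates the general frequency-based chunking idea: count sliding-window fingerprints in a first pass, then cut wherever one of the most frequent fingerprints occurs. The window width, the top-k selection, and the direct use of raw byte windows as fingerprints are all illustrative assumptions.

```python
from collections import Counter

WINDOW = 8   # illustrative fingerprint width
TOP_K = 2    # illustrative number of 'frequent' fingerprints used as cut points

def frequency_based_chunks(data):
    """Minimal illustration of frequency-based chunking: a first pass counts
    sliding-window fingerprints, and a second pass cuts wherever one of the
    most frequent fingerprints occurs, so recurring content aligns on chunk
    boundaries."""
    windows = [data[i:i + WINDOW] for i in range(len(data) - WINDOW + 1)]
    frequent = {w for w, _ in Counter(windows).most_common(TOP_K)}
    chunks, start = [], 0
    for i, w in enumerate(windows):
        if w in frequent and i > start:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

data = (b"HEADERxx" + b"payload-123") * 5
print([len(c) for c in frequency_based_chunks(data)])
```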

2 citations


Proceedings Article
01 Sep 2013
TL;DR: This approach attacks the difficulty of acquiring more complex longer rules when inducing inversion transduction grammars via unsupervised bottom-up chunking, by augmenting its model search with top-down segmentation that minimizes CDL, resulting in significant translation accuracy gains.
Abstract: We present an unsupervised learning model that induces phrasal inversion transduction grammars by introducing a minimum conditional description length (CDL) principle to drive search over a space defined by two opposing extreme types of ITGs. Our approach attacks the difficulty of acquiring more complex longer rules when inducing inversion transduction grammars via unsupervised bottom-up chunking, by augmenting its model search with top-down segmentation that minimizes CDL, resulting in significant translation accuracy gains. Chunked rules tend to be relatively short; long rules are hard to learn through chunking, as the smaller parts of the long rules may not necessarily be good translations themselves. Our objective criterion is a conditional adaptation of the notion of description length, that is conditioned on a fixed preexisting model, in this case the initial chunked ITG. The notion of minimum CDL (MCDL) facilitates a novel strategy for avoiding the pitfalls of premature pruning in chunking approaches, by incrementally splitting an ITG with reference to a second ITG that conditions this search.

01 Jan 2013
TL;DR: This paper proposes and evaluates two ways of combining a symbolic model and a statistical model learnt by a CRF, and shows that in both cases they benefit from one another.
Abstract: Symbolic and statistical learning for chunking: comparison and combinations. We describe in this paper how to use grammatical inference algorithms for chunking, then compare and combine them with CRFs (Conditional Random Fields), which are known to be efficient for this task. Our corpus is extracted from the French Treebank. We propose and evaluate two ways of combining a symbolic model and a statistical model learnt by a CRF, and show that in both cases they benefit from one another.


01 Jun 2013
TL;DR: The use of grammatical inference algorithms for the chunking task, which are then compared with and combined with CRFs (Conditional Random Fields), whose effectiveness for this task is well established.
Abstract: We describe in this article the use of grammatical inference algorithms for the chunking task, then compare and combine them with CRFs (Conditional Random Fields), whose effectiveness for this task is well established. Our corpus is extracted from the French Treebank. We propose and evaluate two different ways of combining a symbolic model and a statistical model learnt by a CRF, and show that in both cases they benefit from one another.
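The two entries above combine a symbolic chunker with a CRF without the abstracts specifying how. One simple, hypothetical combination is sketched below: convert each model's BIO output to chunk spans and keep only the chunks both models agree on (an intersection vote). The span encoding and the intersection rule are assumptions for illustration, not the paper's actual combination strategies.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence to a set of (start, end, label) chunk spans."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes the last chunk
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.add((start, i, label))
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
        # an "I-" tag simply extends the current chunk
    return spans

def intersect_chunkers(symbolic_tags, crf_tags):
    """One simple combination: keep only the chunks both models agree on."""
    return sorted(bio_to_spans(symbolic_tags) & bio_to_spans(crf_tags))

symbolic = ["B-NP", "I-NP", "O", "B-VP", "O"]
crf      = ["B-NP", "I-NP", "B-PP", "B-VP", "O"]
print(intersect_chunkers(symbolic, crf))   # [(0, 2, 'NP'), (3, 4, 'VP')]
```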


Proceedings Article
01 Aug 2013
TL;DR: This work uses crowdsourcing to obtain query and sentence chunking and shows that entailment can not only be used as an effective evaluation metric to assess the quality of annotations, but it can also be employed to filter out noisy annotations.
Abstract: Hierarchical or nested annotation of linguistic data often co-exists with simpler non-hierarchical or flat counterparts, a classic example being that of annotations used for parsing and chunking. In this work, we propose a general strategy for comparing across these two schemes of annotation using the concept of entailment that formalizes a correspondence between them. We use crowdsourcing to obtain query and sentence chunking and show that entailment can not only be used as an effective evaluation metric to assess the quality of annotations, but it can also be employed to filter out noisy annotations.
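The abstract does not spell out the entailment formalization, so the following is one plausible, hypothetical reading: a flat chunking is entailed by a nested annotation if every chunk span coincides with a constituent span of the nested structure. The nested-list tree encoding is an assumption made for the sketch.

```python
def nested_spans(tree, start=0):
    """Collect the (start, end) spans of all constituents in a nested
    annotation given as nested lists of tokens, e.g. [['the', 'cat'], ['sat']]."""
    spans, pos = set(), start
    for node in tree:
        if isinstance(node, list):
            child, pos = nested_spans(node, pos)
            spans |= child
        else:
            pos += 1
    spans.add((start, pos))
    return spans, pos

def chunking_entailed(chunks, tree):
    """A flat chunking is 'entailed' by the nested annotation if every chunk
    span coincides with some constituent span of the tree (one possible
    formalization; the paper's own definition may differ)."""
    spans, _ = nested_spans(tree)
    return all(span in spans for span in chunks)

tree = [["the", "cat"], ["sat", ["on", "the", "mat"]]]
print(chunking_entailed([(0, 2), (3, 6)], tree))   # True
print(chunking_entailed([(1, 3)], tree))           # False: crosses constituents
```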

01 Jan 2013
TL;DR: This paper links the structuring of decision problems in MCDM to the theory of chunking, which describes how human cognition structures and perceives environmental information, and proposes that the validity of models representing multi-criteria decision problems can be assessed by evaluating the degree to which they match the structures formed by chunking.
Abstract: The first steps of multi-criteria decision making (MCDM) are typically the decomposition and structuring of the decision problem at hand. As all subsequent process steps of MCDM are based on the initial structuring of the decision problem, the validity of the structure representing the decision problem is of particular importance for the quality of the decision making process. This paper seeks to further develop our understanding of validity in structuring multi-criteria decisions. For this purpose, we link the structuring of decision problems in MCDM to the theory of chunking, which describes how human cognition structures and perceives environmental information. Based on this, we propose that the validity of models representing multi-criteria decision problems can be assessed by evaluating the degree to which they match the structures formed by chunking. We discuss a preliminary framework of how the match between the cognitive and the MCDM model can be tested. To demonstrate how this framework can be utilized in research practice, we apply it to empirically show that algorithmic, bottom-up structuring of MCDM problems leads to valid goal-criteria hierarchies.

Patent
07 May 2013
TL;DR: In this article, a computer-implemented method for parallel content-defined data chunking may include identifying a data stream to be chunked, splitting the data stream into a plurality of data sub-streams by alternatingly dividing consecutive bytes of the data streams among the plurality of substreams.
Abstract: A computer-implemented method for parallel content-defined data chunking may include (1) identifying a data stream to be chunked, (2) splitting the data stream into a plurality of data sub-streams by alternatingly dividing consecutive bytes of the data stream among the plurality of data sub-streams, and (3) chunking, in parallel, each data sub-stream within the plurality of data sub-streams into a plurality of data segments using a content-defined chunking algorithm. Various other methods, systems, and computer-readable media are also disclosed.
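A minimal sketch of the patent's two steps: byte-interleave the stream into sub-streams, then apply a content-defined chunker to each sub-stream (here serially; each call could run on a separate worker). The MD5-based cut condition, divisor, and minimum chunk length are stand-ins for whatever rolling-hash scheme an actual implementation would use.

```python
import hashlib

def split_round_robin(data, n):
    """Byte-interleave the stream into n sub-streams: byte i goes to
    sub-stream i mod n, as in the alternating split described above."""
    return [data[i::n] for i in range(n)]

def content_defined_chunks(stream, divisor=64, min_len=16):
    """A minimal stand-in for a content-defined chunker: cut at positions
    where a hash of the last 4 bytes is 0 modulo `divisor`, subject to a
    minimum chunk length (real implementations use a rolling hash)."""
    chunks, start, i = [], 0, min_len
    while i < len(stream):
        fp = int.from_bytes(hashlib.md5(stream[i - 4:i]).digest()[:2], "big")
        if i - start >= min_len and fp % divisor == 0:
            chunks.append(stream[start:i])
            start = i
        i += 1
    chunks.append(stream[start:])
    return chunks

data = bytes(range(256)) * 8
sub_streams = split_round_robin(data, 4)
# Each sub-stream could be handed to a separate worker; done serially here.
print([len(content_defined_chunks(s)) for s in sub_streams])
```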



Proceedings ArticleDOI
01 Nov 2013
TL;DR: This paper proposes an approach that should identify copied fragments of text faster and more accurately than standard plagiarism detection approaches: it first identifies topic-related pairs of text documents and then selects for further processing only those pairs that discuss a similar topic.
Abstract: Plagiarism is a serious problem, especially in the academic environment. Basically, we define this problem as the theft of somebody else's work or ideas. In this paper we focus on plagiarism in the domain of student assignments written in natural language. We propose an approach that should identify copied fragments of text data faster and more accurately than standard approaches. We first identify topic-related pairs of text documents and then select for further processing those pairs that discuss a similar topic. We experimented with different chunking methods in the comparison process to overcome typical problems such as shorter fragments of text copied from other documents. The results show that our approach is more suitable for plagiarism detection than a standard n-gram method.
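A compact sketch of the two-stage idea above, with hypothetical details: a cheap vocabulary-overlap filter selects topic-related document pairs, and only those pairs are compared chunk by chunk (overlapping 5-word windows here) to surface verbatim copied fragments. The Jaccard threshold and chunk size are illustrative, not the paper's settings.

```python
import re

def words(text):
    return re.findall(r"[a-z']+", text.lower())

def topic_similarity(a, b):
    """Cheap topic filter: Jaccard overlap of the two documents' vocabularies.
    Only pairs above a threshold go on to the costlier chunk comparison."""
    wa, wb = set(words(a)), set(words(b))
    return len(wa & wb) / max(1, len(wa | wb))

def word_chunks(text, size=5):
    """Overlapping word chunks (5-word windows here) used for comparison."""
    toks = words(text)
    return {" ".join(toks[i:i + size]) for i in range(len(toks) - size + 1)}

def shared_chunks(a, b, size=5):
    """Return the word chunks that occur verbatim in both documents."""
    return word_chunks(a, size) & word_chunks(b, size)

doc1 = "Chunking splits a data stream into pieces before deduplication."
doc2 = "Before deduplication, chunking splits a data stream into pieces."
if topic_similarity(doc1, doc2) > 0.3:      # illustrative threshold
    print(shared_chunks(doc1, doc2))
```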

Journal ArticleDOI
TL;DR: Wang et al., as discussed by the authors, proposed an adaptive chunking method based on the application locality and file-name locality of written data in SSD-based server storage, which can reduce the overhead of chunking and hash key generation and prevent duplicated data from being written.
Abstract: NAND flash-based SSDs (Solid State Drives) offer fast input/output performance and low power consumption, so they are widely used as storage in tablets, desktop PCs, smartphones, and servers. However, SSDs suffer from wear caused by an increasing number of writes. To improve SSD lifespan, a variety of data deduplication techniques have been introduced. The general fixed-size splitting method allocates fixed-size chunks without considering data locality, so it may perform unnecessary chunking and hash key generation, while the variable-size splitting method incurs excessive computation because it compares data byte-by-byte for deduplication. This paper proposes an adaptive chunking method based on the application locality and file-name locality of written data in SSD-based server storage. The proposed method adaptively splits data into 4KB or 64KB chunks according to the application locality and file-name locality of duplicated data, thereby reducing the overhead of chunking and hash key generation and preventing duplicated data from being written. The experimental results show that the proposed method improves write performance and reduces power consumption and operation time compared with the existing variable-size splitting method and a fixed-size splitting method using 4KB chunks.
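A simplified sketch of the adaptive write path described above. The paper bases the 4KB/64KB choice on measured application locality and file-name locality; in this sketch a hypothetical file-extension table stands in for that decision, and an in-memory hash index stands in for the SSD-side deduplication store.

```python
import hashlib

SMALL, LARGE = 4 * 1024, 64 * 1024
LARGE_CHUNK_EXTENSIONS = {".iso", ".vmdk", ".mp4"}   # hypothetical examples

def pick_chunk_size(filename):
    """Choose 64KB chunks for file types that tend to be written in large
    duplicated regions, and 4KB chunks otherwise (illustrative policy only)."""
    return LARGE if any(filename.endswith(e) for e in LARGE_CHUNK_EXTENSIONS) else SMALL

def dedup_write(filename, data, store):
    """Chunk with the adaptively chosen size and skip chunks already stored,
    so duplicated data never reaches the device again."""
    size, written = pick_chunk_size(filename), 0
    for off in range(0, len(data), size):
        chunk = data[off:off + size]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:
            store[key] = chunk
            written += len(chunk)
    return written

store = {}
print(dedup_write("disk.vmdk", b"\0" * 256 * 1024, store))   # 65536: one unique 64KB chunk
print(dedup_write("notes.txt", b"\0" * 8 * 1024, store))     # 4096: one unique 4KB chunk
```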

Proceedings Article
01 Jan 2013
TL;DR: It is found that the acoustic properties of the syllables had a larger impact on the non-learners' decisions, since they could not operate on linguistic knowledge of German, and Chinese and Mexican non-learners show a preference to mark an accent when the syllable is followed by a word boundary.
Abstract: This study concerns the perception of boundaries and accented syllables by native German subjects as compared to foreign non-speakers and learners of the language at different proficiency levels. To this effect six-syllable sequences excised from a context of three poly-syllabic words of German were presented to participants who had to select the syllables they perceived as accented, as well as the locations of word boundaries. Results show that German native subjects perform well at the word boundary task, but mark correctly less than two thirds of accented syllables. Chinese and Mexican nonlearners still detect a considerable number of word boundaries and accented syllables. Learners of German show improvement at the task with growing experience though they often pick legal subword units that do not necessarily form a plausible sequence. Correlation analysis of factors for syllable and boundary selection performed for non-learners and German subjects – as expected – shows considerably different behaviours. Whereas the boundary location does not influence the Germans’ decision on the accent location, Chinese and Mexican non-learners show a preference to mark an accent when the syllable is followed by a word boundary. We also found that the acoustic properties of the syllables had a larger impact on the non-learners’ decisions since they could not operate on linguistic knowledge of German.


Patent
12 Apr 2013
TL;DR: In this patent, the authors propose a method for the chunking of data and the delivery of high-bandwidth chunks to a requesting user at times that are more convenient for the network.
Abstract: The present invention provides for the chunking of data and the delivery of high-bandwidth chunks to a requesting user at times that are more convenient for the network.
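A minimal, hypothetical sketch of the scheduling idea: high-bandwidth chunks are transferred immediately only when the current time falls inside a configured off-peak window, and are queued otherwise. The window bounds and the queue-based deferral are illustrative assumptions; the patent abstract does not disclose a specific policy.

```python
from datetime import datetime, time

OFF_PEAK = (time(1, 0), time(6, 0))   # hypothetical 01:00-06:00 low-load window

def deliverable_now(now=None, window=OFF_PEAK):
    """Return True if a high-bandwidth chunk may be sent now, i.e. the current
    time falls inside the network's configured off-peak window."""
    now = (now or datetime.now()).time()
    start, end = window
    return start <= now < end

def schedule_chunks(chunks, send, queue, now=None):
    """Send chunks immediately during the off-peak window; otherwise hold them
    in a queue for the next convenient delivery opportunity."""
    for chunk in chunks:
        (send if deliverable_now(now) else queue.append)(chunk)

pending = []
schedule_chunks([b"chunk-1", b"chunk-2"], send=print,
                queue=pending, now=datetime(2013, 1, 1, 3, 0))
print(len(pending))   # 0: both chunks were sent inside the off-peak window
```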