
Showing papers on "Chunking (computing)" published in 2019


Proceedings ArticleDOI
12 Sep 2019
TL;DR: The effectiveness of multi-dataset multi-task learning is studied for training neural models on four sequence tagging tasks for Twitter data, namely part-of-speech tagging, chunking, super sense tagging, and named entity recognition.
Abstract: Multi-task learning is effective in reducing the data required for learning a task while ensuring accuracy competitive with single-task learning. We study the effectiveness of multi-dataset multi-task learning in training neural models for four sequence tagging tasks for Twitter data, namely part-of-speech (POS) tagging, chunking, super sense tagging, and named entity recognition (NER). We utilize publicly available tagged datasets: 7 for POS, 10 for NER, 1 for chunking, and 2 for super sense tagging. We use a multi-dataset multi-task neural model based on pre-trained contextual text embeddings and compare it against single-dataset single-task and multi-dataset single-task models. Even within a task, the tagging schemes may differ across datasets; the model learns from this tagging diversity across all datasets for a task. The models are more effective than single data/task models, leading to significant improvements for POS (1-2% acc., 7 datasets), NER (1-10% F1, 9 datasets), and chunking (4%). For super sense tagging there is a 2% improvement in F1 on out-of-domain data. Our models and tools can be found at https://socialmediaie.github.io/
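As a concrete illustration of the tagging-scheme diversity the model has to absorb, the sketch below converts BIO chunk tags into the BIOES scheme. This is a generic illustration, not code from the paper; the tags are invented examples.

```python
def bio_to_bioes(tags):
    """Convert BIO chunk tags to BIOES; a generic illustration of scheme diversity."""
    out = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            out.append(tag)
        elif tag.startswith("B-"):
            out.append(tag if nxt == "I-" + tag[2:] else "S-" + tag[2:])
        else:                                   # an I- tag
            out.append(tag if nxt == "I-" + tag[2:] else "E-" + tag[2:])
    return out

print(bio_to_bioes(["B-NP", "I-NP", "O", "B-VP"]))   # ['B-NP', 'E-NP', 'O', 'S-VP']
```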

19 citations


Journal ArticleDOI
TL;DR: This article shows that in German WhatsApp dialogues, users apply both a chronological and a reversed ordering of second pair parts (SPPs) in order to foreground particular topics in extended, chat-like dialogues.
Abstract: In computer-mediated communication, users cannot ensure that responsive postings are placed in a directly adjacent position. Yet, paired actions are discernible in which a first pair part (FPP) mak...

16 citations


Proceedings ArticleDOI
22 May 2019
TL;DR: SS-CDC, a two-stage parallel CDC, is proposed; it enables (almost) full parallelism in chunking a file without compromising the deduplication ratio and exploits the instruction-level SIMD parallelism available in today's processors.
Abstract: Data deduplication has been widely used in storage systems to improve storage efficiency and I/O performance. In particular, content-defined variable-size chunking (CDC) is often used in data deduplication systems for its capability to detect and remove duplicate data in modified files. However, the CDC algorithm is very compute-intensive and inherently sequential. Efforts on accelerating it by segmenting a file and running the algorithm independently on each segment in parallel come at a cost of substantial degradation of deduplication ratio. In this paper, we propose SS-CDC, a two-stage parallel CDC, that enables (almost) full parallelism on chunking of a file without compromising deduplication ratio. Further, SS-CDC exploits instruction-level SIMD parallelism available in today's processors. As a case study, by using Intel AVX-512 instructions, SS-CDC consistently obtains superlinear speedups on a multi-core server. Our experiments using real-world datasets show that, compared to existing parallel CDC methods which only achieve up to a 7.7X speedup on an 8-core processor with the deduplication ratio degraded by up to 40%, SS-CDC can achieve up to a 25.6X speedup with no loss of deduplication ratio.
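A minimal sketch of the two-stage idea, not the authors' implementation: stage one scans file segments independently (and is therefore parallelisable across cores or SIMD lanes), recording every offset whose fingerprint satisfies the chunking condition, and stage two sequentially enforces the minimum/maximum chunk sizes so the final boundaries match those of a purely sequential CDC pass. The toy fingerprint, thresholds, and segment size below are placeholders.

```python
# Sketch of a two-stage CDC; a faithful version would also overlap segments by the
# hash-window length so cut points near segment borders are not missed.
import os

MIN_CHUNK, MAX_CHUNK, MASK = 2048, 65536, (1 << 13) - 1

def candidate_offsets(data, start, end):
    """Stage 1: boundary condition only; each segment can be processed in parallel."""
    fp, out = 0, []
    for i in range(start, end):
        fp = ((fp << 1) + data[i]) & 0xFFFFFFFF     # toy rolling fingerprint
        if (fp & MASK) == MASK:
            out.append(i + 1)
    return out

def select_boundaries(candidates, length):
    """Stage 2: sequential pass applying the min/max chunk-size constraints."""
    boundaries, last = [], 0
    for off in candidates:
        while off - last > MAX_CHUNK:               # no candidate in range: force a cut
            last += MAX_CHUNK
            boundaries.append(last)
        if off - last >= MIN_CHUNK:
            boundaries.append(off)
            last = off
    if not boundaries or boundaries[-1] != length:
        boundaries.append(length)                   # final chunk ends at end of file
    return boundaries

def ss_cdc_like(data, segment_size=1 << 20):
    candidates = []
    for s in range(0, len(data), segment_size):     # independent segments
        candidates.extend(candidate_offsets(data, s, min(s + segment_size, len(data))))
    return select_boundaries(candidates, len(data))

print(ss_cdc_like(os.urandom(1 << 18))[:5])         # quick smoke test on 256 KiB of noise
```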

15 citations


Journal ArticleDOI
TL;DR: This paper presents the first constant-time chunking algorithm that divides every packet into a predefined number of chunks, irrespective of the packet size, and presents the best implementation practice for packet-level deduplication by selecting an optimal combination of chunking, fingerprinting, and hash table algorithms.

14 citations


Journal ArticleDOI
TL;DR: Two pause thresholds were tested, aimed at chunking the translation task workflow into task segments and at classifying pauses into different kinds; pauses below 200 ms were dubbed delays, while longer ones were treated as pauses.
Abstract: Two pause thresholds were tested, aimed at chunking the translation task workflow into task segments and classifying pauses into different kinds. Pauses below 200 ms were dubbed delays and excluded...

11 citations


Book ChapterDOI
01 Jan 2019
TL;DR: AE significantly improves chunking throughput by using the local extreme value in a variable-sized asymmetric window to overcome the boundary-shift problem of Rabin and TTTD, while achieving nearly the same deduplication ratio (DR).
Abstract: For efficient chunking, we propose a Differential Evolution (DE) based approach, TTTD-P, which optimizes Two Thresholds Two Divisors (TTTD) Content Defined Chunking (CDC) to reduce the number of computing operations by using a single dynamic optimal divisor D with an optimal threshold value, exploiting the multi-operation nature of TTTD. To reduce chunk-size variance, the TTTD algorithm introduces an additional backup divisor D′ that has a higher probability of finding cut points; however, the additional divisor decreases chunking throughput. Asymmetric Extremum (AE), by contrast, significantly improves chunking throughput by using the local extreme value in a variable-sized asymmetric window to overcome the boundary-shift problem of Rabin and TTTD, while achieving nearly the same deduplication ratio (DR). We therefore propose DE-based TTTD-P optimized chunking to maximize chunking throughput with increased DR, and a scalable bucket indexing approach that reduces the hash-value judgment time for identifying and declaring redundant chunks by about 16 times compared with Rabin CDC, 5 times compared with AE CDC, and 1.6 times compared with FastCDC on the Hadoop Distributed File System (HDFS).
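For reference, below is a compact sketch of the baseline TTTD loop being optimised here, with a toy rolling hash and illustrative parameters rather than the paper's DE-tuned values: the main divisor D declares natural cut points, the backup divisor D′ records fallback positions, and the two thresholds bound the chunk size.

```python
import os

T_MIN, T_MAX = 2048, 16384          # two thresholds bounding chunk size (illustrative)
D, D_BACKUP = 1000, 500             # two divisors; the backup matches more often

def tttd_boundaries(data: bytes):
    boundaries, last, backup, h = [], 0, -1, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF      # stand-in for the real rolling hash
        size = i - last + 1
        if size < T_MIN:
            continue                          # chunk still too small to cut
        if h % D_BACKUP == D_BACKUP - 1:
            backup = i + 1                    # remember a fallback cut point
        if h % D == D - 1:                    # main divisor found a natural cut
            boundaries.append(i + 1)
            last, backup = i + 1, -1
        elif size >= T_MAX:                   # no natural cut: use backup, else force
            cut = backup if backup > 0 else i + 1
            boundaries.append(cut)
            last, backup = cut, -1
    return boundaries

print(len(tttd_boundaries(os.urandom(1 << 18))))   # number of chunks in 256 KiB of noise
```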

10 citations


Journal ArticleDOI
TL;DR: Experimental evidence is provided suggesting that frames also play a role in explaining certain long-distance dependency phenomena, as originally proposed by Deane (1991), and that complex structures can evoke complex frames as well, if sufficiently frequent and semantically coherent, and therefore more easily license deeper subextractions.
Abstract: The idea that conventionalized general knowledge – sometimes referred to as a frame – guides the perception and interpretation of the world around us has long permeated various branches of cognitive science, including psychology, linguistics, and artificial intelligence. In this paper we provide experimental evidence suggesting that frames also play a role in explaining certain long-distance dependency phenomena, as originally proposed by Deane (1991). We focus on a constraint that restricts the extraction of an NP from another NP, called subextraction, which Deane (1991) claims is ultimately a framing effect. In Experiment 1 we provide evidence showing that referents are extractable to the degree that they are deemed important for the proposition expressed by the utterance. This suggests that the world knowledge that the main verb evokes plays a key role in establishing which referents are extractable. In Experiment 2 we offer evidence suggesting that the acceptability of deep subextractions is correlated with the overall plausibility of the proposition, suggesting that complex structures can evoke complex frames as well, if sufficiently frequent and semantically coherent, and therefore more easily license deeper subextractions.

9 citations


Journal ArticleDOI
TL;DR: In this paper, the cut and chip (CC) effect in rubber is studied; the authors propose an approach to understanding this effect and apply it to the development of new products for tires used in off-road or poor road conditions and for other demanding applications of rubber.
Abstract: Understanding the cut and chip (CC) effect in rubber is important for successful product development for tires used in off-road or poor road conditions and for other demanding applications of rubbe...

9 citations


Book ChapterDOI
11 Nov 2019
TL;DR: The Pangeo ecosystem, as described in this paper, is an interactive computing software stack for HPC and public cloud infrastructures; it is benchmarked here with geoscience operations on two different HPC systems.
Abstract: The Pangeo ecosystem is an interactive computing software stack for HPC and public cloud infrastructures. In this paper, we show benchmarking results of the Pangeo platform on two different HPC systems. Four different geoscience operations were considered in this benchmarking study with varying chunk sizes and chunking schemes. Both strong and weak scaling analyses were performed. Chunk sizes between 64 MB and 512 MB were considered, with the best scalability obtained for 512 MB. Compared to certain manual chunking schemes, the auto chunking scheme scaled well.
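In the Pangeo stack, chunk size and chunking scheme are typically controlled through dask. The snippet below is a generic illustration of the two schemes compared in the paper (manual versus auto chunking, with a 512 MB target); the array shape is invented rather than taken from the benchmark's datasets.

```python
import dask
import dask.array as da

# Manual chunking scheme: explicit chunk shape per dimension
# (64 x 1440 x 720 float64 values is roughly 530 MB per chunk).
x_manual = da.random.random((8760, 1440, 720), chunks=(64, 1440, 720))

# Auto chunking scheme: let dask pick chunk shapes close to a target byte size,
# here the 512 MB that scaled best in the paper's runs.
with dask.config.set({"array.chunk-size": "512MiB"}):
    x_auto = da.random.random((8760, 1440, 720), chunks="auto")

print(x_manual.chunksize, x_auto.chunksize)        # inspect the resulting chunk shapes

# A reduction over time, analogous to a simple geoscience aggregation; computing
# only a small slice keeps this sketch cheap to run.
print(x_auto[:64].mean(axis=0).compute().shape)
```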

6 citations


Journal ArticleDOI
TL;DR: An associative memory and recall (AMR) model is proposed that stores associative knowledge from sensor data and organizes human activity knowledge in a manner that is efficient and effective to store and recall.

5 citations


01 Mar 2019
TL;DR: This article explored the effects of utilizing lexical chunks in individualized coaching on university students' practical English skills and found that lexical chunk usage had a positive effect on students' performance.
Abstract: The present case study explored the effects of utilizing lexical chunks in individualized coaching on university students’ practical English...

Journal ArticleDOI
30 Dec 2019
TL;DR: The new grammar rule and pause-break rule from this research achieve better prediction accuracy than the earlier research, with the proportion of correctly predicted sentences increasing by 23% over the earlier rule.
Abstract: Pause breaks are one of the indicators that make speech easy to understand in a text-to-speech system. This research aims to improve the accuracy of pause prediction in Pontianak Malay sentences, building on earlier research that used phrase chunking. The work is part of an effort to preserve the Pontianak Malay language so that it does not become extinct as a local language. The chunking method uses the RegexpParser function in the Natural Language Toolkit to split sentences into phrases based on part-of-speech types. The authors developed a new grammar and pause-break rule, different from the earlier research, to increase the accuracy of pause prediction. The data used are 500 Pontianak Malay sentences recorded by a native speaker of Pontianak Malay to obtain the pause-break analysis. Pauses consist of a short pause (symbolized as "/1") and a long pause (symbolized as "/2"). The tests were a pause-break compatibility test within a sentence and an evaluation using the f-measure, recall, and precision parameters. Based on the tests, the new grammar rule and pause-break rule achieve better prediction accuracy than the earlier research, with the proportion of correctly predicted sentences increasing by 23% over the earlier rule.
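The paper's chunking step relies on NLTK's RegexpParser. The sketch below shows the general usage pattern with an illustrative English noun-phrase rule and hand-tagged tokens, since the actual Pontianak Malay grammar and pause-break rules are not reproduced in this summary.

```python
import nltk

# Illustrative (hypothetical) noun-phrase rule, not the grammar developed in the study.
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)

# Hand-tagged tokens stand in for the output of a real POS tagger.
tagged = [("the", "DT"), ("long", "JJ"), ("pause", "NN"),
          ("follows", "VBZ"), ("the", "DT"), ("phrase", "NN")]

tree = chunker.parse(tagged)   # returns an nltk.Tree with NP subtrees
print(tree)
```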

Journal ArticleDOI
TL;DR: This article describes the data analyzed in the paper "Implicit sequence learning of chunking and abstract structures" and includes reaction times in the serial reaction time task and generation performance for each confidence rating or attribution under the inclusion and exclusion tests from three experiments.

Proceedings Article
01 Jan 2019
TL;DR: The results support the notion that the segmentation and generalization of linguistic structure occurs in parallel, using similar computations, and that chunked representations of the nonadjacent dependencies are flexible enough to accommodate novel instances.
Abstract: Nonadjacent dependencies are dependencies between linguistic units that occur over one or more variable intervening units (e.g., AXC, where units A and C reliably co-occur, but the identity of X varies). These dependencies are a common feature of many natural languages, and are acquired by both infants and adults using statistical learning. However, despite the wealth of studies examining the acquisition of nonadjacent dependencies, a number of outstanding debates about this form of learning remain. For example, it is unclear whether participants in nonadjacent dependency experiments have learned the relative positions of syllables in these sequences (Endress & Bonatti, 2007), or if they remember specific items from the input (Perruchet, Tyler, Galland & Peereman, 2004). Moreover, substantial debate exists as to whether the segmentation and generalization of structure are two distinct processes that rely on separate computations (Peña, Bonatti, Nespor & Mehler, 2002), or whether they occur in tandem, using the same statistical learning computations (Frost & Monaghan, 2016). Here, we investigate these questions by testing the segmentation and generalization of nonadjacent dependencies in adults. We hypothesized that chunking – which has been shown to account for the statistical learning of adjacent dependencies (Isbilen, McCauley, Kidd & Christiansen, 2017) – may also play a role in the acquisition of nonadjacent dependencies (Isbilen, Frost, Monaghan & Christiansen, 2018). Following the method of Frost and Monaghan (2016), participants were presented with an artificial language composed of three nonadjacent dependencies. Following exposure, participants' ability to segment and generalize these structures was tested using two different tasks: a two-alternative forced-choice task (2AFC), and the statistically-induced chunking recall task (SICR; Isbilen et al., 2017). We predicted that while both tasks would show evidence of learning, SICR may provide clearer insights into the resulting output representations of learning. Our results confirm that participants successfully segmented and generalized nonadjacent structures on both types of task. However, while 2AFC performance on the generalization trials was significantly lower than on the segmentation trials, the results of SICR revealed no difference between the two, suggesting that the difference between segmentation and generalization found in previous studies may in part stem from the task demands of 2AFC (i.e., making familiarity judgements), rather than differences in learning. Taken together, our results support the notion that the segmentation and generalization of linguistic structure occurs in parallel, using similar computations, and that chunked representations of the nonadjacent dependencies are flexible enough to accommodate novel instances. The Seventh Conference of the Scandinavian Association for Language and Cognition, Aarhus University, May 22–24, 2019.

Journal ArticleDOI
TL;DR: It is suggested that the posterior rhythmic activities in the gamma band may underlie the processes that are directly associated with perceptual manipulations of chunking, while the subsequent beta-gamma activation over frontal areas appears to reflect a post-evaluation process such as reinforcement of the selected rules over alternative solutions, which may be an important characteristic of goal-directed chunking.
Abstract: Previous studies have revealed a specific role of the prefrontal-parietal network in rapid goal-directed chunking (RGDC), which dissociates prefrontal activity related to chunking from parietal working memory demands. However, it remains unknown how the prefrontal and parietal cortices collaborate to accomplish RGDC. To this end, a novel experimental design was used that presented Chinese characters in a chunking task, testing eighteen undergraduate students (9 females, mean age = 22.4 years) while recording the electroencephalogram (EEG). In the experiment, radical-level chunking was accomplished in a timely stringent way (RT = 1485 ms, SD = 371 ms), whereas the stroke-level chunking was accomplished less coherently (RT = 3278 ms, SD = 1083 ms). By comparing the differences between radical-level chunking vs. stroke-level chunking, we were able to dissociate the chunking processes in the radical-level chunking condition within the analyzed time window (-200 to 1300 ms). The chunking processes resulted in an early increase of gamma band synchronization over parietal and occipital cortices, followed by enhanced power in the beta-gamma band (25-38 Hz) over frontal areas. We suggest that the posterior rhythmic activities in the gamma band may underlie the processes that are directly associated with perceptual manipulations of chunking, while the subsequent beta-gamma activation over frontal areas appears to reflect a post-evaluation process such as reinforcement of the selected rules over alternative solutions, which may be an important characteristic of goal-directed chunking.

Journal Article
TL;DR: This paper presents the design of an efficient chunking algorithm that achieves high throughput and reduces processing time, and shows that backup storage space and processing speed can be improved using deduplication and variable-size chunking.
Abstract: A large amount of data is generated every day, and storing that data efficiently becomes a heuristic task. Backup storage is the most prominently used medium for storing the data generated every day. A significant amount of the data stored in backup storage is redundant and leads to wasted storage space. Storage space can be saved and the processing speed of backup media improved using deduplication and variable-size chunking. Various chunking algorithms have been presented in the past to improve the deduplication process. This paper presents the design of an efficient chunking algorithm to achieve high throughput and reduce processing time.

Proceedings Article
01 Jan 2019
TL;DR: This work presents a way to calculate n weak rolling hashes at a time using single instruction multiple data (SIMD) instructions available on today’s processors and shows how to calculate chunk boundaries cheaply using other instructions also available on these processors.
Abstract: Deduplication is a special case of data compression where repeated chunks of data are stored only once. The input data is divided into chunks using a chunking algorithm and a cryptographically strong hash is calculated on each chunk and used as its unique identifier for further searching and duplicate elimination. As the input stream is processed, a chunk boundary is declared at a byte address in the input stream if some weak hash of a fixed number of preceding bytes (the “hash window”) satisfies some criterion. Commonly, a rolling hash like Karp-Rabin [6] or some cyclic polynomial [7] is used for the weak hash since these cheaply support moving the hash window forward one byte in the input stream. This work presents a way to calculate n weak rolling hashes at a time using single instruction multiple data (SIMD) instructions available on today’s processors. Furthermore, it shows how to calculate chunk boundaries cheaply using other instructions also available on these processors. Empirical results show that the proposed algorithm is four times as fast as previous algorithms, and that these optimizations save up to 25% of the computation required for deduplication.
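For context, below is a scalar sketch of the Karp-Rabin rolling hash and boundary test that this work vectorises; the base, modulus, window length, and boundary mask are illustrative choices, and the paper's contribution is evaluating n such hashes per SIMD instruction rather than one at a time as here.

```python
import os

BASE, MOD = 257, (1 << 31) - 1   # illustrative base and modulus
WINDOW = 48                      # bytes in the hash window
MASK = (1 << 13) - 1             # boundary criterion: ~8 KiB expected chunk size

def chunk_boundaries(data: bytes):
    h, pow_w = 0, pow(BASE, WINDOW - 1, MOD)
    boundaries = []
    for i, b in enumerate(data):
        if i >= WINDOW:                                   # slide: drop the oldest byte
            h = (h - data[i - WINDOW] * pow_w) % MOD
        h = (h * BASE + b) % MOD
        if i + 1 >= WINDOW and (h & MASK) == MASK:        # weak hash meets the criterion
            boundaries.append(i + 1)                      # declare a chunk boundary
    return boundaries

print(len(chunk_boundaries(os.urandom(1 << 18))))         # boundaries in 256 KiB of noise
```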

Patent
01 Aug 2019
TL;DR: In this paper, a chunking engine and a policy engine are employed to evaluate one or more storage policies relating to, for example, cost, security, and network conditions in view of services and/or requirements of the multiple cloud storage providers.
Abstract: Techniques for chunking data in data storage systems that provide increased data storage security across multiple cloud storage providers. The techniques employ a chunking engine and a policy engine, which evaluates one or more storage policies relating to, for example, cost, security, and/or network conditions in view of services and/or requirements of the multiple cloud storage providers. Having evaluated such storage policies, the policy engine generates and provides operating parameters to the chunking engine, which uses the operating parameters when chunking and/or distributing the data across the multiple cloud storage providers, thereby satisfying the respective storage policies. In this way, users of data storage systems obtain the benefits of cloud storage resources and/or services while reducing their data security concern and optimizing the total cost of data storage.

Patent
09 May 2019
TL;DR: In this article, the authors proposed a lightweight complexity-based packet-level deduplication apparatus, which consists of a chunk dividing unit for performing an N-way chunking operation on a specific packet and dividing the chunk into Nway chunks.
Abstract: The present invention relates to a lightweight complexity-based packet-level deduplication apparatus and a method thereof. The lightweight complexity-based packet-level deduplication apparatus comprises: a chunk dividing unit for performing an N-way chunking operation on a specific packet and dividing the chunk into N-way chunks; a chunk extracting unit for extracting at least one target chunk used for deduplication among the N-way chunks; and a deduplication processing unit for determining duplication of the specific packet based on the at least one target chunk and performing deduplication. Accordingly, network bandwidth can be saved by removing the duplicated portion of a packet at the packet level.
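Below is a toy sketch of the pipeline the claims describe, with invented parameters: the packet is split into N equal chunks, a subset of "target" chunks is fingerprinted, and the packet is treated as a duplicate only if every target fingerprint has been seen before. It illustrates the idea, not the patented method itself.

```python
import hashlib

def n_way_chunks(packet: bytes, n: int = 4):
    """Split a packet into n roughly equal chunks (the last takes any remainder)."""
    size = max(1, len(packet) // n)
    return [packet[i * size:(i + 1) * size] for i in range(n - 1)] + [packet[(n - 1) * size:]]

seen_fingerprints = set()

def is_duplicate(packet: bytes, n: int = 4, targets=(0, 2)):
    """Fingerprint only the target chunks; call the packet a duplicate if all match."""
    chunks = n_way_chunks(packet, n)
    fps = [hashlib.sha1(chunks[t]).digest() for t in targets]
    duplicate = all(fp in seen_fingerprints for fp in fps)
    seen_fingerprints.update(fps)
    return duplicate

print(is_duplicate(b"payload-A" * 100), is_duplicate(b"payload-A" * 100))  # False, then True
```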


Patent
13 Jun 2019
TL;DR: In this paper, the authors present techniques for applying fine-grained client-specific rules to divide (e.g., chunk) data statements to achieve cost reduction and failure rate reduction associated with executing the data statements over a subject dataset.
Abstract: Techniques are presented for applying fine-grained client-specific rules to divide (e.g., chunk) data statements to achieve cost reduction and/or failure rate reduction associated with executing the data statements over a subject dataset. Data statements for the subject dataset are received from a client. Statement attributes derived from the data statements are processed with respect to fine-grained rules and/or other client-specific data to determine whether a data statement chunking scheme is to be applied to the data statements. If a data statement chunking scheme is to be applied, further analysis is performed to select a data statement chunking scheme. A set of data operations are generated based at least in part on the selected data statement chunking scheme. The data operations are issued for execution over the subject dataset. The results from the data operations are consolidated in accordance with the selected data statement chunking scheme and returned to the client.

Journal ArticleDOI
TL;DR: In this paper, the authors address how three accepted and researched motor learning stages, as well as the concept of mentally chunking information, relate to acquiring and accelerating the learning process in tennis.
Abstract: The goal of this article is to address how three accepted and researched motor learning stages, as well as the concept of mentally chunking information, relate to acquiring and accelerating the learning process in tennis. Stages of learning, the role of playing vs. practicing tennis, and the interaction between biomechanics and motor learning are discussed. Specific coaching tips are provided.

30 Jun 2019
TL;DR: Chunking theory from cognitive science provides a basis for analyzing micro-behaviours in human performance in order to build models of individuals’ understanding of domain content that are richer than those available from current methods used for human-machine communication in AI systems.
Abstract: Chunking theory from cognitive science provides a basis for analyzing micro-behaviours in human performance in order to build models of individuals’ understanding of domain content that are richer than those available from current methods used for human-machine communication in AI systems.

Proceedings ArticleDOI
14 Jul 2019
TL;DR: A modified model is presented that can convert the lateral weights of the trained network into a primacy gradient usable by methods such as Competitive Queueing, thus granting the network a method for executing learned sequences.
Abstract: A Self-Organising Temporal Pooling (SOTP) network has been shown to be capable of forming declarative parallel representations of sequential events and chunking these events without supervision. However, such a network currently cannot take these declarative representations and execute the associated sequence; it is strictly a one-way sequence chunker and encoder. We present a modified model that can convert the lateral weights of the trained network into a primacy gradient usable by methods such as Competitive Queueing (CQ), thus granting the network a method for executing learned sequences. The resulting model has several benefits over traditional CQ. We further present an advanced method of executing sequences via SOTP itself, resulting in less error than CQ whilst being more flexible in replaying sequences from datasets with variable sequence lengths.

25 Sep 2019
TL;DR: This paper proposes to let a model learn to chunk in a more flexible way via reinforcement learning: the model can decide the next chunk it wants to process in either direction, and recurrent mechanisms are applied to allow information to be transferred between chunks.
Abstract: In this paper, we focus on the conversational machine reading comprehension (MRC) problem, where the input to a model could be a lengthy document and a series of interconnected questions. To deal with long inputs, previous approaches usually chunk them into equally-spaced segments and predict answers based on each chunk independently without considering the information from other chunks. As a result, they may form chunks that fail to cover complete answers or have insufficient contexts around the correct answer required for question answering. Moreover, they are less capable of answering questions that need cross-chunk information. We propose to let a model learn to chunk in a more flexible way via reinforcement learning: a model can decide the next chunk that it wants to process in either direction. We also apply recurrent mechanisms to allow information to be transferred between chunks. Experiments on two conversational MRC tasks – CoQA and QuAC – demonstrate the effectiveness of our recurrent chunking mechanisms: we can obtain chunks that are more likely to contain complete answers and at the same time provide sufficient contexts around the ground truth answers for better predictions.

Patent
18 Jun 2019
TL;DR: In this paper, the authors describe a system that identifies a length of a sliding window that a data chunking routine applies to a data buffer to create data chunks, and adjusts the expected chunk boundary based on the length of the sliding window.
Abstract: Targeted chunking of data is described. A system identifies a length of a sliding window that a data chunking routine applies to a data buffer to create data chunks. The system identifies an expected chunk boundary in the data buffer. The system adjusts the expected chunk boundary, based on the length of the sliding window. The system enables the data chunking routine to start applying the sliding window at the adjusted expected chunk boundary in the data buffer instead of starting application of the sliding window at a beginning of the data buffer.


Proceedings ArticleDOI
12 Dec 2019
TL;DR: It is argued that the basic nature of phrase-labelling is spatial rather than temporal, and the hypothesis is proposed and tested that a CNN-based model that directly extracts labelled n-grams from the input sentence would outperform a standard RNN-based model.
Abstract: Modern approaches address the task of phrase-labelling within an input sentence (e.g., NER, chunking, etc.) as a variant of the word-tagging problem. These approaches extract the desired phrases as word sequences that are mapped to specific tag sequences (with Begin, Intermediate and End tag types). However, we argue that the basic nature of phrase-labelling is not temporal but spatial. Thus we propose and test the hypothesis that a CNN-based model that directly extracts labelled n-grams from the input sentence would outperform a standard RNN-based model.


01 Nov 2019
TL;DR: In this paper, a chunking approach based on coroutines is presented to mitigate the potential penalty to batch performance during migrations to microservices, since chunking is otherwise difficult to integrate into existing batch jobs, which are traditionally executed sequentially.
Abstract: When migrating enterprise software towards microservices, batch jobs are particularly sensitive to communication overhead introduced by the distributed nature of microservices. As it is not uncommon for a single batch job to process millions of data items, even an additional millisecond of overhead per item may lead to a significant increase in runtime. A common strategy for reducing the average overhead per item is called chunking, which means that individual requests for different data items are grouped into larger requests. However, chunking is difficult to integrate into existing batch jobs, which are traditionally executed sequentially. In this paper, we present a chunking approach based on coroutines, and investigate whether it can be used to mitigate the potential penalty to batch performance during migrations to microservices.