
Showing papers on "Chunking (computing) published in 2021"


Journal ArticleDOI
10 Feb 2021
TL;DR: In this paper, the authors investigated the effectiveness of students' memorization of textual information with 50 high school students and found that a single type of information (either all letters or all numbers) was easier to recall than combined information.
Abstract: This study investigated the effectiveness of students' memorization of textual information with 50 high school students. The information was presented in three forms: 10 distinct letters, 10 distinct numbers, and a combination of 5 distinct letters and 5 distinct numbers. Each form was delivered under three chunking conditions: One-Chunk, where the whole string was presented at once; Two-Chunks, where it was divided 5-5; and Three-Chunks, where it was delivered in 3-3-4, 4-3-3, or 3-4-3 groupings. The statistical results revealed that a single type of information (all letters or all numbers) was easier to recall than the combined information. Furthermore, dividing the information into two or three chunks enhanced memorization significantly. In addition, when the combined type of information was shown, grouping it into two chunks was more effective for short-term memory than presenting it as one chunk. An educational implication is that, to help students memorize and retain learning materials more effectively, the materials should be grouped into two or three chunks when delivered. Teaching should also emphasize helping students learn to take in information more effectively by themselves, through tree thinking, binary thinking, and computational thinking.

8 citations


Journal ArticleDOI
29 Apr 2021
TL;DR: This paper discusses various content-defined chunking algorithms and compares their performance on chunking properties such as chunking speed, processing time, and throughput.
Abstract: Data deduplication eliminates redundant data and reduces storage consumption. Ever more data is generated and stored in the cloud, often repeatedly, which consumes a large volume of storage. By reducing data volumes, deduplication saves disk space and network bandwidth, and thus cuts the cost and energy consumption of running storage systems. In data deduplication, data is broken into small chunks (blocks); a hash ID is calculated for each block and compared with those of existing blocks to detect duplicates. Blocks may be of fixed or variable size, and variable-size chunking gives better results than fixed-size chunking. The chunking process is therefore the initial task of deduplication and determines how close the result is to optimal. In this paper, we discuss various content-defined chunking algorithms and their performance with respect to chunking properties such as chunking speed, processing time, and throughput.
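For readers unfamiliar with the technique this survey covers, the sketch below shows the core idea of content-defined chunking: boundaries are chosen from a rolling hash of the content itself, so identical regions produce identical chunks even when their offsets shift. This is a minimal gear-hash illustration in the spirit of FastCDC-style algorithms, not code from the paper; all names and parameters are illustrative.

```python
import hashlib
import random

# 256-entry "gear" table of random 32-bit values (seeded for reproducibility).
random.seed(0)
GEAR = [random.getrandbits(32) for _ in range(256)]

def cdc_chunks(data: bytes, min_size=2048, avg_size=8192, max_size=65536):
    """Gear-hash content-defined chunking: each byte shifts the hash left and
    adds a random value for that byte, so the hash depends only on the most
    recent ~32 bytes. A boundary is cut when the low bits are zero, giving an
    expected chunk length of roughly avg_size (a power of two)."""
    mask = avg_size - 1
    chunks, start, h = [], 0, 0
    for i in range(len(data)):
        h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFF
        if i - start + 1 < min_size:
            continue
        if (h & mask) == 0 or i - start + 1 >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def dedup_ratio(data: bytes) -> float:
    """Fraction of chunks that are unique; lower means more duplication found."""
    chunks = cdc_chunks(data)
    return len({hashlib.sha256(c).digest() for c in chunks}) / max(len(chunks), 1)
```

Benchmarks of the kind the paper describes would time `cdc_chunks` over large inputs to compare chunking speed and throughput across algorithms.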

7 citations


Journal ArticleDOI
TL;DR: The aim of this paper is to fill the gap between the need for innovation and the actual trend in bio-inspired optimization; considering particle swarm optimization (PSO), it presents a novel bridge between chunking and cooperative learning, adapted to the problem of feature selection.
Abstract: Bio-inspired optimization aims at adapting observed natural behavioral patterns and social phenomena towards efficiently solving complex optimization problems, and is nowadays gaining much attention. However, researchers have recently highlighted an inconsistency between the needs of the field and the actual trend: while it is important to design innovative contributions, a common practice in bio-inspired optimization is to re-iterate existing knowledge in a different form. The aim of this paper is to fill this gap. More precisely, we first highlight new examples of this problem by considering and describing the concepts of chunking and cooperative learning. Second, considering particle swarm optimization (PSO), we present a novel bridge between these two notions adapted to the problem of feature selection. In the experiments, we investigate the practical importance of our approach while exploring both its strengths and limitations. The results indicate that the approach is mainly suitable for large datasets, and that further research is needed to improve its computational efficiency and to ensure the independence of the sub-problems defined using chunking.
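The abstract does not spell out the algorithm, but the general idea (partition the feature set into chunks and solve each sub-problem with binary PSO) can be sketched as follows. Everything here is an assumption for illustration: the fitness function, PSO coefficients, and the union-combination step are placeholders, not the authors' design.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Toy filter-style score (stand-in for the paper's objective): mean
    absolute correlation of selected features with the target, minus a
    small penalty per selected feature."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return -1.0
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in idx]
    return float(np.mean(corr)) - 0.01 * idx.size

def binary_pso(X, y, iters=30, swarm=10):
    """Minimal binary PSO with a sigmoid transfer function."""
    n = X.shape[1]
    pos = (rng.random((swarm, n)) < 0.5).astype(float)
    vel = rng.normal(0.0, 1.0, (swarm, n))
    pbest = pos.copy()
    pscore = np.array([fitness(p, X, y) for p in pos])
    for _ in range(iters):
        gbest = pbest[pscore.argmax()]
        vel = (0.7 * vel
               + 1.5 * rng.random((swarm, n)) * (pbest - pos)
               + 1.5 * rng.random((swarm, n)) * (gbest - pos))
        pos = (rng.random((swarm, n)) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
        score = np.array([fitness(p, X, y) for p in pos])
        improved = score > pscore
        pbest[improved], pscore[improved] = pos[improved], score[improved]
    return pbest[pscore.argmax()].astype(bool)

def chunked_feature_selection(X, y, n_chunks=4):
    """Chunk the feature set into sub-problems, run PSO on each independently,
    and take the union of the selections."""
    selected = np.zeros(X.shape[1], dtype=bool)
    for block in np.array_split(np.arange(X.shape[1]), n_chunks):
        selected[block] = binary_pso(X[:, block], y)
    return selected
```

Treating chunks independently is exactly the simplification whose limits the authors flag: features in different chunks may interact, which is why they call for ensuring the independence of the sub-problems.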

6 citations


Journal ArticleDOI
TL;DR: In this article, the authors employ a grid-navigation task as an exemplar of internally guided sequencing to investigate practice-driven performance improvements due to motor chunking, where the sequence of motor actions is self-generated or internally specified.
Abstract: Motor skill learning involves the acquisition of sequential motor movements with practice. Studies have shown that we learn to execute these sequences efficiently by chaining several elementary actions in sub-sequences called motor chunks. Several experimental paradigms, such as serial reaction task, discrete sequence production, and m × n task, have investigated motor chunking in externally specified sequencing where the environment or task paradigm provides the sequence of stimuli, i.e., the responses are stimulus driven. In this study, we examine motor chunking in a class of more realistic motor tasks that involve internally guided sequencing where the sequence of motor actions is self-generated or internally specified. We employ a grid-navigation task as an exemplar of internally guided sequencing to investigate practice-driven performance improvements due to motor chunking. The participants performed the grid-sailing task (GST) (Fermin et al., 2010), which required navigating (by executing sequential keypresses) a 10 × 10 grid from start to goal position while using a particular type of key mapping between the three cursor movement directions and the three keyboard buttons. We provide empirical evidence for motor chunking in grid-navigation tasks by showing the emergence of subject-specific, unique temporal patterns in response times. Our findings show spontaneous chunking without pre-specified or externally guided structures while replicating the earlier results with a less constrained, internally guided sequencing paradigm.
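A common way to operationalize motor chunks in such data is to segment keypress sequences at unusually long inter-keypress intervals. The sketch below uses a simple mean-plus-k-standard-deviations threshold; this is a standard simplification from the motor-chunking literature, not necessarily the analysis used in the paper.

```python
import numpy as np

def chunk_boundaries(rts, k=1.0):
    """Heuristic chunk segmentation of a keypress sequence: a new motor chunk
    is assumed to start wherever the inter-keypress interval exceeds the
    trial mean by more than k standard deviations."""
    rts = np.asarray(rts, dtype=float)
    threshold = rts.mean() + k * rts.std()
    # Position 0 always starts a chunk; long pauses start new chunks.
    starts = [0] + [i for i in range(1, len(rts)) if rts[i] > threshold]
    return [list(range(s, e)) for s, e in zip(starts, starts[1:] + [len(rts)])]

# Example: fast presses with two long pauses -> three chunks.
print(chunk_boundaries([0.9, 0.2, 0.2, 0.8, 0.2, 0.2, 0.2, 0.7, 0.3, 0.2]))
```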

5 citations


Journal ArticleDOI
TL;DR: This article found that chunking ability was a significant modulator of general L2 processing efficiency, and of cross-language differences in particular, and added clarity to the interpretation of variability in the online reading performance of non-native speakers.
Abstract: Behavioral studies of language processing rely on the eye-mind assumption, which states that the time spent looking at text is an index of the time spent processing it. In most cases, relatively shorter reading times are interpreted as evidence of greater processing efficiency. However, previous evidence from L2 research indicates that non-native participants with fast reading times are not always more efficient readers, but rather shallow parsers. Because earlier studies did not identify a reliable predictor of variability in L2 processing, such uncertainty around the interpretation of reading times introduces a potential confound that undermines the credibility and conclusions of online measures of processing. The present study proposes that a recently developed modulator of online processing efficiency, namely chunking ability, may account for the observed variability in L2 online reading performance. The eye movements of L1 English - L2 Spanish learners were analyzed during natural reading. Chunking ability was predictive of overall reading speed. Target relative clauses contained L2 verb-noun multiword units, which were manipulated with regard to their L1-L2 congruency. The results indicated that processing of the L1-L2 incongruent units was modulated by an interaction of L2 chunking ability and level of knowledge of multiword units. Critically, the data revealed an inverse U-shaped pattern, with faster reading times in learners with both the highest and the lowest chunking ability scores, suggesting fast integration in the former and lack of integration in the latter. Additionally, the presence of significant differences between conditions was correlated with individual chunking ability. The findings point to chunking ability as a significant modulator of general L2 processing efficiency, and of cross-language differences in particular, and add clarity to the interpretation of variability in the online reading performance of non-native speakers.

4 citations


Book ChapterDOI
07 Jan 2021
TL;DR: This paper proposes ReLink, a novel Open IE system for extracting binary relations from open-domain text, based on identifying correct phrases and linking them in the most appropriate way to reflect their relationship in a sentence.
Abstract: Recently, many Open IE systems have been developed using deep linguistic features, such as dependency-parse features, to overcome the limitations of older Open IE systems that use only shallow information like part-of-speech tags or chunking. Even though these newer systems have clear advantages in their extractions, they also exhibit several issues that do not exist in the older systems. In this paper, we analyze the outputs of several popular Open IE systems to identify their strengths and weaknesses. We then introduce ReLink, a novel Open IE system for extracting binary relations from open-domain text. Its working model is based on identifying correct phrases and linking them in the most appropriate way to reflect their relationship in a sentence. After establishing these connections, it can extract relations using several pre-defined patterns. Despite using only shallow linguistic features, ReLink avoids the weaknesses of older systems and also many similar issues arising in recent Open IE systems. Our experiments show that ReLink achieves a larger area under the precision-recall curve than ReVerb and Ollie, two well-known Open IE systems.
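As a toy illustration of the general recipe (chunk the sentence into phrases, then link them with pre-defined patterns), the following sketch extracts a binary relation from an NP-VP-NP pattern. The chunk tags and the single pattern are assumptions for illustration; ReLink's actual phrase identification and linking are more elaborate.

```python
def extract_relations(chunks):
    """Illustrative pattern-based binary relation extraction over shallow
    chunks: once a sentence is segmented into phrases, an NP-VP-NP pattern
    yields (arg1, relation, arg2). Input is a list of (chunk_tag, text)
    pairs from any shallow chunker."""
    triples = []
    for i in range(len(chunks) - 2):
        (t1, a1), (t2, rel), (t3, a2) = chunks[i:i + 3]
        if t1 == "NP" and t2 == "VP" and t3 == "NP":
            triples.append((a1, rel, a2))
    return triples

# "ReLink extracts binary relations from open-domain text."
chunks = [("NP", "ReLink"), ("VP", "extracts"),
          ("NP", "binary relations"), ("PP", "from open-domain text")]
print(extract_relations(chunks))  # [('ReLink', 'extracts', 'binary relations')]
```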

2 citations


Book ChapterDOI
01 Jan 2021
TL;DR: In this chapter, the authors explore fundamental data DeDup techniques using open-source programming tools and compare fixed-size and variable-size chunking techniques on a publicly available dataset.
Abstract: The emergence of 5G architecture and the global increase in Internet penetration have led to an enormous amount of data being generated by a wide range of applications such as cloud computing, telehealth, mail transactions, and backups. One immediate consequence of such big data is the creation of duplicate records, which results in redundant storage and unnecessary computation. Data deduplication (DeDup) is the process of removing duplicates from a target dataset so that optimal storage is achieved. In this chapter, we explore the fundamentals of data DeDup techniques using open-source programming tools. Suitable examples demonstrate the basic chunking, hashing, and comparison processes, and DeDup is performed and compared for fixed-size and variable-size chunking techniques. Further, an open-source record linkage toolkit equipped with standard record linkage and DeDup tools is introduced, covering functionality such as preprocessing, indexing, comparing, and link matching. DeDup using threshold-based, supervised learning, and unsupervised learning matching techniques is also demonstrated on a publicly available dataset. Finally, conclusions and the authors' perspective on data DeDup techniques are drawn.
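The fixed-size versus variable-size comparison the chapter performs can be illustrated with a small experiment: insert one byte near the front of a file and measure how much of the combined data still deduplicates. The chunker below is a deliberately tiny content-defined chunker; sizes, masks, and the synthetic data are all illustrative assumptions.

```python
import hashlib
import random

def fixed_chunks(data, size=4096):
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, mask=0x0FFF, min_len=64):
    """Tiny content-defined chunker: cut where a rolling hash of the bytes
    matches a mask, so boundaries follow the content rather than offsets."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF
        if i - start + 1 >= min_len and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def unique_fraction(chunks):
    return len({hashlib.sha256(c).digest() for c in chunks}) / len(chunks)

random.seed(1)
original = bytes(random.randrange(256) for _ in range(200_000))
edited = original[:100] + b"X" + original[100:]   # one inserted byte near the front

# Fixed-size chunking: the insertion shifts every later boundary, so almost
# nothing deduplicates. Content-defined chunking realigns after the edit.
for name, chunker in (("fixed", fixed_chunks), ("variable", variable_chunks)):
    print(name, round(unique_fraction(chunker(original) + chunker(edited)), 3))
```

This boundary-shift effect is the usual reason variable-size chunking outperforms fixed-size chunking in the experiments such chapters report.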

1 citation


DOI
23 Jul 2021
TL;DR: The root-based and chunking approach is not a newly invented ESL (English as a Second Language) teaching approach; it has been widely used by a majority of ESL teachers ever since TESOL courses were acknowledged and adopted by the bulk of non-English-speaking countries to train their own ESL teachers.
Abstract: The root-based and chunking approach is not a newly invented ESL (English as a Second Language) teaching approach. Technically, it has been widely used by a majority of ESL teachers ever since TESOL (Teaching English to Speakers of Other Languages) courses were acknowledged and adopted by the bulk of non-English-speaking countries to train their own ESL teachers. While an increasing number of ESL teachers are aware of the value of learning roots and chunks, misuse of the root-based and chunking approach has emerged and leads to adverse effects. Before this paper was written, a comparative study had been conducted in one of my courses, during which I noticed that students' reactions towards words' affixes, roots, and synonyms differed markedly. This paper was therefore written mainly to analyse the causes of these differences and to improve the traditional root-based and chunking approach using the theory of multiple intelligences.

1 citation


Proceedings ArticleDOI
19 May 2021
TL;DR: In this paper, a modified version of the "chunking" method for HTTP/2 and HTTP/3 is implemented in a software library and tested for compatibility with existing HTTP client libraries and web browsers.
Abstract: Data can be intercepted and modified during transmission; such behavior is called a MITM (man-in-the-middle) attack. Despite the use of encryption in public networks, such attacks may still happen because of botnets. They pose a particular threat to local networks, where data is usually transmitted in unencrypted form. In previous studies, the "chunking" model and method were developed for payload authentication and proven to offer increased performance compared with widely used methods such as "Buffering to File" or "Buffering to Memory". Support for the ECDSA algorithm has been added to the "chunking" method. The most widespread protocol for data transmission is HTTP, which is used in a wide variety of applications. Previously, the "chunking" method was implemented for every version of the HTTP protocol; however, because of the structure of HTTP/1, not all benefits of the method could be exploited. A new model of payload authentication for the HTTP/2 and HTTP/3 protocols has been developed, and the modified method has been implemented in a software library. The library was further tested for compatibility with existing HTTP/2 client libraries and web browsers. The new method may be used in web browsers, but modification of the source code is required.
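The paper's own library is not shown here, but the core idea of per-chunk ECDSA payload authentication can be sketched with the widely used Python cryptography package: sign each chunk as it is produced so that the receiver can verify incrementally instead of buffering the whole payload first. The chunk size and key choice are illustrative assumptions.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

CHUNK = 16 * 1024   # illustrative chunk size

priv = ec.generate_private_key(ec.SECP256R1())
pub = priv.public_key()

def sign_chunks(payload: bytes):
    """Sign each chunk independently so the receiver can authenticate the
    stream incrementally rather than buffering the whole payload."""
    chunks = [payload[i:i + CHUNK] for i in range(0, len(payload), CHUNK)]
    return [(c, priv.sign(c, ec.ECDSA(hashes.SHA256()))) for c in chunks]

def verify_stream(signed):
    for chunk, sig in signed:
        try:
            pub.verify(sig, chunk, ec.ECDSA(hashes.SHA256()))
        except InvalidSignature:
            return False        # reject as soon as one chunk fails
    return True

assert verify_stream(sign_chunks(b"x" * 50_000))
```

In a real HTTP/2 or HTTP/3 setting the signatures would travel in frames or trailers alongside the data; that framing is protocol-specific and omitted here.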

1 citation


Journal ArticleDOI
01 Jan 2021
TL;DR: It was shown that most EFL students fully understand the importance of vocabulary but find it very difficult to use new words in sentences, which suggests that the spoken-language module should be enriched to take grammar and usage into account when teaching vocabulary and should be extended to more EFL learners.
Abstract: This article examines how Chunking Language is perceived by online EFL students and analyzes how this strategy enriches students' vocabulary, based on their ideas and experiences in online classes. The study used a qualitative research methodology with thematic analysis, performing descriptive analysis on the responses. Interview guide questions were used to explore participants' insights and ideas on how to use the strategy. The findings showed that most EFL students fully understand the importance of vocabulary but find it very difficult to use new words in sentences. In addition, participants adopted Chunking Language as a strategy in the module, which is important in language learning and a valuable approach to vocabulary literacy. As a strategy, it helps students learn new vocabulary and use it in appropriate contexts. The research suggests that the spoken-language module should be enriched to take grammar and usage into account when teaching vocabulary and should be extended to more EFL learners.

1 citation


Journal ArticleDOI
TL;DR: In this paper, the authors created an annotated corpus for POS tagging and chunking in the Bhojpuri, Maithili, and Magahi languages and then built initial automatic tools for these tasks.

Journal ArticleDOI
30 Aug 2021
TL;DR: Grammar-based annotation was useful for solving domain-based chunking of input query sentences without any manual annotation, achieving a token-classification F1 score of 96.97% on out-of-template queries.
Abstract: Chunking means splitting sentences into tokens and then grouping them in a meaningful way. For high-performance chunking systems, transformer models have proved to be the state of the art. Performing chunking as a task requires a large-scale, high-quality annotated corpus in which each token carries a tag, much as in named entity recognition tasks; these tags are later used in conjunction with pointer frameworks to find the final chunks. Solving this for a specific domain becomes highly costly in time and resources if a large, high-quality training set must be annotated manually. When the domain is specific and diverse, cold-starting becomes even more difficult because of the large number of manually annotated queries needed to cover all aspects. To overcome this problem, we applied a grammar-based text generation mechanism: instead of annotating individual sentences, we annotate grammar templates. We defined various templates corresponding to different grammar rules and created sentences from these templates and rules, with symbol or terminal values chosen from the domain data catalog. This allowed us to create a large number of annotated queries, which were used to train an ensemble transformer-based deep neural network model [24]. We found that grammar-based annotation was useful for solving domain-based chunking of input query sentences without any manual annotation, achieving a classification F1 score of 96.97% in classifying the tokens of out-of-template queries.
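A minimal sketch of the template idea follows: each template is a sequence of token/tag pairs in which slot symbols are expanded from a domain data catalog, yielding many annotated queries from a single annotated template. The template syntax, tag names, and catalog contents are invented for illustration; the paper's grammar rules are richer.

```python
import itertools

# Illustrative grammar templates: each token is written as text/TAG, and
# <slot> symbols are filled from a domain data catalog.
TEMPLATES = [
    "show/O me/O <metric>/METRIC for/O <region>/REGION",
    "compare/O <metric>/METRIC across/O <region>/REGION and/O <region>/REGION",
]
CATALOG = {"metric": ["revenue", "churn"], "region": ["Europe", "Asia"]}

def expand(template):
    """Yield (tokens, tags) pairs for every way of filling the template's slots."""
    parts = [p.rsplit("/", 1) for p in template.split()]
    slots = [i for i, (word, _) in enumerate(parts) if word.startswith("<")]
    pools = [CATALOG[parts[i][0].strip("<>")] for i in slots]
    for combo in itertools.product(*pools):
        words = [word for word, _ in parts]
        for i, value in zip(slots, combo):
            words[i] = value
        yield words, [tag for _, tag in parts]

training_set = [pair for t in TEMPLATES for pair in expand(t)]
print(len(training_set), training_set[0])
# 12 (['show', 'me', 'revenue', 'for', 'Europe'], ['O', 'O', 'METRIC', 'O', 'O'])
```

Annotating two templates here produces twelve token-tagged queries; the same multiplication is what makes the approach attractive for cold-starting a domain.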

Book ChapterDOI
01 Jan 2021
TL;DR: Data deduplication refers to reducing the size of data by eliminating redundancy due to duplication, as discussed by the authors; it reduces the required storage space and may yield various adjoining benefits.
Abstract: Data deduplication refers to reducing the size of data by eliminating redundancy due to duplication. The possibility of duplication is high when the volume of data is huge. As data, especially digital data, grows drastically on the Internet due to emerging online forms of communication and interaction in areas such as social media, banking, and marketing, the problem of duplicate data has become serious. Various data deduplication techniques can be used to reduce its size. Apart from reducing the required storage space, this reduction yields several adjoining benefits. For example, it saves device cost and the time required for backup and archive when data is stored on secondary storage. For primary storage, it eliminates duplicate disk I/Os and thus reduces program execution time. For cloud storage, deduplication reduces the time needed to upload data over the WAN. When data is stored on a virtual machine, it saves migration time. When data is on a network, size reduction shortens transmission time and removes redundancy for WAN optimization. Techniques for data deduplication may be categorized by various aspects of algorithm design and the scenarios they are designed for, such as location, time, and unit of deduplication. By location, deduplication is source-side or target-side, depending on whether the operation is performed on the client or the server during data communication. By time, it depends on whether the process is performed while the data is being stored or after it has been stored. By unit of deduplication, the popular categories are file chunking, fixed-block chunking, and variable-block chunking. Data deduplication consists primarily of identifying duplicate data elements and replacing them with links, so that all duplicates are located at a single position on the physical media. The steps involved in the process are chunking and fingerprinting, hashing, indexing, and compression.
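The steps listed at the end (chunking, fingerprinting/hashing, indexing, compression) compose into a very small deduplicating store, sketched below. The in-memory design, fixed-size chunking, and zlib compression are illustrative simplifications; production systems persist the index and chunk store.

```python
import hashlib
import zlib

class DedupStore:
    """Minimal in-memory illustration of the pipeline the chapter lists:
    chunking, fingerprinting (hashing), indexing, and compression."""
    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.index = {}                 # fingerprint -> compressed chunk

    def put(self, data: bytes):
        """Store data; return the recipe (list of fingerprints) needed to
        rebuild it. Duplicate chunks are stored only once."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).digest()     # fingerprint
            if fp not in self.index:                # index lookup
                self.index[fp] = zlib.compress(chunk)
            recipe.append(fp)
        return recipe

    def get(self, recipe):
        return b"".join(zlib.decompress(self.index[fp]) for fp in recipe)

store = DedupStore()
r1 = store.put(b"A" * 10_000)
r2 = store.put(b"A" * 10_000)          # the second copy adds no new chunks
assert store.get(r2) == b"A" * 10_000
print(len(store.index), "unique chunks stored for two identical files")
```

Replacing the fixed-size loop in `put` with a content-defined chunker turns this into the variable-block variant the chapter categorizes.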

Journal ArticleDOI
TL;DR: Although Pólya’s four-step plan is a well-known problem-solving framework, my students benefited from a more concrete and detailed approach: chunking the reading.
Abstract: Reading mathematics problems can frustrate students to the point of shutting down. Although Pólya’s four-step plan is a well-known problem-solving framework, my students benefited from a more concrete and detailed approach: chunking the reading.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the effectiveness of schematic chunking, i.e., patterns, schemes, or sophisticated rules stored as chunks in long-term memory.
Abstract: Schematic chunks denote patterns, schemes, or sophisticated rules and knowledge stored in the long-term memory in the form of chunks. We investigated whether schematic chunking is effective in impr...

Book ChapterDOI
01 Jan 2021
TL;DR: This chapter discusses two case studies, one for Deduplicator and another for Duplicates Cleaner, and elaborates the implementation of file chunking in detail.
Abstract: The present chapter begins by introducing how data is stored in different file formats, which may contain duplicate data, and the concept of the file chunking approach. File chunking is explained through the concepts of chunk discrimination, duplicate detection, and consolidation, and its implementation is elaborated in detail. The chapter also discusses two case studies, one for Deduplicator and another for Duplicates Cleaner, followed by a conclusion. For ease of reading, the author also provides a bibliographic note and shares supporting GitHub repositories and blog links covering lower-level implementation of the file chunking approach.
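At the file level, the "Duplicates Cleaner" style of tool reduces to grouping files by size and confirming duplicates with a content hash, as in the sketch below. This is an illustrative reconstruction, not code from the chapter's repositories.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_files(root):
    """Group files under root by size first (cheap), then confirm duplicates
    with a SHA-256 content hash. Chunk-level consolidation, as in the chapter,
    would go one step further and deduplicate matching blocks inside
    otherwise different files."""
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)
    groups = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue                  # a unique size cannot be a duplicate
        for path in paths:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(1 << 20), b""):
                    h.update(block)   # hash in 1 MiB blocks to bound memory
            groups[h.hexdigest()].append(path)
    return [group for group in groups.values() if len(group) > 1]
```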


Journal ArticleDOI
TL;DR: This paper describes the linguistic work on developing annotation guidelines, manually annotating the corpus, and preparing the neural models used for chunking (the first such models for Polish), together with the evaluation of these models.


Proceedings ArticleDOI
19 Jun 2021
TL;DR: This paper proposes an ensemble model that combines a fine-tuned transformer model with a recurrent neural network to predict tags and chunk substructures of a sentence.
Abstract: Transformer models have taken over most natural language inference tasks and have recently beaten several benchmarks. Chunking means splitting sentences into tokens and then grouping them in a meaningful way. Chunking has gradually moved from POS-tag-based statistical models to neural models such as LSTMs, bidirectional LSTMs, and attention models. Deep neural models are deployed indirectly to classify tokens into tags, as defined in named entity recognition tasks; these tags are later used in conjunction with pointer frameworks for the final chunking task. In this paper, we propose an ensemble model that uses a fine-tuned transformer model and a recurrent neural network together to predict tags and chunk substructures of a sentence. We analyzed the shortcomings of the transformer model in predicting different tags and then trained the BiLSTM+CNN accordingly to compensate for them.
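Whatever model produces the per-token tags, the final chunking step is typically a deterministic decode of a BIO tag sequence into spans. The sketch below shows that decode for CoNLL-2000-style labels; the paper's exact label set and pointer framework may differ.

```python
def bio_to_chunks(tokens, tags):
    """Decode a BIO tag sequence (the per-token output of a tagger such as a
    transformer or BiLSTM+CNN ensemble) into labeled chunk spans."""
    chunks, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                chunks.append(current)
            current = (tag[2:], [token])          # start a new chunk
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)              # continue the open chunk
        else:                                     # "O" or inconsistent I- tag
            if current:
                chunks.append(current)
            current = None
    if current:
        chunks.append(current)
    return [(label, " ".join(words)) for label, words in chunks]

tokens = "He reckons the current account deficit will narrow".split()
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP"]
print(bio_to_chunks(tokens, tags))
# [('NP', 'He'), ('VP', 'reckons'),
#  ('NP', 'the current account deficit'), ('VP', 'will narrow')]
```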

Proceedings Article
01 Nov 2021
Abstract: In this paper, we address unsupervised chunking as a new task of syntactic structure induction, which is helpful for understanding the linguistic structures of human languages as well as processing low-resource languages. We propose a knowledge-transfer approach that heuristically induces chunk labels from state-of-the-art unsupervised parsing models; a hierarchical recurrent neural network (HRNN) learns from such induced chunk labels to smooth out the noise of the heuristics. Experiments show that our approach largely bridges the gap between supervised and unsupervised chunking.
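One way to picture the knowledge-transfer heuristic is to read chunks off a parse tree as its lowest-level constituents, as sketched below with NLTK. This is an illustrative stand-in for the paper's induction heuristic; the HRNN smoothing step is not reproduced.

```python
from nltk.tree import Tree

def induce_chunks(parse):
    """Heuristically induce chunks from a (possibly unsupervised) parse:
    treat every lowest-level constituent, i.e., a subtree whose children are
    all terminals, as one chunk."""
    return [(sub.label(), sub.leaves())
            for sub in parse.subtrees(lambda t: t.height() == 2)]

tree = Tree.fromstring("(S (NP the quick fox) (VP jumps (PP over (NP the dog))))")
print(induce_chunks(tree))
# [('NP', ['the', 'quick', 'fox']), ('NP', ['the', 'dog'])]
```

Labels induced this way are noisy when the parser itself is unsupervised, which is exactly why the paper trains an HRNN on them to smooth out the heuristic's errors.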

Posted Content
TL;DR: This article presents a review of technologies for part-of-speech (POS) disambiguation for a new language, which can be either corpus-based or non-corpus-based.
Abstract: Developing automatic part-of-speech (POS) tagging for a new language is a necessary step before computational linguistics methodology beyond tagging, such as chunking and parsing, can be fully applied to that language. Many POS disambiguation technologies have been developed, and several factors influence the choice among them; they can be either corpus-based or non-corpus-based. In this paper, we present a review of POS tagging technologies.
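As a concrete point of reference for the corpus-based family the review discusses, the sketch below trains the simplest possible corpus-based tagger, a unigram (most-frequent-tag) model, from a toy tagged corpus. The tag set and fallback rule are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Corpus-based baseline: tag each word with its most frequent tag in
    the training corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(model, words, default="NOUN"):
    # Unknown words fall back to a default open-class tag.
    return [(w, model.get(w.lower(), default)) for w in words]

corpus = [[("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
          [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]]
model = train_unigram_tagger(corpus)
print(tag(model, ["The", "dog", "sleeps"]))
# [('The', 'DET'), ('dog', 'NOUN'), ('sleeps', 'VERB')]
```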

Journal ArticleDOI
TL;DR: In this article, the authors tested characteristics of melodic memory to advance the understanding of chunking in musical memory, where chunking is defined as information compression by means of encoding meaningful units.
Abstract: Chunking is defined as information compression by means of encoding meaningful units. To advance the understanding of chunking in musical memory, the present study tested characteristics of melodic...


Posted ContentDOI
30 May 2021 - bioRxiv
TL;DR: In this paper, a weighted graph model was developed to study the conditions for the evolution of chunking ability, based on the ecology of the cleaner fish Labroides dimidiatus, and it was shown that chaining is the minimal requirement for solving the laboratory task, that involves repeated simultaneous exposure to an ephemeral and permanent food source.
Abstract: What makes cognition ‘advanced’ is an open and not precisely defined question. One perspective involves increasing the complexity of associative learning, from conditioning to learning sequences of events (‘chaining’) to representing various cue combinations as ‘chunks’. Here we develop a weighted-graph model to study the conditions for the evolution of chunking ability, based on the ecology of the cleaner fish Labroides dimidiatus. Cleaners must learn to serve visitor clients before resident clients, because a visitor leaves if not attended while a resident waits for service. This challenge has been captured in various versions of the ephemeral-reward task, which has been proven difficult for a range of cognitively capable species. We show that chaining is the minimal requirement for solving the laboratory task, that involves repeated simultaneous exposure to an ephemeral and permanent food source. Adding ephemeral-ephemeral and permanent-permanent combinations, as cleaners face in the wild, requires individuals to have chunking abilities to solve the task. Importantly, chunking parameters need to be calibrated to ecological conditions in order to produce adaptive decisions. Thus, it is the fine tuning of this ability which may be the major target of selection during the evolution of advanced associative learning.