Showing papers on "Batch file published in 2020"


Journal ArticleDOI
TL;DR: This paper proposes BFO, a set of optimized Batch-File Operations for local file systems, with BFOr and BFOw operations for batch reads and writes, respectively, that access metadata and data jointly in two phases.
Abstract: Existing local file systems, designed to support a typical single-file access mode only, can lead to poor performance when accessing a batch of files, especially small files. This single-file mode essentially serializes accesses to batched files one by one, resulting in a large number of non-sequential, random, and often dependent I/Os between file data and metadata at the storage ends. Such access mode can further worsen the efficiency and performance of applications accessing massive files, such as data migration. We first experimentally analyze the root cause of such inefficiency in batch-file accesses. Then, we propose a novel batch-file access approach, referred to as BFO for its set of optimized Batch-File Operations, by developing novel BFOr and BFOw operations for fundamental read and write processes, respectively, using a two-phase access for metadata and data jointly. The BFO offers dedicated interfaces for batch-file accesses and additional processes integrated into existing file systems without modifying their structures and procedures. In addition, based on BFOr and BFOw, we also propose the novel batch-file migration BFOm to accelerate the data migration for massive small files. We implement a BFO prototype on ext4, one of the most popular file systems. Our evaluation results show that the batch-file read and write performances of BFO are consistently higher than those of the traditional approaches regardless of access patterns, data layouts, and storage media, under synthetic and real-world file sets. BFO improves the read performance by up to 22.4× and 1.8× with HDD and SSD, respectively, and it boosts the write performance by up to 111.4× and 2.9× with HDD and SSD, respectively. BFO also demonstrates consistent performance advantages for data migration in both local and remote situations.

3 citations
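
The two-phase idea in the abstract above (handle metadata for the whole batch first, then the data) can be loosely illustrated in user space. The Python sketch below is not the authors' BFO implementation, which is built into ext4. The function name `batch_read` and the choice to order reads by inode number as a rough proxy for on-disk layout are assumptions made for illustration only.

```python
import os


def batch_read(paths):
    """Read a batch of files in two phases: metadata first, then data.

    Hypothetical user-space illustration; not the BFO prototype.
    """
    # Phase 1: gather metadata for every file before reading any data.
    entries = []
    for path in paths:
        st = os.stat(path)
        entries.append((st.st_ino, path, st.st_size))

    # Assumption: inode order loosely approximates on-disk order, so
    # sorting may reduce random I/O on some file systems and media.
    entries.sort()

    # Phase 2: read file contents in the sorted order.
    contents = {}
    for _inode, path, size in entries:
        with open(path, "rb") as f:
            contents[path] = f.read(size)
    return contents
```

Calling `batch_read(list_of_paths)` returns a dictionary that maps each path to its bytes. Whether the reordering helps at all depends on the file system and storage medium, so any gain should be measured rather than assumed.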


Patent
10 Apr 2020
TL;DR: A full-link data management system whose data source comprises streaming data, batch file data, and a database, and whose off-line processing platform comprises a data acquisition engine and an off-line batch processing engine.
Abstract: The invention discloses a full-link data management system. The data source comprises streaming data, batch file data, and a database. The off-line processing platform comprises a data acquisition engine and an off-line batch processing engine; the data acquisition engine comprises a real-time data acquisition system and a batch acquisition system, and the off-line batch processing engine is used for high-performance off-line batch processing. The off-line batch processing engine comprises Hive, MapReduce, Spark SQL (Structured Query Language), Spark, YARN, and HDFS (Hadoop Distributed File System). The business application is used for querying and using the batch processing results. According to the invention, the efficiency of data use under mass data can be improved, the problem of using heterogeneous database data together is solved, and data permissions can be managed and controlled.

1 citation


Proceedings ArticleDOI
01 Dec 2020
TL;DR: The authors automate the data collection, data preparation, and delivery processes of a Peruvian telecommunication company's SMS advertising workflow using batch files and Python scripts, reducing the total workflow time.
Abstract: A Peruvian telecommunication company sends millions of SMS every day to promote its services through SMS mobile advertising. The workflow to deliver the SMS has the following processes: data collection, data preparation, and delivery. Each process involves a variety of repetitive and monotonous tasks performed by workers whose goal is to send the SMS on time following a strategic schedule. When these tasks are performed entirely manually, they have a negative effect on the workers' performance, generate time pressure, and create a bottleneck in the processes, impacting the SMS-delivery schedule. To speed up the existing workflow, we automated each process with batch files and the programming language Python, whose vast library helped us connect to the test automation suite Selenium. As a result, the long, monotonous processes were replaced by simple scripts in which a worker only had to enter a few inputs to accomplish their task. With this automation, we reduced the total workflow time dramatically, making the SMS-delivery workflow smooth and straightforward.

1 citation


Patent
30 Apr 2020
TL;DR: A computer-implemented Header/Trailer Validation Tool validates header and trailer records in batch files by reading a control file that contains pertinent information about each file to be validated.
Abstract: The invention relates to header and trailer record validation for batch files. According to an embodiment of the present invention, a computer implemented system implements a Header/Trailer Validation Tool. The Header/Trailer Validation Tool may read a control file containing pertinent information of each file to be validated. Key information may be determined at run time and the Header/Trailer Validation Tool may process the files dynamically. An embodiment of the present invention may also process any number of files in a single execution—this is particularly useful because in some cases a set of files may be received from another application, but not all the files may be used at the same time.
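
As a rough illustration of what header and trailer record validation can look like, the Python sketch below checks one batch file held as a list of lines. The patent abstract does not describe a record layout, so everything here is assumed for the example: the function name `validate_batch_file`, the `H` and `T` record prefixes, the `|` delimiter, and the convention that the trailer's second field holds the number of detail records.

```python
def validate_batch_file(lines, header_prefix="H", trailer_prefix="T",
                        delimiter="|"):
    """Return a list of problems found; an empty list means the file passed.

    Hypothetical layout: first line is the header, last line is the
    trailer, and the trailer's second field is the detail record count.
    """
    if len(lines) < 2:
        return ["file has no header/trailer pair"]

    problems = []
    header = lines[0].split(delimiter)
    trailer = lines[-1].split(delimiter)

    if header[0] != header_prefix:
        problems.append("missing header record")

    if trailer[0] != trailer_prefix:
        problems.append("missing trailer record")
    else:
        # Every line between the first and last counts as a detail record.
        detail_count = len(lines) - 2
        if len(trailer) < 2 or not trailer[1].isdigit():
            problems.append("trailer has no numeric record count")
        elif int(trailer[1]) != detail_count:
            problems.append(
                f"trailer count {trailer[1]} != {detail_count} detail records"
            )
    return problems
```

A control file in the spirit of the abstract could supply the prefixes and delimiter for each file, so that one run loops over many files with different layouts and passes the matching parameters to this function.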