Book Chapter

Storage Optimization Using File Compression Techniques for Big Data

TL;DR: In this article, a file compression system for big data is proposed as system utility software; users can also run it on the desktop, and the compression performed is lossless.
Abstract: The world is surrounded by technology. There are lots of devices everywhere around us, and it is impossible to imagine our lives without technology, as we depend on it for most of our work. One of the primary functions for which we use computers is to store data and transfer it from a host system or network to another with similar credentials. The limited capacity of computers restricts the amount of data that can be stored or transported. To tackle this problem, computer scientists came up with data compression algorithms. The objective of a file compression system is to build efficient software that reduces user files to fewer bytes, so that they can be transferred more easily over a slow Internet connection and take up less space on disk. Data compression, or bit-rate reduction, encodes data using fewer bits than the original representation. Compression comes in two types, lossless and lossy. The former reduces bits by identifying and eliminating statistical redundancy, so no information is lost. The latter reduces file size by removing unnecessary or less important information. This paper proposes a file compression system for big data as system utility software; users can also use it on the desktop, and the compression performed in this work is lossless.
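The chapter does not reproduce its implementation here, but the lossless round trip it relies on is easy to illustrate. Below is a minimal Python sketch using the standard-library zlib module (DEFLATE); the function names and file paths are hypothetical, and the assert demonstrates the defining property of lossless compression: decompression restores every byte of the original.

```python
# Minimal sketch of lossless file compression with Python's standard zlib
# module (DEFLATE). Illustrative only; not the chapter's actual utility.
import zlib

def compress_file(src_path: str, dst_path: str) -> None:
    """Read a file, compress its bytes losslessly, and write the result."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(zlib.compress(data, 9))  # 9 = maximum compression effort

def decompress_file(src_path: str, dst_path: str) -> None:
    """Restore the original bytes from a compressed file."""
    with open(src_path, "rb") as src:
        compressed = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(zlib.decompress(compressed))

if __name__ == "__main__":
    original = b"technology " * 1000                    # highly redundant sample
    compressed = zlib.compress(original, 9)
    assert zlib.decompress(compressed) == original      # lossless: nothing lost
    print(f"{len(original)} bytes -> {len(compressed)} bytes")
```

Redundant input like the repeated sample above compresses dramatically, which is exactly the statistical redundancy a lossless coder exploits.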
Citations
Journal Article
TL;DR: In this paper, the authors explore the various dimensions of datafication, including the technologies, practices, and challenges involved in turning information into structured data for analysis and decision-making.
Abstract: Datafication has emerged as a key driver of the digital economy, enabling businesses, governments, and individuals to extract value from the growing flood of data. In this comprehensive survey, we explore the various dimensions of datafication, including the technologies, practices, and challenges involved in turning information into structured data for analysis and decision-making. We begin by providing an overview of the historical context and the rise of big data, and then delve into the latest developments in artificial intelligence and machine learning. We examine the key drivers of datafication across industries and sectors, and explore the ethical, legal, and social implications of the data revolution. Finally, we consider the challenges and opportunities presented by datafication, including issues of data privacy and security, the need for new skills and competencies, and the potential for data to drive innovation and social change. Overall, this survey provides a comprehensive and up-to-date overview of the datafication landscape, helping readers to better understand and navigate the rapidly evolving world of data.

2 citations

Journal Article
TL;DR: In this paper, the authors use unstructured knowledge (i.e., information such as text, images, music, video, and social media posts) to increase the percentage of returning customers.
Abstract: Information technology today is increasingly concerned with dealing with massive data sets. The proliferation of the internet and, by extension, the digital economy has resulted in a meteoric rise in the need for data storage and analysis. This creates a serious problem for American IT departments in terms of securing and analysing the resulting avalanche of data. Businesses currently acquire and store more data than ever before due to the critical role that information plays in their daily operations, and in all likelihood this pattern will maintain its current trajectory. Yet little of the knowledge being created today is the organised kind built on legacy information; instead, it is information like text, images, music, video, and social media posts. Knowledge that has no particular shape is called "unstructured knowledge". The term "big data analytics" refers to a technique that can be used to gain insight from these massive datasets. In addition to generating new business prospects, this strategy has been shown to increase the percentage of returning customers.
References
01 Dec 2010
TL;DR: An experimental comparison of a number of different lossless data compression algorithms is presented and it is stated which algorithm performs well for text data.
Abstract: Data compression is a common requirement for most computerized applications. There are a number of data compression algorithms dedicated to compressing different data formats, and even for a single data type there are a number of algorithms that use different approaches. This paper examines lossless data compression algorithms and compares their performance. A set of selected algorithms is implemented to evaluate their performance in compressing text data, and an experimental comparison of the different lossless algorithms is presented. The article concludes by stating which algorithm performs well for text data.
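As a rough illustration of this kind of comparison (not the paper's own experiment), the Python sketch below runs three standard-library lossless codecs over the same redundant text and reports their compression ratios; the corpus and the codec set are stand-ins for whatever the paper actually evaluated.

```python
# Toy comparison of lossless codecs on text data; illustrative stand-ins
# for the algorithms compared in the paper.
import bz2
import lzma
import zlib

text = ("Data compression is a common requirement for most of the "
        "computerized applications. " * 200).encode("utf-8")

codecs = {
    "zlib (DEFLATE)": lambda d: zlib.compress(d, 9),
    "bz2 (BWT)":      lambda d: bz2.compress(d, 9),
    "lzma (LZMA)":    lambda d: lzma.compress(d),
}

for name, compress in codecs.items():
    out = compress(text)
    ratio = len(text) / len(out)   # higher ratio = better compression
    print(f"{name:15s} {len(out):6d} bytes  ratio {ratio:6.1f}x")
```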

120 citations

Proceedings Article
24 Mar 1992
TL;DR: The authors consider choosing Sigma to be an alphabet whose symbols are the words of English or, in general, alternating maximal strings of alphanumeric and nonalphanumeric characters, to take advantage of longer-range correlations between words and achieve better compression.
Abstract: Text compression algorithms are normally defined in terms of a source alphabet Sigma of 8-bit ASCII codes. The authors consider choosing Sigma to be an alphabet whose symbols are the words of English or, in general, alternating maximal strings of alphanumeric and nonalphanumeric characters. The compression algorithm is then able to take advantage of longer-range correlations between words and thus achieve better compression. The large size of Sigma leads to some implementation problems, but these are overcome to construct word-based LZW, word-based adaptive Huffman, and word-based context modelling compression algorithms.
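The word-based alphabet is straightforward to sketch. The toy Python below (not the authors' implementation) splits text into maximal alternating runs of alphanumeric and non-alphanumeric characters, then runs LZW over those word tokens; it builds its initial dictionary from the input's own distinct tokens, which a real codec would instead have to share with the decoder.

```python
# Toy word-based LZW: tokens, not bytes, are the source alphabet.
import re

def word_tokens(text: str) -> list[str]:
    """Maximal alternating runs of alphanumeric and non-alphanumeric chars."""
    return re.findall(r"\w+|\W+", text)

def word_lzw_encode(text: str) -> list[int]:
    tokens = word_tokens(text)
    dictionary = {}                      # phrase (tuple of tokens) -> code
    for t in tokens:                     # initial alphabet: distinct tokens
        if (t,) not in dictionary:
            dictionary[(t,)] = len(dictionary)
    codes, phrase = [], ()
    for t in tokens:
        candidate = phrase + (t,)
        if candidate in dictionary:
            phrase = candidate           # keep growing the current phrase
        else:
            codes.append(dictionary[phrase])
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            phrase = (t,)
    if phrase:
        codes.append(dictionary[phrase])
    return codes

text = "the cat sat on the mat, and the cat sat on the mat again"
print(len(word_tokens(text)), "tokens ->", len(word_lzw_encode(text)), "codes")
```

Because repeated phrases like "the cat sat on the mat" become single dictionary entries, the coder captures the longer-range correlations between words that a byte-level alphabet misses.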

91 citations

01 Jan 2011
TL;DR: A survey of basic lossless data compression algorithms, using statistical compression techniques and dictionary-based compression techniques on text data, is provided.
Abstract: Data compression is the science and art of representing information in a compact form. For decades, data compression has been one of the critical enabling technologies for the ongoing digital multimedia revolution. There are many data compression algorithms available to compress files of different formats. This paper provides a survey of the basic lossless data compression algorithms. Experimental results and comparisons of lossless compression algorithms using statistical compression techniques and dictionary-based compression techniques were performed on text data. Among the statistical coding techniques, the algorithms considered are Shannon-Fano coding, Huffman coding, Adaptive Huffman coding, Run-Length Encoding, and Arithmetic coding. The Lempel-Ziv scheme, a dictionary-based technique, is divided into two families: those derived from LZ77 (LZ77, LZSS, LZH and LZB) and those derived from LZ78 (LZ78, LZW and LZFG). A set of interesting conclusions is derived on this basis.
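Of the statistical techniques the survey lists, run-length encoding is the simplest to show concretely. The sketch below is illustrative Python, not the survey's code: it collapses each run of repeated characters into a (symbol, count) pair and verifies the lossless round trip.

```python
# Minimal run-length encoding (RLE) sketch: runs of identical symbols
# become (symbol, count) pairs; decoding expands them back exactly.
from itertools import groupby

def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into (char, run_length)."""
    return [(char, len(list(run))) for char, run in groupby(data)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(char * count for char, count in pairs)

sample = "aaaabbbcccd"
encoded = rle_encode(sample)            # [('a', 4), ('b', 3), ('c', 3), ('d', 1)]
assert rle_decode(encoded) == sample    # lossless round trip
print(encoded)
```

RLE pays off only when long runs are common, which is why surveys like this one benchmark it against Huffman, arithmetic, and dictionary coders on realistic text.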

80 citations

Proceedings ArticleDOI
R. Vaishali, R. Sasikala, Somula Ramasubbareddy, S. Remya, Sravani Nalluri
01 Oct 2017
TL;DR: This research work aims to improve the accuracy of existing diagnostic methods for the prediction of Type 2 Diabetes with machine learning algorithms.
Abstract: Diabetes mellitus is a dreadful disease characterized by increased levels of glucose in the blood, a condition termed hyperglycemia. As the disease is prominent in tropical countries like India, intense research is being carried out to deliver a machine learning model that can learn from previous patient records in order to deliver smart diagnosis. This research work aims to improve the accuracy of existing diagnostic methods for the prediction of Type 2 diabetes with machine learning algorithms. The proposed algorithm selects the essential features from the Pima Indians Diabetes Dataset with Goldberg's genetic algorithm in the pre-processing stage, and a Multi-Objective Evolutionary Fuzzy Classifier is then applied to the dataset. The algorithm works on the principle of maximizing the classification rate while minimizing the number of rules. As a result of feature selection with the GA, the number of features is reduced from 8 to 4, and the classification rate improves to 83.0435% with NSGA-II under a 70% training and 30% testing split.
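A rough sketch of GA-driven feature selection of this kind is below, under heavy assumptions: the paper pairs Goldberg's GA with a multi-objective evolutionary fuzzy classifier tuned by NSGA-II, whereas this toy uses a single objective, scikit-learn's logistic regression as a stand-in scorer, and random placeholder data instead of the Pima Indians dataset. Each individual is an 8-bit mask over the features, and fitness is cross-validated accuracy on the selected columns.

```python
# Toy GA feature selection: evolve bitmasks over 8 features, scoring each
# mask by cross-validated accuracy. All parameters are illustrative.
import random

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = random.Random(42)
X = np.random.default_rng(0).normal(size=(768, 8))      # placeholder for Pima data
y = np.random.default_rng(1).integers(0, 2, size=768)   # placeholder labels

def fitness(mask: list[int]) -> float:
    if not any(mask):
        return 0.0                                       # empty subsets are invalid
    cols = [i for i, bit in enumerate(mask) if bit]
    clf = LogisticRegression(max_iter=1000)              # stand-in classifier
    return cross_val_score(clf, X[:, cols], y, cv=3).mean()

population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(20)]
for generation in range(15):
    parents = sorted(population, key=fitness, reverse=True)[:10]  # truncation
    children = []
    while len(children) < 10:
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, 8)                        # one-point crossover
        child = a[:cut] + b[cut:]
        child[rng.randrange(8)] ^= 1                     # single-bit mutation
        children.append(child)
    population = parents + children

best = max(population, key=fitness)
print("selected features:", [i for i, bit in enumerate(best) if bit])
```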

58 citations

Journal Article
TL;DR: This paper deals with measuring air quality using the MQ135 sensor along with carbon monoxide (CO) using the MQ7 sensor, applies machine learning analysis, and provides a reduction in component cost compared with the state of the art.
Abstract: This paper deals with measuring air quality using the MQ135 sensor along with carbon monoxide (CO) using the MQ7 sensor. Measuring air quality is an important element in raising awareness, both to take care of future generations and for a healthier life. On this basis, the Government of India has already taken measures to ban single-stroke and two-stroke engine motorcycles, which emit high pollution. We implement a system using IoT platforms like ThingSpeak or Cayenne in order to make every individual aware of the harm we are doing to our environment. New Delhi is already remarked as the most polluted city in the world, recording air quality above 300 PPM. We have used ThingSpeak, one of the easiest platforms, and set the dashboard to public so that everyone can see the air quality at the location where the system is installed. Machine learning analysis gives us much greater depth in understanding the information obtained from the data. Moreover, we provide a reduction in component cost compared with the state of the art.
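A minimal sketch of the telemetry loop might look like the Python below, with the caveat that everything hardware-specific is hypothetical: the write API key is a placeholder and read_mq135()/read_mq7() stand in for whatever ADC interface the deployed board actually exposes. The ThingSpeak REST update endpoint itself is the platform's public API.

```python
# Toy sensor-to-cloud loop: post MQ135 and MQ7 readings to a ThingSpeak
# channel via its REST update API. Sensor reads are placeholders.
import time

import requests

THINGSPEAK_URL = "https://api.thingspeak.com/update"
API_KEY = "YOUR_WRITE_API_KEY"           # placeholder: channel write key

def read_mq135() -> float:
    """Stand-in for the MQ135 air-quality ADC reading (PPM)."""
    return 212.0

def read_mq7() -> float:
    """Stand-in for the MQ7 carbon-monoxide ADC reading (PPM)."""
    return 9.5

while True:
    payload = {"api_key": API_KEY,
               "field1": read_mq135(),   # air-quality channel field
               "field2": read_mq7()}     # CO channel field
    resp = requests.get(THINGSPEAK_URL, params=payload, timeout=10)
    print("ThingSpeak entry id:", resp.text)  # "0" indicates a failed update
    time.sleep(20)                       # free tier allows ~1 update per 15 s
```

With the channel's dashboard set to public, anyone can view the posted fields, which is how the system makes readings at the installation site visible to everyone.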

42 citations