A Review: Data Pre-Processing and Data Augmentation Techniques

doi:10.1016/j.gltp.2022.04.020

Open Access, Journal Article

Kiran Maharana, +2 more

- 01 Apr 2022 -

Global transitions proceedings

- Vol. 3, Iss: 1, pp 91-99

TLDR

In this paper, the authors give an overview of data pre-processing in machine learning, covering the problems that arise when building machine learning models, and discuss flipping, rotating by slight degrees, and other techniques for augmenting image data.

Abstract:

This review paper provides an overview of data pre-processing in machine learning, covering the problems that arise when building machine learning models. It deals with two significant issues in pre-processing: (i) issues with the data and (ii) the steps to follow for the best approach to data analysis. Because raw data are vulnerable to noise, corruption, missing values, and inconsistency, pre-processing steps are necessary; these draw on classification, clustering, association, and many other available techniques. Poor data can reduce accuracy and lead to false predictions, so the dataset's quality must be improved, and data pre-processing is the best way to deal with such problems. It makes knowledge extraction from the data set much easier through cleaning, integration, transformation, and reduction methods. Missing data and significant differences in the variety of data always exist, because the information is collected from multiple sources and from real-world applications. The data augmentation approach therefore generates additional data for machine learning models, to decrease the dependency on training data and to improve model performance. This paper discusses flipping, rotating by slight degrees, and other methods for augmenting image data, and shows how to perform data augmentation without distorting the original data.
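To make the operations named in the abstract concrete, the following is a minimal sketch, not the authors' code, of one cleaning step (mean imputation of missing values), one transformation step (min-max scaling), and two image augmentations (horizontal flip and a small-angle rotation). It assumes an image is a plain list of pixel rows and that missing values are marked with None; the function names, the nearest-neighbour sampling, and the fill value for uncovered pixels are illustrative choices, not details taken from the paper.

```python
import math


def impute_mean(values):
    """Cleaning: replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def flip_horizontal(image):
    """Augmentation: mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def rotate_nearest(image, degrees, fill=0):
    """Augmentation: rotate about the image centre, keeping the original shape.

    Uses inverse mapping with nearest-neighbour sampling (an assumption made
    for simplicity); output pixels whose source falls outside the image
    are set to `fill`.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # For each output pixel, find the source pixel it came from.
            dx, dy = x - cx, y - cy
            src_x = round(cos_t * dx + sin_t * dy + cx)
            src_y = round(-sin_t * dx + cos_t * dy + cy)
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = image[src_y][src_x]
    return out
```

A typical use would clean and scale a feature column with min_max_scale(impute_mean(column)), then extend an image training set with flip_horizontal(img) and rotate_nearest(img, 5) alongside each original image. Keeping the rotation angle small limits how many pixels fall outside the frame, which is one way to read the abstract's aim of augmenting without distorting the original data.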

Citations

Journal Article

Data augmentation: A comprehensive survey of modern approaches

Alhassan G. Mumuni, +1 more

- 01 Nov 2022 -

Array

TL;DR: This paper presents data augmentation as a highly effective way of alleviating the burden of data collection and annotation, which consume a lot of time and resources; its main goal is to increase the volume, quality, and diversity of training data.

Journal Article

Data augmentation for univariate time series forecasting with neural networks

Artemios-Anargyros Semenoglou, +2 more

- 01 Oct 2022 -

Pattern Recognition

TL;DR: In this article, the authors investigate nine data augmentation techniques, ranging from simple transformations and adjustments to sophisticated generative models and a novel upsampling approach, and empirically evaluate their impact on forecasting accuracy with both shallow and deep feed-forward neural networks.

Journal Article

Experimental Investigation of Efficiency of Worm Gears and Modeling of Power Loss through Artificial Neural Networks

Yunus Emre Karabacak, +1 more

- 01 Aug 2022 -

Measurement

TL;DR: In this article, an experimental system that can operate at different speeds and loading rates was developed for the efficiency calculations of worm gears (WGs), and measurements were made accordingly. A comprehensive efficiency analysis of WGs was then carried out, based on power loss calculations under different operating conditions.

Journal Article

Towards automated eye cancer classification via VGG and ResNet networks using transfer learning

Daniel Fernando Santos-Bustos, +2 more

- 01 Jul 2022 -

Engineering Science and Technology, an I...

TL;DR: In this article, a convolutional neural network (CNN) with transfer learning was used to detect uveal melanoma (UM), a type of ocular cancer, achieving an improved sensitivity, precision, and accuracy of 99%, 98%, and 99%, respectively.

Journal Article

AlexNet‐NDTL: Classification of MRI brain tumor images using modified AlexNet with deep transfer learning and Lipschitz‐based data augmentation

Sreedhar Kollem, +8 more

- 20 Mar 2023 -

International Journal of Imaging Systems...

TL;DR: In this article, the authors used Lipschitz-based data augmentation on a dataset, and the output of the augmentation model was fed into a modified AlexNet that uses network-based deep transfer learning to extract features from a dataset.

References

Journal Article

Data Preprocessing and Intelligent Data Analysis

A. Famili, +3 more

TL;DR: This paper first provides an overview of data preprocessing, focusing on problems of real-world data, and then details the data preprocessing techniques that achieve each of its stated objectives.

Journal Article

Dealing with noise problem in machine learning data-sets: A systematic review

Shivani Gupta, +1 more

- 01 Jan 2019 -

Procedia Computer Science

TL;DR: Among noise identification schemes, ensemble-based techniques identify noisy instances more accurately than other techniques, but single-based techniques are usually more efficient and more suitable for noisy data sets.

Journal Article

Land-Use and Land-Cover Classification Using a Human Group-Based Particle Swarm Optimization Algorithm with an LSTM Classifier on Hybrid Pre-Processing Remote-Sensing Images

R. Ganesh Babu, +4 more

- 17 Dec 2020 -

Remote Sensing

TL;DR: A hybrid feature optimization algorithm along with a deep learning classifier is proposed to improve performance of LULC classification, helping to predict wildlife habitat, deteriorating environmental quality, haphazard, etc.

Journal Article

Deep Learning-Embedded Social Internet of Things for Ambiguity-Aware Social Recommendations

Zhiwei Guo, +4 more

- 05 Jan 2021 -

IEEE Transactions on Network Science and...

TL;DR: A deep learning-embedded social Internet of Things (IoT) architecture is developed for social computing scenarios to guarantee reliable data management and to overcome the preference ambiguity problem in social recommendations (SR).

Journal Article

Fine-tuned support vector regression model for stock predictions

R. K. Dash, +3 more

- 15 Mar 2021 -

Neural Computing and Applications

TL;DR: In this paper, a new machine learning (ML) technique is proposed that uses a fine-tuned version of support vector regression for stock forecasting on time series data; a grid search is applied over the training dataset to select the best kernel function and to optimize its parameters.

Related Papers (5)

Missing data exploration in air quality data set using R-package data visualisation tools

Shamihah Muhammad Ghazali, +2 more

- 01 Apr 2020 -

Bulletin of Electrical Engineering and I...

Enhancing Data Quality at ETL Stage of Data Warehousing

Neha Gupta, +1 more

- 01 Jan 2021 -

International Journal of Data Warehousin...

Research on Wine Analysis Based on Data Preprocessing

Xinfei Meng, +5 more

A Review on Data Preprocessing Techniques Toward Efficient and Reliable Knowledge Discovery From Building Operational Data

Cheng Fan, +4 more

- 29 Mar 2021 -

Frontiers in Energy Research

Efficient data pre-processing for data mining using neural networks

G. B. Nagar
