Open Access Journal Article

Machine Learning With Big Data: Challenges and Approaches

TLDR
This paper compiles, summarizes, and organizes machine learning challenges with Big Data, highlighting the cause–effect relationship by organizing challenges according to Big Data Vs or dimensions that instigated the issue: volume, velocity, variety, or veracity.
Abstract
The Big Data revolution promises to transform how we live, work, and think by enabling process optimization, empowering insight discovery, and improving decision making. The realization of this grand potential relies on the ability to extract value from such massive data through data analytics; machine learning is at its core because of its ability to learn from data and provide data-driven insights, decisions, and predictions. However, traditional machine learning approaches were developed in a different era and are thus based upon multiple assumptions, such as the data set fitting entirely into memory, which unfortunately no longer hold true in this new context. These broken assumptions, together with the Big Data characteristics, are creating obstacles for the traditional techniques. Consequently, this paper compiles, summarizes, and organizes machine learning challenges with Big Data. In contrast to other research that discusses challenges, this work highlights the cause–effect relationship by organizing challenges according to the Big Data Vs, or dimensions, that instigated the issue: volume, velocity, variety, or veracity. Moreover, emerging machine learning approaches and techniques are discussed in terms of how they are capable of handling the various challenges, with the ultimate objective of helping practitioners select appropriate solutions for their use cases. Finally, a matrix relating the challenges and approaches is presented. Through this process, this paper provides a perspective on the domain, identifies research gaps and opportunities, and provides a strong foundation and encouragement for further research in the field of machine learning with Big Data.


Citations
Book

Age of Information: A New Concept, Metric, and Tool

TL;DR: This monograph provides the reader with an easy-to-read tutorial-like introduction into this novel approach of dealing with information within systems and shows how the approach can be used as a tool in improving metrics in other contexts.
Journal Article

Modeling and forecasting building energy consumption: A review of data-driven techniques

TL;DR: A review of studies developing data-driven models for building scale applications with a focus on the input data characteristics and data pre-processing methods, the building typologies considered, the targeted energy end-uses and forecasting horizons, and accuracy assessment.
Journal Article

Machine Learning in IoT Security: Current Solutions and Future Challenges

TL;DR: This paper systematically reviews the security requirements, attack vectors, and current security solutions for IoT networks, and sheds light on the gaps in these security solutions that call for ML and DL approaches.
Journal Article

A Survey on Security Threats and Defensive Techniques of Machine Learning: A Data Driven View

TL;DR: This paper revisits existing security threats and gives a systematic survey of them from two aspects, the training phase and the testing/inferring phase, and categorizes current defensive techniques of machine learning into four groups: security assessment mechanisms, countermeasures in the training phase, countermeasures in the testing or inferring phase, and data security and privacy.
References
Journal Article

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.
Journal Article

The WEKA data mining software: an update

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Journal Article

Regularization Paths for Generalized Linear Models via Coordinate Descent

TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods, can handle large problems, and can also deal efficiently with sparse features.
Journal Article

Representation Learning: A Review and New Perspectives

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.