Open Access Journal Article

Challenges of Big Data analysis

TLDR
In this paper, the authors give an overview of the salient features of Big Data and of how these features drive a paradigm change in statistical and computational methods as well as in computing architectures, and they offer new perspectives on Big Data analysis and computation.
Abstract
Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This paper gives an overview of the salient features of Big Data and of how these features drive a paradigm change in statistical and computational methods as well as in computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. Relying on such unvalidated assumptions can lead to wrong statistical inferences and consequently wrong scientific conclusions.
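The spurious-correlation challenge named in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function name `max_spurious_correlation`, the sample size `n = 60`, the dimensions tried, and the random seed are all assumptions chosen for illustration. It draws a response that is independent of every predictor and reports the largest absolute sample correlation, which tends to grow as the number of predictors increases.

```python
import numpy as np


def max_spurious_correlation(n, p, rng):
    """Largest absolute sample correlation between a response and p
    predictors when all variables are independent standard normals.
    (Hypothetical helper for illustration; not from the paper.)"""
    X = rng.standard_normal((n, p))   # n observations of p independent predictors
    y = rng.standard_normal(n)        # response, independent of every column of X
    Xc = X - X.mean(axis=0)           # center each predictor
    yc = y - y.mean()                 # center the response
    # Pearson correlation of y with each column of X
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return float(np.max(np.abs(corr)))


rng = np.random.default_rng(0)        # fixed seed so the run is reproducible
n = 60                                # assumed sample size
results = {p: max_spurious_correlation(n, p, rng) for p in (10, 1000, 10000)}
for p, r in results.items():
    print(f"p={p:>6}: max |corr| = {r:.2f}")
```

Although no predictor is truly related to the response, the maximum observed correlation is typically far from zero when the dimension is large relative to the sample size, which is why variables selected by marginal correlation alone can be misleading.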


Citations
Journal Article

Beyond the hype

TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted and the need to devise new tools for predictive analytics for structured big data is reinforced.
Journal Article

Majorization-Minimization Algorithms in Signal Processing, Communications, and Machine Learning

TL;DR: This paper gives an overview of the majorization-minimization (MM) algorithmic framework, which can provide guidance in deriving problem-driven algorithms with low computational cost, and illustrates it with a wide range of applications in signal processing, communications, and machine learning.
Journal Article

Big data analytics in logistics and supply chain management: Certain investigations for research and applications

TL;DR: In this article, the authors classify the literature on the application of big data business analytics (BDBA) to logistics and supply chain management (LSCM) based on the nature of the analytics (descriptive, predictive, prescriptive) and the focus of the LSCM (strategy and operations).
Journal Article

Machine Learning With Big Data: Challenges and Approaches

TL;DR: This paper compiles, summarizes, and organizes machine learning challenges with Big Data, highlighting the cause–effect relationship by organizing challenges according to Big Data Vs or dimensions that instigated the issue: volume, velocity, variety, or veracity.
Journal Article

On big data, artificial intelligence and smart cities

TL;DR: This paper reviews the urban potential of AI and proposes a new framework binding AI technology and cities while ensuring the integration of the key dimensions of Culture, Metabolism and Governance, which are known to be essential to the successful integration of Smart Cities in compliance with Sustainable Development Goal 11 and the New Urban Agenda.
References
Journal Article

Controlling the false discovery rate: a practical and powerful approach to multiple testing

TL;DR: In this paper, a different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate, which is equivalent to the FWER when all hypotheses are true but is smaller otherwise.
Journal Article

A new look at the statistical model identification

TL;DR: In this article, a new estimate, the minimum information theoretical criterion estimate (MAICE), is introduced for the purpose of statistical identification; it is free from the ambiguities inherent in the application of conventional hypothesis testing procedures.
Journal Article

Regression Shrinkage and Selection via the Lasso

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Book

Matrix computations

Gene H. Golub
Book

Convex Optimization

TL;DR: This book gives a comprehensive introduction to the subject, with a focus on recognizing convex optimization problems and then finding the most appropriate technique for solving them.