Book

Statistical Foundations of Data Science

TL;DR
Statistical Foundations of Data Science provides a thorough introduction to commonly used statistical models and to contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories.
Abstract
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications. The book begins with an introduction to the stylized features of big data and their impacts on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is also thoroughly addressed, and so is feature screening. The book also provides a comprehensive account of high-dimensional covariance estimation, learning latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction, and machine learning problems. It also gives a thorough introduction to statistical machine learning theory and methods for classification, clustering, and prediction, including CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.
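To make the sparsity-and-model-selection theme concrete, here is a minimal sketch of sparse linear model selection via a cross-validated lasso fit in scikit-learn. The synthetic data-generating setup and all parameter values are illustrative assumptions, not examples taken from the book.

```python
# Minimal sketch: sparse model selection with the lasso on synthetic data.
# The n < p setup and coefficient values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                     # n samples, p features, s true nonzeros
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0                            # sparse true coefficient vector
y = X @ beta + rng.standard_normal(n)

# Cross-validation chooses the penalty level; most coefficients are set to zero.
fit = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(fit.coef_)
print("selected features:", selected)
```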


Citations
Journal Article

Spectral Methods for Data Science: A Statistical Perspective

TL;DR: This monograph aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective, highlighting their algorithmic implications in diverse large-scale applications.
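The core spectral-method idea can be illustrated in a few lines: estimate a low-rank signal buried in a noisy matrix by taking its leading eigenvectors. The rank-one planted-signal model below is an illustrative assumption, not an example drawn from the monograph.

```python
# Minimal sketch of a spectral method: recover a planted rank-one direction
# from a noisy symmetric matrix via its leading eigenvector.
# The signal-plus-noise model is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)
n = 500
u = rng.standard_normal(n)
u /= np.linalg.norm(u)                    # unit-norm planted direction
W = rng.standard_normal((n, n))
W = (W + W.T) / np.sqrt(2 * n)            # symmetric Gaussian noise
M = 5.0 * np.outer(u, u) + W              # observed matrix = signal + noise

eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
u_hat = eigvecs[:, -1]                    # leading eigenvector as the estimate
print("correlation with truth:", abs(u_hat @ u))
```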
Posted Content

Statistical Inference for High-Dimensional Matrix-Variate Factor Model

TL;DR: An estimation method called $\alpha$-PCA is proposed that preserves the matrix structure and aggregates mean and contemporary covariance through a hyper-parameter $\alpha$, and compares favorably with existing methods.
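A hedged sketch of the idea behind an $\alpha$-weighted PCA for matrix-valued observations: combine the sample-mean outer product and the contemporary covariance with a weight governed by $\alpha$, then take leading eigenvectors as row-loading estimates. The exact aggregation used by the $\alpha$-PCA estimator in the cited paper may differ; everything below (weights, scaling, dimensions) is an illustrative assumption.

```python
# Hedged sketch of an alpha-weighted PCA for matrix-valued data; the precise
# form of the alpha-PCA estimator in the cited paper may differ.
import numpy as np

rng = np.random.default_rng(2)
T, p1, p2, k1 = 200, 20, 15, 3
X = rng.standard_normal((T, p1, p2))      # placeholder matrix-valued time series

alpha = 1.0                               # hyper-parameter weighting mean vs covariance
X_bar = X.mean(axis=0)
M = (1 + alpha) * X_bar @ X_bar.T         # mean outer-product term
M += sum((Xt - X_bar) @ (Xt - X_bar).T for Xt in X) / T   # contemporary covariance term
M /= p1 * p2                              # normalize by the number of matrix entries

eigvals, eigvecs = np.linalg.eigh(M)
R_hat = eigvecs[:, -k1:] * np.sqrt(p1)    # estimated row loadings (up to rotation)
print(R_hat.shape)
```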
Posted Content

Benign overfitting in the large deviation regime

TL;DR: The benign overfitting phenomenon is investigated in the large deviation regime, where the bounds on the prediction risk hold with probability $1 - e^{-\zeta n}$ for some absolute constant $\zeta$, and it is proved that these bounds can converge to $0$ for the quadratic loss.
Posted Content

Semiparametric Tensor Factor Analysis by Iteratively Projected SVD

TL;DR: A general framework of Semiparametric TEnsor FActor analysis (STEFA) is introduced, focusing on the methodology and theory of low-rank tensor decomposition with auxiliary covariates, and several prediction methods based on the STEFA model with newly observed covariates are presented.
Journal Article

A Comparison of Penalized Maximum Likelihood Estimation and Markov Chain Monte Carlo Techniques for Estimating Confirmatory Factor Analysis Models With Small Sample Sizes.

TL;DR: In this article, the authors distinguish different Bayesian estimators that can be used to stabilize the parameter estimates of a CFA: the mode of the joint posterior distribution obtained from penalized maximum likelihood (PML) estimation, and the mean (EAP), median (Med), or mode (MAP) of the marginal posterior distribution that are calculated by using Markov Chain Monte Carlo (MCMC) methods.
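The marginal posterior summaries named in this TL;DR (EAP, Med, MAP) amount to taking the mean, median, or mode of MCMC draws. The sketch below computes all three from simulated draws; the gamma-shaped "posterior" is a stand-in, not output from an actual CFA fit.

```python
# Minimal sketch of the three marginal posterior summaries: EAP (mean),
# Med (median), and MAP (mode) computed from MCMC draws.
# The draws are simulated here, not taken from a fitted CFA model.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
draws = rng.gamma(shape=2.0, scale=0.5, size=5000)   # skewed stand-in posterior

eap = draws.mean()                                   # posterior mean (EAP)
med = np.median(draws)                               # posterior median (Med)
grid = np.linspace(draws.min(), draws.max(), 1000)
map_ = grid[np.argmax(gaussian_kde(draws)(grid))]    # mode via kernel density (MAP)

print(f"EAP={eap:.3f}, Med={med:.3f}, MAP={map_:.3f}")
```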