Open AccessJournal ArticleDOI

Optimal Subsampling Algorithms for Big Data Regressions

TLDR
In this article, the optimal subsampling method under the A-optimality Criterion (OSMAC) for generalized linear models was proposed, and asymptotic normality and optimality of the estimator from this adaptive algorithm were established.
Abstract
To rapidly approximate maximum likelihood estimators with massive data, this paper studies the Optimal Subsampling Method under the A-optimality Criterion (OSMAC) for generalized linear models. The consistency and asymptotic normality of the estimator from a general subsampling algorithm are established, and optimal subsampling probabilities under the A- and L-optimality criteria are derived. Furthermore, using Frobenius norm matrix concentration inequalities, finite sample properties of the subsample estimator based on optimal subsampling probabilities are also derived. Since the optimal subsampling probabilities depend on the full data estimate, an adaptive two-step algorithm is developed. Asymptotic normality and optimality of the estimator from this adaptive algorithm are established. The proposed methods are illustrated and evaluated through numerical experiments on simulated and real datasets.
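
The adaptive two-step idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes logistic regression as the generalized linear model, and it assumes L-optimality-style probabilities proportional to |y_i - p_i| * ||x_i||, a form the abstract does not spell out. The function names `weighted_logistic_mle` and `two_step_subsample` and the sizes `r0` and `r` are hypothetical.

```python
import numpy as np

def weighted_logistic_mle(X, y, w, iters=50, tol=1e-10):
    """Newton-Raphson for a weighted logistic log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def two_step_subsample(X, y, r0, r, rng):
    """Pilot estimate from a uniform subsample, then a second subsample
    drawn with probabilities computed from that pilot estimate."""
    n = X.shape[0]

    # Step 1: uniform pilot subsample (with replacement) and pilot estimate.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_mle(X[idx0], y[idx0], np.ones(r0))

    # Assumed probability form: pi_i proportional to |y_i - p_i| * ||x_i||,
    # evaluated at the pilot estimate instead of the full-data estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()

    # Step 2: draw the second subsample with the approximated probabilities.
    idx1 = rng.choice(n, size=r, replace=True, p=pi)

    # Combine both subsamples; weight each point by its inverse sampling
    # probability (1/n for the uniform pilot draws).
    idx = np.concatenate([idx0, idx1])
    w = np.concatenate([np.full(r0, float(n)), 1.0 / pi[idx1]])
    w = w / w.mean()  # rescaling leaves the maximizer unchanged
    return weighted_logistic_mle(X[idx], y[idx], w)
```

Calling `two_step_subsample(X, y, r0=500, r=2000, rng=np.random.default_rng(0))` on a large simulated dataset fits the model on 2,500 sampled rows rather than on all of them. The inverse-probability weights are what keep the subsample estimator targeting the full-data maximum likelihood estimate.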


Citations
Posted Content

Optimal subsampling for quantile regression in big data

TL;DR: In this article, optimal subsampling for quantile regression is investigated and algorithms based on the optimal sampling probabilities are proposed to obtain asymptotic distributions and optimality of the resulting estimators.
Journal ArticleDOI

A Review on Optimal Subsampling Methods for Massive Datasets

TL;DR: The optimal subsampling methods investigated include logistic regression models, softmax regression, generalized linear models, quantile regression models, and quasi-likelihood estimation.
Journal ArticleDOI

Distributed subdata selection for big data via sampling-based approach

TL;DR: A distributed subdata selection method for big data linear regression models is proposed, and a two-step subsampling strategy with optimal subsampling probabilities and optimal allocation sizes is developed, which effectively approximates the ordinary least squares estimator from the full data.
Posted Content

Optimal Sampling for Generalized Linear Models under Measurement Constraints

TL;DR: In this article, the authors propose a response-free sampling procedure for generalized linear models (GLMs). Under the A-optimality criterion, i.e., the trace of the asymptotic variance, the resulting estimator is statistically efficient within a class of sampling estimators.
Journal ArticleDOI

Optimal Sampling for Generalized Linear Models Under Measurement Constraints

TL;DR: In this article, the covariates are available for the entire dataset, while responses are expensive to measure and initially unavailable for most records in the dataset, but the covariate coefficients can be found for the whole dataset.
References
Journal Article

R: A language and environment for statistical computing.

R Core Team
- 01 Jan 2014 - 
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Book

Generalized Linear Models

TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Journal ArticleDOI

Generalized Linear Models

Eric R. Ziegel
- 01 Aug 2002 - 
TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.
Book

Model assisted survey sampling

TL;DR: This book presents the principles of Estimation for Finite Populations and Important Sampling Designs and a Broader View of Errors in Surveys: Nonsampling Errors and Extensions of Probability Sampling Theory.
Book

Optimal Design of Experiments

TL;DR: Contents: Experimental Designs in Linear Models; Optimal Designs for Scalar Parameter Systems; Information Matrices; Loewner Optimality; Real Optimality Criteria; Matrix Means; The General Equivalence Theorem; Optimal Moment Matrices and Optimal Designs; D-, A-, E-, T-Optimality; Admissibility of Moment and Information Matrices; Bayes Designs and Discrimination Designs; Efficient Designs for Finite Sample Sizes; Invariant Design Problems; Kiefer Optimality; Rotatability and Response Surface Designs; Comments and References; Biographies; Bibliography; Index.