Linear expectile regression under massive data

doi:10.1016/J.FMRE.2021.08.012

Home
/
Papers
/
Linear expectile regression under massive data

Journal Article•DOI•

Linear expectile regression under massive data

Shanshan Song¹, Yuanyuan Lin¹, Yong Zhou²•Institutions (2)

The Chinese University of Hong Kong¹, East China Normal University²

01 Sep 2021-Vol. 1, Iss: 5, pp 574-585

TL;DR: The Bahadur representation of the ALS estimator is derived, which serves as an important tool to study the relationship between the number of sub-machines K and the sample size and its consistency and asymptotic normality are established under mild conditions.

read less

Abstract: In this paper, we study the large-scale inference for a linear expectile regression model. To mitigate the computational challenges in the classical asymmetric least squares (ALS) estimation under massive data, we propose a communication-efficient divide and conquer algorithm to combine the information from sub-machines through confidence distributions. The resulting pooled estimator has a closed-form expression, and its consistency and asymptotic normality are established under mild conditions. Moreover, we derive the Bahadur representation of the ALS estimator, which serves as an important tool to study the relationship between the number of sub-machines K and the sample size. Numerical studies including both synthetic and real data examples are presented to illustrate the finite-sample performance of our method and support the theoretical results.

...read moreread less

Citations

PDF

Open Access

More filters

Posted Content•

Estimation of Tail Risk based on Extreme Expectiles

[...]

Abdelaati Daouia¹, Stéphane Girard², Gilles Stupfler³, Gilles Stupfler⁴•Institutions (4)

University of Toulouse¹, French Institute for Research in Computer Science and Automation², Aix-Marseille University³, University of Nottingham⁴

01 Jul 2017-Research Papers in Economics

TL;DR: In this article, the authors use tail expectiles to estimate alternative measures to the Value at Risk (VaR), Expected Shortfall (ES) and Marginal expected shortfall (MES), three instruments of risk protection of utmost importance in actuarial science and statistical finance.

...read moreread less

Abstract: We use tail expectiles to estimate alternative measures to the Value at Risk (VaR), Expected Shortfall (ES) and Marginal Expected Shortfall (MES), three instruments of risk protection of utmost importance in actuarial science and statistical finance. The concept of expectiles is a least squares analogue of quantiles. Both expectiles and quantiles were embedded in the more general class of M-quantiles as the minimizers of an asymmetric convex loss function. It has been proved very recently that the only M-quantiles that are coherent risk measures are the expectiles. Moreover, expectiles define the only coherent risk measure that is also elicit able. The elicit ability corresponds to the existence of a natural backtesting methodology. The estimation of expectiles did not, however, receive yet any attention from the perspective of extreme values. The first estimation method that we propose enables the usage of advanced high quantile and tail index estimators. The second method joins together the least asymmetrically weighted squares estimation with the tail restrictions of extreme-value theory. A main tool is to first estimate the large expectile-based VaR, ES and MES when they are covered by the range of the data, and then extrapolate these estimates to the very far tails. We establish the limit distributions of the proposed estimators when they are located in the range of the data or near and even beyond the maximum observed loss. We show through a detailed simulation study the good performance of the procedures, and also present concrete applications to medical insurance data and three large US investment banks.

...read moreread less

77 citations

Posted Content•

Embracing the Blessing of Dimensionality in Factor Models

[...]

Quefeng Li¹, Guang Cheng², Jianqing Fan³, Yuyan Wang³•Institutions (3)

University of North Carolina at Chapel Hill¹, Purdue University², Princeton University³

25 Oct 2016-arXiv: Statistics Theory

TL;DR: A divide-and-conquer algorithm is proposed to alleviate the computational burden, and shown not to sacrifice any statistical accuracy in comparison with a pooled analysis, and applied to a microarray data example that shows empirical benefits of using more data.

...read moreread less

Abstract: Factor modeling is an essential tool for exploring intrinsic dependence structures among high-dimensional random variables. Much progress has been made for estimating the covariance matrix from a high-dimensional factor model. However, the blessing of dimensionality has not yet been fully embraced in the literature: much of the available data is often ignored in constructing covariance matrix estimates. If our goal is to accurately estimate a covariance matrix of a set of targeted variables, shall we employ additional data, which are beyond the variables of interest, in the estimation? In this paper, we provide sufficient conditions for an affirmative answer, and further quantify its gain in terms of Fisher information and convergence rate. In fact, even an oracle-like result (as if all the factors were known) can be achieved when a sufficiently large number of variables is used. The idea of utilizing data as much as possible brings computational challenges. A divide-and-conquer algorithm is thus proposed to alleviate the computational burden, and also shown not to sacrifice any statistical accuracy in comparison with a pooled analysis. Simulation studies further confirm our advocacy for the use of full data, and demonstrate the effectiveness of the above algorithm. Our proposal is applied to a microarray data example that shows empirical benefits of using more data.

...read moreread less

13 citations

References

PDF

Open Access

More filters

Book•

Weak Convergence and Empirical Processes: With Applications to Statistics

[...]

Jon A. Wellner

14 Mar 1996

TL;DR: In this article, the authors define the Ball Sigma-Field and Measurability of Suprema and show that it is possible to achieve convergence almost surely and in probability.

...read moreread less

Abstract: 1.1. Introduction.- 1.2. Outer Integrals and Measurable Majorants.- 1.3. Weak Convergence.- 1.4. Product Spaces.- 1.5. Spaces of Bounded Functions.- 1.6. Spaces of Locally Bounded Functions.- 1.7. The Ball Sigma-Field and Measurability of Suprema.- 1.8. Hilbert Spaces.- 1.9. Convergence: Almost Surely and in Probability.- 1.10. Convergence: Weak, Almost Uniform, and in Probability.- 1.11. Refinements.- 1.12. Uniformity and Metrization.- 2.1. Introduction.- 2.2. Maximal Inequalities and Covering Numbers.- 2.3. Symmetrization and Measurability.- 2.4. Glivenko-Cantelli Theorems.- 2.5. Donsker Theorems.- 2.6. Uniform Entropy Numbers.- 2.7. Bracketing Numbers.- 2.8. Uniformity in the Underlying Distribution.- 2.9. Multiplier Central Limit Theorems.- 2.10. Permanence of the Donsker Property.- 2.11. The Central Limit Theorem for Processes.- 2.12. Partial-Sum Processes.- 2.13. Other Donsker Classes.- 2.14. Tail Bounds.- 3.1. Introduction.- 3.2. M-Estimators.- 3.3. Z-Estimators.- 3.4. Rates of Convergence.- 3.5. Random Sample Size, Poissonization and Kac Processes.- 3.6. The Bootstrap.- 3.7. The Two-Sample Problem.- 3.8. Independence Empirical Processes.- 3.9. The Delta-Method.- 3.10. Contiguity.- 3.11. Convolution and Minimax Theorems.- A. Appendix.- A.1. Inequalities.- A.2. Gaussian Processes.- A.2.1. Inequalities and Gaussian Comparison.- A.2.2. Exponential Bounds.- A.2.3. Majorizing Measures.- A.2.4. Further Results.- A.3. Rademacher Processes.- A.4. Isoperimetric Inequalities for Product Measures.- A.5. Some Limit Theorems.- A.6. More Inequalities.- A.6.1. Binomial Random Variables.- A.6.2. Multinomial Random Vectors.- A.6.3. Rademacher Sums.- Notes.- References.- Author Index.- List of Symbols.

...read moreread less

5,231 citations

Book•

Quantile Regression: Estimation and Simulation

[...]

Furno Marilena¹, Vistocco Domenico²•Institutions (2)

University of Naples Federico II¹, University of Cassino²

24 Sep 2018

1,282 citations

Journal Article•DOI•

Challenges of Big Data analysis

[...]

Jianqing Fan¹, Fang Han², Han Liu¹•Institutions (2)

Princeton University¹, Johns Hopkins University²

01 Jun 2014-National Science Review

TL;DR: In this paper, the authors provide an overview of the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures, and provide various new perspectives on the Big Data analysis and computation.

...read moreread less

Abstract: Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity and measurement errors. These challenges are distinguished and require new computational and statistical paradigm. This paper gives overviews on the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures. We also provide various new perspectives on the Big Data analysis and computation. In particular, we emphasize on the viability of the sparsest solution in high-confidence set and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.

...read moreread less

897 citations

Journal Article•DOI•

Asymmetric least squares estimation and testing

[...]

Whitney K. Newey, James L. Powell

01 Jul 1987-Econometrica

TL;DR: In this article, the authors consider estimation and hypothesis tests for coefficients of linear regression models, where the coefficient estimates are based on location measures defined by an asymmetric least squares criterion function.

...read moreread less

Abstract: This paper considers estimation and hypothesis tests for coefficients of linear regression models, where the coefficient estimates are based on location measures defined by an asymmetric least squares criterion function. These asymmetric least squares estimators have properties which are analogous to regression quantile estimators, but are much simpler to calculate, as are the corresponding test statistics. The coefficient estimators can be used to construct test statistics for homoskedasticity and conditional symmetry of the error distribution, and we find these tests compare quite favorably with other commonly-used tests of these null hypotheses in terms of local relative efficiency. Consequently, asymmetric least squares estimation provides a convenient and relatively efficient method of summarizing the conditional distribution of a dependent variable given the regressors, and a means of testing whether a linear model is an adequate characterization of the "typical value" for this conditional distribution.

...read moreread less

888 citations

Journal Article•DOI•

Challenges of Big Data Analysis

[...]

Jianqing Fan¹, Fang Han², Han Liu¹•Institutions (2)

Princeton University¹, Johns Hopkins University²

07 Aug 2013-arXiv: Machine Learning

TL;DR: In this article, the authors provide an overview of the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures, and provide various new perspectives on the Big Data analysis and computation.

...read moreread less

Abstract: Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinguished and require new computational and statistical paradigm. This article give overviews on the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures. We also provide various new perspectives on the Big Data analysis and computation. In particular, we emphasis on the viability of the sparsest solution in high-confidence set and point out that exogeneous assumptions in most statistical methods for Big Data can not be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.

...read moreread less

733 citations