
Showing papers on "Heteroscedasticity published in 2017"


Journal ArticleDOI
TL;DR: In this article, the authors explore the optimal conditional heteroskedasticity model with regard to goodness of fit to Bitcoin price data and find that the best model is the AR-CGARCH model, highlighting the significance of including both a short-run and a long-run component of the conditional variance.

730 citations


Journal ArticleDOI
TL;DR: A penalized Huber loss with diverging parameter is proposed to reduce the bias created by the traditional Huber loss; the resulting penalized robust approximate (RA) quadratic loss, called the RA-Lasso, is compared with other regularized robust estimators based on quantile regression and least absolute deviation regression.
Abstract: Data subject to heavy-tailed errors are commonly encountered in various scientific fields. To address this problem, procedures based on quantile regression and Least Absolute Deviation (LAD) regression have been developed in recent years. These methods essentially estimate the conditional median (or quantile) function. They can be very different from the conditional mean functions, especially when distributions are asymmetric and heteroscedastic. How can we efficiently estimate the mean regression functions in ultra-high dimensional setting with existence of only the second moment? To solve this problem, we propose a penalized Huber loss with diverging parameter to reduce biases created by the traditional Huber loss. Such a penalized robust approximate quadratic (RA-quadratic) loss will be called RA-Lasso. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, our results reveal that the RA-lasso estimator produces a consistent estimator at the same rate as the optimal rate under the light-tail situation. We further study the computational convergence of RA-Lasso and show that the composite gradient descent algorithm indeed produces a solution that admits the same optimal rate after sufficient iterations. As a byproduct, we also establish the concentration inequality for estimating population mean when there exists only the second moment. We compare RA-Lasso with other regularized robust estimators based on quantile regression and LAD regression. Extensive simulation studies demonstrate the satisfactory finite-sample performance of RA-Lasso.
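
The composite gradient descent mentioned in the abstract can be sketched in a few lines: a gradient step on the Huber loss followed by soft-thresholding for the L1 penalty. The NumPy sketch below is illustrative only; the robustification threshold `alpha`, the penalty `lam`, and the toy data are assumptions, not the paper's tuning rules (which let the Huber parameter diverge with the sample size).

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def huber_grad(r, alpha):
    """Derivative of the Huber loss with threshold alpha, evaluated at residual r."""
    return np.where(np.abs(r) <= alpha, r, alpha * np.sign(r))

def ra_lasso_sketch(X, y, lam, alpha, n_iter=500):
    """Composite (proximal) gradient descent for Huber loss + L1 penalty."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the smooth part
    beta = np.zeros(p)
    for _ in range(n_iter):
        resid = y - X @ beta
        grad = -X.T @ huber_grad(resid, alpha) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# toy heavy-tailed example (finite variance, heavier-than-Gaussian tails)
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 3.0
y = X @ beta_true + rng.standard_t(df=2.5, size=n)

beta_hat = ra_lasso_sketch(X, y, lam=0.5, alpha=2.0)
print("first 8 estimated coefficients:", np.round(beta_hat[:8], 2))
```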

188 citations


Journal ArticleDOI
TL;DR: In this paper, the first-order asymptotic theory of the least squares (LS) estimator of the regression coefficients is worked out in the limit where both the cross-sectional dimension and the number of time periods become large.
Abstract: We analyze linear panel regression models with interactive fixed effects and predetermined regressors, for example lagged-dependent variables. The first-order asymptotic theory of the least squares (LS) estimator of the regression coefficients is worked out in the limit where both the cross-sectional dimension and the number of time periods become large. We find two sources of asymptotic bias of the LS estimator: bias due to correlation or heteroscedasticity of the idiosyncratic error term, and bias due to predetermined (as opposed to strictly exogenous) regressors. We provide a bias-corrected LS estimator. We also present bias-corrected versions of the three classical test statistics (Wald, LR, and LM test) and show their asymptotic distribution is a χ2-distribution. Monte Carlo simulations show the bias correction of the LS estimator and of the test statistics also work well for finite sample sizes.

165 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose two alternative estimators that achieve consistency for n → ∞ with fixed T, and extend the results of Chen et al. (2014) by providing a feasible estimator when the inefficiency is heteroskedastic and follows a first-order autoregressive process.

142 citations


Journal ArticleDOI
TL;DR: In this paper, the authors focus on approaches for representing error heteroscedasticity with respect to simulated streamflow, i.e., the pattern of larger errors in higher streamflow predictions.
Abstract: Reliable and precise probabilistic prediction of daily catchment-scale streamflow requires statistical characterization of residual errors of hydrological models. This study focuses on approaches for representing error heteroscedasticity with respect to simulated streamflow, i.e., the pattern of larger errors in higher streamflow predictions. We evaluate 8 common residual error schemes, including standard and weighted least squares, the Box-Cox transformation (with fixed and calibrated power parameter λ) and the log-sinh transformation. Case studies include 17 perennial and 6 ephemeral catchments in Australia and the USA, and two lumped hydrological models. Performance is quantified using predictive reliability, precision and volumetric bias metrics. We find that the choice of heteroscedastic error modelling approach significantly impacts predictive performance, though no single scheme simultaneously optimizes all performance metrics. The set of Pareto optimal schemes, reflecting performance trade-offs, comprises Box-Cox schemes with λ of 0.2 and 0.5, and the log scheme (λ=0, perennial catchments only). These schemes significantly outperform even the average-performing remaining schemes (e.g., across ephemeral catchments, median precision tightens from 105% to 40% of observed streamflow, and median biases decrease from 25% to 4%). Theoretical interpretations of empirical results highlight the importance of capturing the skew/kurtosis of raw residuals and reproducing zero flows. Paradoxically, calibration of λ is often counterproductive: in perennial catchments, it tends to overfit low flows at the expense of abysmal precision in high flows. The log-sinh transformation is dominated by the simpler Pareto optimal schemes listed above. Recommendations for researchers and practitioners seeking robust residual error schemes for practical work are provided.
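
As a toy illustration of the fixed-λ Box-Cox schemes evaluated here, the sketch below (synthetic flows and illustrative λ values; no connection to the study's catchments or models) transforms observed and simulated streamflow and checks how strongly the absolute residuals still depend on the simulated flow.

```python
import numpy as np

def box_cox(q, lam):
    """Box-Cox transform; lam = 0 gives the log transform."""
    q = np.asarray(q, dtype=float)
    return np.log(q) if lam == 0 else (q ** lam - 1.0) / lam

# synthetic stand-ins for observed and simulated daily streamflow
rng = np.random.default_rng(1)
q_sim = rng.lognormal(mean=0.5, sigma=1.0, size=1000)
q_obs = q_sim * np.exp(rng.normal(0.0, 0.3, size=1000))   # multiplicative, heteroscedastic error

for lam in (1.0, 0.5, 0.2, 0.0):
    resid = box_cox(q_obs, lam) - box_cox(q_sim, lam)
    # correlation of |residual| with simulated flow: values near zero suggest
    # the transformation has absorbed most of the heteroscedasticity
    rho = np.corrcoef(np.abs(resid), q_sim)[0, 1]
    print(f"lambda = {lam:3.1f}   corr(|resid|, Q_sim) = {rho:+.2f}")
```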

113 citations


Journal ArticleDOI
TL;DR: In this article, a hybrid GARCH (generalized autoregressive conditional heteroskedasticity) methodology is proposed to integrate the individual ARIMA (autoregressive integrated moving average) and SVM (support vector machine) forecasting models.

102 citations


Journal ArticleDOI
TL;DR: This paper summarizes the advantages and limitations of using generalized linear models with continuous outcomes and provides two simplified examples that highlight the methodology involved in selecting, comparing, and interpreting models for positively skewed outcomes and certain heteroscedastic relationships.
Abstract: Some researchers in psychology have ordinarily relied on traditional linear models when assessing the relationship between predictor(s) and a continuous outcome, even when the assumptions of the traditional model (e.g., normality, homoscedasticity) are not satisfied. Of those who abandon the traditional linear model, some opt for robust versions of the ANOVA and regression statistics that usually focus on relationships for the typical or average case instead of trying to model relationships for the full range of relevant cases. Generalized linear models, on the other hand, model the relationships among variables using all available and relevant data and can be appropriate under certain conditions of non-normality and heteroscedasticity. In this paper, we summarize the advantages and limitations of using generalized linear models with continuous outcomes and provide two simplified examples that highlight the methodology involved in selecting, comparing, and interpreting models for positively skewed outcomes and certain heteroscedastic relationships.
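
As a minimal illustration of the kind of model the paper advocates, the sketch below simulates a positively skewed outcome whose variance grows with its mean and fits both a traditional linear model and a Gamma GLM with a log link via statsmodels; the simulated data and parameter values are placeholders, not the paper's examples.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0, 10, n)
mu = np.exp(0.3 + 0.2 * x)                        # conditional mean grows with x
y = rng.gamma(shape=2.0, scale=mu / 2.0)          # positively skewed, variance grows with the mean

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                          # traditional linear model for comparison
glm = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()

print("OLS coefficients:", np.round(ols.params, 3))
print("Gamma GLM (log link) coefficients:", np.round(glm.params, 3))   # on the log scale
print("AIC, OLS vs GLM:", round(ols.aic, 1), round(glm.aic, 1))
```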

78 citations


Journal Article
TL;DR: In this article, rank- and pseudorank-based statistics for testing meaningful nonparametric treatment effects in general factorial designs are proposed, and three approximation techniques are compared in extensive simulations together with two additional Wald-type tests.
Abstract: Existing tests for factorial designs in the non-parametric case are based on hypotheses formulated in terms of distribution functions. Typical null hypotheses, however, are formulated in terms of some parameters or effect measures, particularly in heteroscedastic settings. Here this idea is extended to non-parametric models by introducing a novel non-parametric analysis-of-variance type of statistic based on ranks or pseudoranks which is suitable for testing hypotheses formulated in terms of meaningful non-parametric treatment effects in general factorial designs. This is achieved by a careful detailed study of the common distribution of rank-based estimators for the treatment effects. Since the statistic is asymptotically not a pivotal quantity, we propose three different approximation techniques, discuss their theoretical properties and compare them in extensive simulations together with two additional Wald-type tests. An extension of the presented idea to general repeated measures designs is briefly outlined. The rank- and pseudorank-based procedures proposed maintain the preassigned type I error rate quite accurately, also in unbalanced and heteroscedastic models.
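
The treatment effects referred to here are relative effects estimated from midranks over the pooled sample. The snippet below shows only that standard estimator, p̂_i = (R̄_i − 1/2)/N, on toy unbalanced, heteroscedastic data; it does not implement the paper's ANOVA-type statistic, pseudoranks, or approximation techniques.

```python
import numpy as np
from scipy.stats import rankdata

def relative_effects(groups):
    """Estimate relative treatment effects p_i from midranks over the pooled sample."""
    pooled = np.concatenate(groups)
    ranks = rankdata(pooled)                 # midranks handle ties
    N = len(pooled)
    effects, start = [], 0
    for g in groups:
        r = ranks[start:start + len(g)]
        effects.append((r.mean() - 0.5) / N)
        start += len(g)
    return np.array(effects)                 # values near 0.5 mean "no tendency"

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, 30)
b = rng.normal(0.5, 3.0, 15)                 # shifted, more variable, unbalanced
print(relative_effects([a, b]))
```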

63 citations


Journal ArticleDOI
TL;DR: Volatility forecasts associated with the price of gold, silver, and copper are analyzed, finding that the best model for forecasting the price return volatility of these main metals is the ANN-GARCH model with regressors.
Abstract: A hybrid model is analyzed to predict the price volatility of gold, silver and copper. The hybrid model used is an ANN-GARCH model with regressors. APGARCH with exogenous variables is used as a benchmark; this benchmark is better than the classical GARCH used in previous studies, and the incorporation of ANN into the best GARCH with regressors increases the accuracy. In this article, we analyze volatility forecasts associated with the price of gold, silver, and copper, three of the most important metals in the world market. First, a group of GARCH models are used to forecast volatility, including explanatory variables like the US Dollar-Euro and US Dollar-Yen exchange rates, the oil price, and the Chinese, Indian, British, and American stock market indexes. Subsequently, these model predictions are used as inputs for a neural network in order to analyze the increase in hybrid predictive power. The results obtained show that for these three metals, using the hybrid neural network model increases the forecasting power of out-of-sample volatility. In order to optimize the results, we conducted a series of sensitizations of the artificial neural network architecture and analyses for different cases, finding that the best model to forecast the price return volatility of these main metals is the ANN-GARCH model with regressors. Due to the heteroscedasticity in the financial series, the loss function used is the Heteroskedasticity-adjusted Mean Squared Error (HMSE), and to test the superiority of the models, the Model Confidence Set is used.
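
A rough sketch of the two-stage hybrid idea, using the `arch` and scikit-learn packages: fit a GARCH(1,1), then feed its conditional volatility together with exogenous regressors into a small neural network, scoring with an HMSE-style loss. The simulated data, the feature set, and the package choice are assumptions, not the authors' exact specification (which includes APGARCH and a richer regressor set).

```python
import numpy as np
from arch import arch_model
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
T = 1500
returns = rng.standard_t(df=6, size=T)                 # stand-in for daily metal returns (%)
exog = rng.standard_normal((T, 2))                     # stand-ins for FX / oil / index returns

# stage 1: GARCH(1,1) conditional volatility
garch = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1).fit(disp="off")
sigma = garch.conditional_volatility

# stage 2: a small ANN maps (lagged GARCH volatility, lagged regressors) to squared returns
features = np.column_stack([sigma[:-1], exog[:-1]])
target = returns[1:] ** 2
split = int(0.8 * len(target))
ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
ann.fit(features[:split], target[:split])
pred = np.clip(ann.predict(features[split:]), 1e-8, None)

hmse = np.mean((target[split:] / pred - 1.0) ** 2)     # heteroskedasticity-adjusted MSE
print(f"out-of-sample HMSE of the hybrid: {hmse:.3f}")
```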

60 citations


ReportDOI
TL;DR: The linear regression model is widely used in empirical work in economics, statistics, and many other disciplines as discussed by the authors, and researchers often include many covariates in their linear model specification in a...
Abstract: The linear regression model is widely used in empirical work in economics, statistics, and many other disciplines. Researchers often include many covariates in their linear model specification in a...

57 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show how asymptotically valid inference in regression models based on the weighted least squares estimator can be obtained even when the model for reweighting the data is misspecified.

Journal ArticleDOI
TL;DR: In this paper, the exponential generalized autoregressive conditional heteroscedasticity model based on the generalized t distribution is extended to allow the distribution of the observations to be skewed and asymmetric, with the method for introducing asymmetry ensuring that the information matrix reverts to the usual case under symmetry.
Abstract: Exponential generalized autoregressive conditional heteroscedasticity models in which the dynamics of the logarithm of scale are driven by the conditional score are known to exhibit attractive theoretical properties for the t distribution and general error distribution. A model based on the generalized t includes both as special cases. We derive the information matrix for the generalized t and show that, when parameterized with the inverse of the tail index, it remains positive definite in the limit as the distribution goes to a general error distribution. We generalize further by allowing the distribution of the observations to be skewed and asymmetric. Our method for introducing asymmetry ensures that the information matrix reverts to the usual case under symmetry. We are able to derive analytic expressions for the conditional moments of our exponential generalized autoregressive conditional heteroscedasticity model as well as the information matrix of the dynamic parameters. The practical value of the model is illustrated with commodity and stock return data. Overall, the approach offers a unified, flexible, robust, and effective treatment of volatility.
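
For the plain Student-t case, the score-driven log-scale recursion behind such exponential models can be sketched directly; the generalized and skewed t used in the paper, and its information matrix, are not reproduced here, and the parameter values below are purely illustrative.

```python
import numpy as np

def score_driven_filter(y, omega, phi, kappa, nu):
    """Score-driven recursion for the log-scale lambda_t under a Student-t error.

    Model: y_t = exp(lambda_t) * eps_t, eps_t ~ t_nu. The conditional score
    u_t = (nu + 1) * y_t^2 / (nu * exp(2*lambda_t) + y_t^2) - 1 drives the update
    lambda_{t+1} = omega * (1 - phi) + phi * lambda_t + kappa * u_t.
    """
    lam = np.empty(len(y))
    lam[0] = omega
    for t in range(len(y) - 1):
        u = (nu + 1.0) * y[t] ** 2 / (nu * np.exp(2 * lam[t]) + y[t] ** 2) - 1.0
        lam[t + 1] = omega * (1.0 - phi) + phi * lam[t] + kappa * u
    return lam

rng = np.random.default_rng(5)
y = rng.standard_t(df=5, size=1000)            # stand-in for demeaned returns
lam = score_driven_filter(y, omega=0.0, phi=0.95, kappa=0.05, nu=5.0)
print("filtered scale (last 3 observations):", np.round(np.exp(lam[-3:]), 3))
```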

Journal ArticleDOI
TL;DR: Conventional approaches to forecasting real-time thermal ratings (RTTRs) provide only single-point estimates with no indication of the size or distribution of possible errors; this paper describes weather-based methods to estimate probabilistic RTTR forecasts for overhead lines.
Abstract: Conventional approaches to forecasting of real-time thermal ratings (RTTRs) provide only single-point estimates with no indication of the size or distribution of possible errors. This paper describes weather-based methods to estimate probabilistic RTTR forecasts for overhead lines which can be used by a system operator within a chosen risk policy with respect to the probability of a rating being exceeded. Predictive centers of weather conditions are estimated as a sum of residuals predicted by a suitable auto-regressive model and temporal trends fitted by Fourier series. Conditional heteroscedasticity of the predictive distribution is modelled as a linear function of recent changes in residuals within one hour for air temperature and wind speed or concentration of recent wind direction observations within two hours. A technique of minimum continuous ranked probability score estimation is used to estimate predictive distributions. Numerous RTTRs for a particular span are generated by a combination of the Monte Carlo method where weather inputs are randomly sampled from the modelled predictive distributions at a particular future moment and a thermal model of overhead conductors. Kernel density estimation is then used to smooth and estimate the percentiles of RTTR forecasts which are then compared with actual ratings and discussed alongside practical issues around the use of RTTR forecasts.
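
The final Monte Carlo step can be caricatured as follows: draw weather inputs from assumed predictive distributions, push them through a conductor thermal model, and smooth the resulting rating samples with a kernel density estimate. Everything below is a placeholder (Gaussian predictive distributions, a crude one-term heat balance, made-up constants) rather than the paper's AR-plus-Fourier models, CRPS estimation, or full thermal model.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
n_mc = 10_000

# placeholder predictive distributions for weather at one future time step
air_temp = rng.normal(15.0, 2.0, n_mc)                          # deg C
wind_speed = np.clip(rng.normal(4.0, 1.5, n_mc), 0.1, None)     # m/s

def toy_rating(t_air, v, t_max=75.0, r_ac=1.0e-4, k_conv=1.2, q_solar=20.0):
    """Crude steady-state balance: current rating I = sqrt((cooling - solar gain) / R)."""
    cooling = k_conv * np.sqrt(v) * (t_max - t_air)             # convective term only, W/m
    return np.sqrt(np.maximum(cooling - q_solar, 1.0) / r_ac)   # amps

ratings = toy_rating(air_temp, wind_speed)
kde = gaussian_kde(ratings)                                     # smooth the Monte Carlo ratings
smoothed = kde.resample(50_000).ravel()
print("5th / 50th percentile rating (A):", np.round(np.percentile(smoothed, [5, 50])))
```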

Journal ArticleDOI
TL;DR: It is found that strong white noise tests often suggest that such series exhibit significant autocorrelation, whereas tests that account for functional conditional heteroscedasticity show that these data are in fact uncorrelated in a function space.

Journal ArticleDOI
15 Sep 2017-Energy
TL;DR: In this paper, a new statistical approach for jointly predicting wind speed, wind direction and air pressure is introduced, which combines a multivariate seasonal time-varying threshold autoregressive model with interactions (TVARX) with a threshold seasonal autoregressive conditional heteroscedastic (TARCHX) model.

Journal ArticleDOI
TL;DR: In this article, a modified Granger causality test based on the generalized auto-regressive conditional heteroscedasticity type of integer-valued time series models was proposed to analyse the relationship between the number of crimes and the temperature as an environmental factor.
Abstract: We investigate the causal relationship between climate and criminal behaviour. Considering the characteristics of integer-valued time series of criminal incidents, we propose a modified Granger causality test based on the generalized auto-regressive conditional heteroscedasticity type of integer-valued time series models to analyse the relationship between the number of crimes and the temperature as an environmental factor. More precisely, we employ the Poisson, negative binomial and log-linear Poisson integer-valued generalized auto-regressive conditional heteroscedasticity models and particularly adopt a Bayesian method for our analysis. The Bayes factors and posterior probability of the null hypothesis help to determine the causality between the variables considered. Moreover, employing an adaptive Markov chain Monte Carlo sampling scheme, we estimate model parameters and initial values. As an illustration, we evaluate our test through a simulation study and, to examine whether or not temperature affects crime activities, we apply our method to data sets categorized as sexual offences, drug offences, theft of motor vehicles, and domestic-violence-related assault in Ballina, New South Wales, Australia. The result reveals that more sexual offences, drug offences and domestic-violence-related assaults occur during the summer than in other seasons of the year. This evidence strongly advocates a causal relationship between crime and temperature.
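
As a rough, frequentist stand-in for the Bayesian testing procedure described here, the sketch below fits a Poisson INGARCH(1,1) mean recursion to simulated counts with and without a temperature covariate by maximum likelihood and compares AICs; the data, the AIC comparison, and the optimizer settings are all assumptions, not the paper's Bayes-factor analysis or its negative binomial and log-linear variants.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def ingarch_negloglik(params, y, x=None):
    """Negative log-likelihood of a Poisson INGARCH(1,1):
    lambda_t = d + a*lambda_{t-1} + b*y_{t-1} (+ g*x_t)."""
    d, a, b = params[:3]
    g = params[3] if x is not None else 0.0
    lam_prev = max(np.mean(y), 1e-6)
    ll = 0.0
    for t in range(1, len(y)):
        lam = d + a * lam_prev + b * y[t - 1] + (g * x[t] if x is not None else 0.0)
        lam = max(lam, 1e-10)
        ll += y[t] * np.log(lam) - lam - gammaln(y[t] + 1.0)
        lam_prev = lam
    return -ll

# toy data: weekly crime counts that respond to a seasonal temperature cycle
rng = np.random.default_rng(8)
T = 300
temp = 20 + 8 * np.sin(2 * np.pi * np.arange(T) / 52)
y = rng.poisson(np.exp(1.0 + 0.03 * temp))

bounds0 = [(1e-6, None), (0.0, 0.99), (0.0, 0.99)]
fit0 = minimize(ingarch_negloglik, x0=[1.0, 0.3, 0.3], args=(y,), bounds=bounds0)
fit1 = minimize(ingarch_negloglik, x0=[1.0, 0.3, 0.3, 0.0], args=(y, temp),
                bounds=bounds0 + [(None, None)])
aic0, aic1 = 2 * 3 + 2 * fit0.fun, 2 * 4 + 2 * fit1.fun
print(f"AIC without temperature: {aic0:.1f}   with temperature: {aic1:.1f}")
```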

Journal ArticleDOI
TL;DR: A heteroskedasticity-robust Breusch-Pagan test, in the spirit of Godfrey and Yamagata, is proposed for linear panel data models without necessarily assuming independence of the cross-sections.

Journal ArticleDOI
TL;DR: In this paper, the stochastic frontier and the conditional mean of inefficiency are estimated without distributional assumptions by casting the model in the partly linear regression (PLR) framework, and a test of correct parametric specification of the scaling function is provided.
Abstract: We consider the benchmark stochastic frontier model where inefficiency is directly influenced by observable determinants. In this setting, we estimate the stochastic frontier and the conditional mean of inefficiency without imposing any distributional assumptions. To do so we cast this model in the partly linear regression framework for the conditional mean. We provide a test of correct parametric specification of the scaling function. An empirical example is also provided to illustrate the practical value of the methods described here.

Journal ArticleDOI
TL;DR: FlexCode reformulates conditional density estimation as a nonparametric orthogonal series problem in which the expansion coefficients are estimated by regression, so the method can adapt to a variety of challenging high-dimensional settings with different structures in the data (e.g., a large number of irrelevant components and nonlinear manifold structure).
Abstract: There is a growing demand for nonparametric conditional density estimators (CDEs) in fields such as astronomy and economics. In astronomy, for example, one can dramatically improve estimates of the parameters that dictate the evolution of the Universe by working with full conditional densities instead of regression (i.e., conditional mean) estimates. More generally, standard regression falls short in any prediction problem where the distribution of the response is more complex with multi-modality, asymmetry or heteroscedastic noise. Nevertheless, much of the work on high-dimensional inference concerns regression and classification only, whereas research on density estimation has lagged behind. Here we propose FlexCode, a fully nonparametric approach to conditional density estimation that reformulates CDE as a non-parametric orthogonal series problem where the expansion coefficients are estimated by regression. By taking such an approach, one can efficiently estimate conditional densities and not just expectations in high dimensions by drawing upon the success in high-dimensional regression. Depending on the choice of regression procedure, our method can adapt to a variety of challenging high-dimensional settings with different structures in the data (e.g., a large number of irrelevant components and nonlinear manifold structure) as well as different data types (e.g., functional data, mixed data types and sample sets). We study the theoretical and empirical performance of our proposed method, and we compare our approach with traditional conditional density estimators on simulated as well as real-world data, such as photometric galaxy data, Twitter data, and line-of-sight velocities in a galaxy cluster.
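
The series idea can be sketched compactly: expand f(z|x) in an orthonormal cosine basis and estimate each coefficient β_j(x) = E[φ_j(Z) | x] by regressing φ_j(z_i) on x_i with any off-the-shelf regressor. The random forest, basis size, and toy data below are arbitrary choices, and the normalization and non-negativity corrections of the actual FlexCode estimator are omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def cosine_basis(z, n_basis):
    """Orthonormal cosine basis on [0, 1]: phi_0 = 1, phi_j = sqrt(2) * cos(j*pi*z)."""
    j = np.arange(n_basis)
    return np.where(j == 0, 1.0, np.sqrt(2) * np.cos(np.pi * j * z[:, None]))

# toy data: response scaled to [0, 1], heteroscedastic in x
rng = np.random.default_rng(9)
n = 2000
x = rng.uniform(0, 1, (n, 1))
z = np.clip(0.5 + 0.3 * (x[:, 0] - 0.5) + rng.normal(0, 0.05 + 0.15 * x[:, 0]), 0, 1)

n_basis = 15
Phi = cosine_basis(z, n_basis)                        # shape (n, n_basis)
models = [RandomForestRegressor(n_estimators=100, random_state=0).fit(x, Phi[:, j])
          for j in range(n_basis)]                    # beta_j(x) = E[phi_j(Z) | x]

def cde(x_new, z_grid):
    """Estimated conditional density f(z | x_new) on a grid (unnormalized, may dip below 0)."""
    beta = np.array([m.predict(np.atleast_2d(x_new))[0] for m in models])
    return cosine_basis(z_grid, n_basis) @ beta

z_grid = np.linspace(0, 1, 101)
print("density peak near z =", z_grid[np.argmax(cde([0.8], z_grid))])
```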

Journal ArticleDOI
TL;DR: In this article, the authors study test score distributions of schools or demographic groups that are summarized by frequencies of students scoring in a small number of ordered proficiency categories, allowing the distributions to be heteroskedastic across groups.
Abstract: Test score distributions of schools or demographic groups are often summarized by frequencies of students scoring in a small number of ordered proficiency categories. We show that heteroskedastic o...

Journal ArticleDOI
TL;DR: In this article, the main objective is to provide analytical expressions for forecast variances that can be used in prediction intervals for the exponential smoothing methods, based on state space models with a single source of error.
Abstract: The main objective of this paper is to provide analytical expressions for forecast variances that can be used in prediction intervals for the exponential smoothing methods. These expressions are based on state space models with a single source of error that underlie the exponential smoothing methods. Three general classes of the state space models are presented. The first class is the standard linear state space model with homoscedastic errors, the second retains the linear structure but incorporates a dynamic form of heteroscedasticity, and the third allows for non-linear structure in the observation equation as well as heteroscedasticity. Exact matrix formulas for the forecast variances are found for each of these three classes of models. These formulas are specialized to non-matrix formulas for fifteen state space models that underlie nine exponential smoothing methods, including all the widely used methods. In cases where an ARIMA model also underlies an exponential smoothing method, there is an equivalent state space model with the same variance expression. We also discuss relationships between these new ideas and previous suggestions for finding forecast variances and prediction intervals for the exponential smoothing methods.
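
For the simplest member of the first class (the local-level model with a single source of error that underlies simple exponential smoothing), the h-step-ahead forecast variance reduces to σ²[1 + (h − 1)α²]. The sketch below uses that special-case formula to build prediction intervals; the general matrix expressions and the remaining models are in the paper.

```python
import numpy as np
from scipy.stats import norm

def ses_forecast_intervals(y, alpha, h_max=10, level=0.95):
    """Forecasts and prediction intervals for simple exponential smoothing.

    Uses the single-source-of-error local-level model, for which the
    h-step-ahead forecast variance is sigma^2 * (1 + (h - 1) * alpha^2).
    """
    level_state = y[0]
    errors = []
    for t in range(1, len(y)):
        errors.append(y[t] - level_state)                 # one-step-ahead forecast error
        level_state = level_state + alpha * (y[t] - level_state)
    sigma2 = np.mean(np.square(errors))
    z = norm.ppf(0.5 + level / 2.0)
    rows = []
    for h in range(1, h_max + 1):
        v = sigma2 * (1.0 + (h - 1) * alpha ** 2)
        rows.append((level_state, level_state - z * np.sqrt(v), level_state + z * np.sqrt(v)))
    return np.array(rows)                                 # columns: forecast, lower, upper

rng = np.random.default_rng(10)
y = 20.0 + 0.1 * np.cumsum(rng.normal(0, 1, 200))          # slowly drifting series
print(np.round(ses_forecast_intervals(y, alpha=0.3, h_max=3), 2))
```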

Journal ArticleDOI
TL;DR: The G-squared measure tests whether two univariate random variables are independent and measures the strength of their relationship; it is almost identical to the square of the Pearson correlation coefficient for linear relationships with constant error variance, and has the intuitive meaning of a piecewise R-squared between the variables.
Abstract: Detecting dependence between two random variables is a fundamental problem. Although the Pearson correlation coefficient is effective for capturing linear dependence, it can be entirely powerless for detecting nonlinear and/or heteroscedastic patterns. We introduce a new measure, G-squared, to test whether two univariate random variables are independent and to measure the strength of their relationship. The G-squared statistic is almost identical to the square of the Pearson correlation coefficient, R-squared, for linear relationships with constant error variance, and has the intuitive meaning of the piecewise R-squared between the variables. It is particularly effective in handling nonlinearity and heteroscedastic errors. We propose two estimators of G-squared and show their consistency. Simulations demonstrate that G-squared estimators are among the most powerful test statistics compared with several state-of-the-art methods.
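
The "piecewise R-squared" intuition can be illustrated crudely by slicing X into quantile bins and asking how much of Var(Y) the bin-wise means explain. The snippet below is only that rough illustration on synthetic data; it is not the authors' G-squared estimator, which selects slices through a penalized criterion.

```python
import numpy as np

def binned_r2(x, y, n_bins=10):
    """Crude piecewise R^2: fraction of Var(y) explained by bin-wise means of y given x."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    fitted = np.array([y[idx == b].mean() for b in range(n_bins)])[idx]
    return 1.0 - np.var(y - fitted) / np.var(y)

rng = np.random.default_rng(11)
x = rng.uniform(-2, 2, 2000)
y_lin = 1.5 * x + rng.normal(0, 1, x.size)                     # linear, homoscedastic
y_non = np.sin(3 * x) + rng.normal(0, 0.2 + 0.4 * np.abs(x))   # nonlinear, heteroscedastic

for name, y in [("linear", y_lin), ("nonlinear", y_non)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name:9s}  Pearson R^2 = {r**2:.2f}   binned R^2 = {binned_r2(x, y):.2f}")
```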

Journal ArticleDOI
TL;DR: In this paper, the authors propose two alternative estimation procedures that, by relying on a first-difference data transformation, achieve consistency when n goes to infinity with fixed T, and they investigate the finite sample behavior of the proposed estimators through a set of Monte Carlo experiments.
Abstract: The classical stochastic frontier panel data models provide no mechanism for disentangling individual time-invariant unobserved heterogeneity from inefficiency. Greene (2005a, b) proposed the ‘true’ fixed-effects specification, which distinguishes these two latent components while allowing for time-variant inefficiency. However, due to the incidental parameters problem, the maximum likelihood estimator proposed by Greene may lead to biased variance estimates. We propose two alternative estimation procedures that, by relying on a first-difference data transformation, achieve consistency when n goes to infinity with fixed T. Furthermore, we extend the approach of Chen et al. (2014) by providing a computationally feasible solution for estimating models in which inefficiency can be heteroskedastic and may follow a first-order autoregressive process. We investigate the finite sample behavior of the proposed estimators through a set of Monte Carlo experiments. Our results show good finite sample properties, especially in small samples. We illustrate the usefulness of the new approach by applying it to the technical efficiency of hospitals.

Journal ArticleDOI
TL;DR: In this paper, an approach for cost estimation that combines a maximum likelihood estimator for data transformations with least angle regression for dimensionality reduction is presented; the results from the study demonstrate that the proposed approach frequently leads to consistent parametric estimates that address the structural bias and heteroscedasticity that plague current cost-estimation procedures.
Abstract: As project planners continue to move towards frameworks such as probabilistic life-cycle cost analysis to evaluate competing transportation investments, there is a need to enhance the current cost-estimation approaches that underlie these models to enable improved project selection. This paper presents an approach for cost estimation that combines a maximum likelihood estimator for data transformations with least angle regression for dimensionality reduction. The authors apply the proposed method for 15 different pavement bid items across five states in the United States. The results from the study demonstrate that the proposed approach frequently leads to consistent parametric estimates that address the structural bias and heteroscedasticity that plague the current cost-estimation procedures. Both of these aspects are particularly important for large-scale construction projects, where traditional methods tend to systematically underestimate expected construction costs and overestimate the associated variance.
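
A minimal sketch of the two ingredients, assuming scipy's maximum likelihood Box-Cox transform and scikit-learn's least angle regression (LARS) as stand-ins for the authors' implementation, with simulated cost data in place of the pavement bid items:

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.linear_model import Lars

rng = np.random.default_rng(12)
n = 400
X = rng.standard_normal((n, 8))                  # stand-ins for quantity, location, year, etc.
# right-skewed, heteroscedastic unit costs driven by two of the eight predictors
cost = np.exp(1.0 + 0.6 * X[:, 0] - 0.4 * X[:, 3] + rng.normal(0, 0.5, n))

cost_bc, lam = boxcox(cost)                      # maximum likelihood estimate of lambda
model = Lars(n_nonzero_coefs=4).fit(X, cost_bc)  # least angle regression keeps few predictors

print(f"estimated Box-Cox lambda: {lam:.2f}")
print("selected coefficients:", np.round(model.coef_, 2))
```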

Book ChapterDOI
01 Jan 2017
TL;DR: In this chapter, a wide range of robust regression estimators is discussed, including robust versions of logistic regression and recently derived methods for multivariate regression, two of which take into account the association among the outcome variables, in contrast to most estimators that have been proposed.
Abstract: Chapter 10 summarizes a wide range of robust regression estimators. Their relative merits are discussed. Generally, these estimators deal effectively with regression outliers and leverage points. Some can offer a substantial advantage, in terms of efficiency, when there is heteroscedasticity. Included are robust versions of logistic regression and recently derived methods for dealing with multivariate regression, two of which take into account the association among the outcome variables, in contrast to most estimators that have been proposed. R functions for applying these estimators are described.

Posted Content
TL;DR: In this paper, HBART, a nonparametric heteroscedastic elaboration of BART, is proposed: the mean function is modeled with a sum of trees, each of which determines an additive contribution to the mean, while the variance function is modeled with a product of trees, each of which determines a multiplicative contribution to the variance.
Abstract: BART (Bayesian Additive Regression Trees) has become increasingly popular as a flexible and scalable nonparametric regression approach for modern applied statistics problems. For the practitioner dealing with large and complex nonlinear response surfaces, its advantages include a matrix-free formulation and the lack of a requirement to prespecify a confining regression basis. Although flexible in fitting the mean, BART has been limited by its reliance on a constant variance error model. This homoscedastic assumption is unrealistic in many applications. Alleviating this limitation, we propose HBART, a nonparametric heteroscedastic elaboration of BART. In BART, the mean function is modeled with a sum of trees, each of which determines an additive contribution to the mean. In HBART, the variance function is further modeled with a product of trees, each of which determines a multiplicative contribution to the variance. Like the mean model, this flexible, multidimensional variance model is entirely nonparametric with no need for the prespecification of a confining basis. Moreover, with this enhancement, HBART can provide insights into the potential relationships of the predictors with both the mean and the variance. Practical implementations of HBART with revealing new diagnostic plots are demonstrated with simulated and real data on used car prices, fishing catch production and alcohol consumption.

Journal ArticleDOI
TL;DR: This work proposes a test statistic, based on a linear combination of U-statistics, that can be calculated quickly without resorting to computationally intensive U-statistics and is applicable to non-normal multi-sample high-dimensional data without assuming a common covariance matrix among the different samples.

Journal ArticleDOI
TL;DR: A new R package is presented for dealing with non-normality and variance heterogeneity of sample data when conducting hypothesis tests of main effects and interactions in mixed models; the proposal builds on an existing SAS program that implements Johansen's general formulation of the Welch-James statistic with approximate degrees of freedom, which makes it suitable for testing any linear hypothesis concerning cell means in univariate and multivariate mixed model designs when the data exhibit non-normality and non-homogeneous variance.
Abstract: A new R package is presented for dealing with non-normality and variance heterogeneity of sample data when conducting hypothesis tests of main effects and interactions in mixed models. The proposal builds on an existing SAS program which implements Johansen's general formulation of the Welch-James statistic with approximate degrees of freedom, which makes it suitable for testing any linear hypothesis concerning cell means in univariate and multivariate mixed model designs when the data exhibit non-normality and non-homogeneous variance. Improved type I error rate control is obtained using bootstrapping for calculating an empirical critical value, whereas robustness against non-normality is achieved through trimmed means and Winsorized variances. A wrapper function eases the application of the test in common situations, such as performing omnibus tests on all effects and interactions, pairwise contrasts, and tetrad contrasts of two-way interactions. The package is demonstrated in several problems including unbalanced univariate and multivariate designs.
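
The two-group building blocks of this approach (trimmed means, Winsorized variances, a Welch-Yuen-type statistic, and a bootstrap critical value) can be sketched as below; this is not the package itself nor Johansen's general multivariate formulation, and the 20% trimming and resampling scheme are illustrative choices.

```python
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import winsorize

def yuen_stat(a, b, trim=0.2):
    """Welch-Yuen-type statistic based on trimmed means and Winsorized variances."""
    def part(x):
        n = len(x)
        h = n - 2 * int(trim * n)                              # effective size after trimming
        s2w = np.var(np.asarray(winsorize(x, limits=(trim, trim))), ddof=1)
        return trim_mean(x, trim), (n - 1) * s2w / (h * (h - 1))
    m_a, d_a = part(a)
    m_b, d_b = part(b)
    return (m_a - m_b) / np.sqrt(d_a + d_b)

rng = np.random.default_rng(13)
a = rng.lognormal(0.0, 1.0, 30)                                # skewed
b = rng.lognormal(0.0, 1.5, 20) + 0.5                          # skewed, more variable, shifted

t_obs = yuen_stat(a, b)
# bootstrap critical value under the null: resample groups centred at their trimmed means
a_c, b_c = a - trim_mean(a, 0.2), b - trim_mean(b, 0.2)
boot = np.array([yuen_stat(rng.choice(a_c, len(a_c)), rng.choice(b_c, len(b_c)))
                 for _ in range(2000)])
p_boot = np.mean(np.abs(boot) >= abs(t_obs))
print(f"Yuen-type statistic = {t_obs:.2f}, bootstrap p-value = {p_boot:.3f}")
```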

Journal ArticleDOI
TL;DR: In this article, a framework for testing for shifts in the level of a series which accommodates the possibility of changing variability is developed, based on a new functional central limit theorem for dependent random variables whose variance can change or trend in a substantial way.

Posted Content
TL;DR: In this article, the authors propose a nonparametric way to test the hypothesis that time-variation in intraday volatility is caused solely by a deterministic and recurrent diurnal pattern.
Abstract: In this paper, we propose a nonparametric way to test the hypothesis that time-variation in intraday volatility is caused solely by a deterministic and recurrent diurnal pattern. We assume that noisy high-frequency data from a discretely sampled jump-diffusion process are available. The test is then based on asset returns, which are deflated by a model-free jump- and noise-robust estimate of the seasonal component and therefore homoscedastic under the null. The t-statistic (after pre-averaging and jump-truncation) diverges in the presence of stochastic volatility and has a standard normal distribution otherwise. We prove that replacing the true diurnal factor with our estimator does not affect the asymptotic theory. A Monte Carlo simulation also shows this substitution has no discernable impact in finite samples. The test is, however, distorted by small infinite-activity price jumps. To improve inference, we propose a new bootstrap approach, which leads to almost correctly sized tests of the null hypothesis. We apply the developed framework to a large cross-section of equity high-frequency data and find that the diurnal pattern accounts for a rather significant fraction of intraday variation in volatility, but important sources of heteroscedasticity remain present in the data.
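
The first step of the idea, estimating a recurrent diurnal factor and deflating returns by it, can be mimicked on simulated data as below; the pre-averaging, jump truncation, t-statistic, and bootstrap of the actual test are not reproduced, and the simulated design is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(14)
n_days, n_bins = 250, 78                                # 5-minute bins in a 6.5-hour session
diurnal = 1.0 + 0.8 * np.cos(np.linspace(0, 2 * np.pi, n_bins))     # U-shaped intraday pattern
stoch_vol = np.exp(0.3 * np.cumsum(rng.normal(0, 0.05, n_days)))[:, None]
returns = rng.normal(0, 1, (n_days, n_bins)) * diurnal * stoch_vol

# estimate the diurnal factor as the cross-day root mean square return per intraday bin
diurnal_hat = np.sqrt(np.mean(returns ** 2, axis=0))
diurnal_hat /= np.sqrt(np.mean(diurnal_hat ** 2))       # normalise to unit average variance
deflated = returns / diurnal_hat

for name, r in [("raw", returns), ("deflated", deflated)]:
    per_bin_var = np.var(r, axis=0)
    print(f"{name:9s} intraday variance ratio (max/min) = "
          f"{per_bin_var.max() / per_bin_var.min():.1f}")
```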