
Showing papers on "Nonparametric statistics published in 2020"


Journal ArticleDOI
TL;DR: An intuitive and easy-to-implement nonparametric density estimator based on local polynomial techniques that is fully boundary adaptive and automatic, but does not require prebinning or any other transformation of the data is introduced.
Abstract: This article introduces an intuitive and easy-to-implement nonparametric density estimator based on local polynomial techniques. The estimator is fully boundary adaptive and automatic, but does not require prebinning or any other transformation of the data.
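The idea behind this class of estimators (local polynomial regression applied to the empirical distribution function, whose local slope estimates the density) can be sketched in a few lines; the kernel, bandwidth, polynomial order, and toy data below are illustrative choices, not the authors' implementation.

```python
import numpy as np

def lp_density(x_eval, data, h, p=2):
    """Estimate f(x_eval) by locally fitting a degree-p polynomial to the empirical CDF
    around x_eval; the first-order coefficient of the fit estimates the density."""
    data = np.sort(np.asarray(data))
    n = data.size
    ecdf = np.arange(1, n + 1) / n                          # empirical CDF at the data points
    u = (data - x_eval) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)    # Epanechnikov kernel weights
    X = np.vander(data - x_eval, N=p + 1, increasing=True)  # columns 1, (x-x0), (x-x0)^2, ...
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(sw[:, None] * X, sw * ecdf, rcond=None)[0]
    return beta[1]                                          # d/dx of the local CDF fit

rng = np.random.default_rng(0)
sample = rng.exponential(size=500)                          # density has a boundary at 0
print(lp_density(0.1, sample, h=0.3), "vs true", np.exp(-0.1))
```

Because the fit is applied directly to the raw data near the evaluation point, no prebinning or boundary correction is needed in this sketch.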

235 citations


Book ChapterDOI
01 Jan 2020
TL;DR: This short chapter introduces the SPSS software, including an overview of its capabilities; topics such as data preparation, data import, options of parametric and nonparametric statistical tests, export and editing of statistical results, and creation of charts and tables are covered.
Abstract: IBM SPSS Statistics ("Statistical Package for the Social Sciences") is a software package used for statistical analysis, data management, and data documentation. This short chapter introduces the SPSS software, including an overview of its capabilities. Topics such as data preparation, data import, options of parametric and nonparametric statistical tests, export and editing of statistical results, and creation of charts and tables are covered.

133 citations


ReportDOI
TL;DR: This paper proposed a nonparametric method to test which characteristics provide independent information for the cross-section of expected returns, and used the adaptive group LASSO to select characteristics and to estimate how they affect expected returns nonparametrically.
Abstract: We propose a nonparametric method to test which characteristics provide independent information for the cross section of expected returns. We use the adaptive group LASSO to select characteristics and to estimate how they affect expected returns nonparametrically. Our method can handle a large number of characteristics, allows for a flexible functional form, and is insensitive to outliers. Many of the previously identified return predictors do not provide incremental information for expected returns, and nonlinearities are important. Our proposed method has higher out-of-sample explanatory power than linear panel regressions, and increases Sharpe ratios by 50%.
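To illustrate the two ingredients named in the abstract, the sketch below expands each characteristic in a small basis and runs a two-stage (adaptive) group LASSO fitted by proximal gradient descent; the basis, penalty level, and toy return data are my own choices and not the paper's specification.

```python
import numpy as np

def expand(x, degree=3):
    """Polynomial basis per characteristic (a stand-in for a spline expansion)."""
    B = np.column_stack([x**d for d in range(1, degree + 1)])
    return (B - B.mean(0)) / B.std(0)

def group_lasso(X_groups, y, lam, weights=None, n_iter=3000):
    """Proximal-gradient (ISTA) solver for the group LASSO."""
    X = np.hstack(X_groups)
    n = len(y)
    idx = np.cumsum([0] + [g.shape[1] for g in X_groups])
    w = np.ones(len(X_groups)) if weights is None else weights
    L = np.linalg.eigvalsh(X.T @ X / n).max()               # gradient Lipschitz constant
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta + X.T @ (y - X @ beta) / (n * L)           # gradient step
        for g in range(len(X_groups)):                      # groupwise soft-thresholding
            b = z[idx[g]:idx[g + 1]]
            norm = np.linalg.norm(b)
            shrink = max(0.0, 1 - lam * w[g] / (L * norm)) if norm > 0 else 0.0
            beta[idx[g]:idx[g + 1]] = shrink * b
    return beta

# toy data: 5 characteristics, only the first two matter (nonlinearly)
rng = np.random.default_rng(1)
C = rng.standard_normal((400, 5))
y = np.sin(C[:, 0]) + C[:, 1]**2 + 0.1 * rng.standard_normal(400)
groups = [expand(C[:, j]) for j in range(5)]
beta1 = group_lasso(groups, y - y.mean(), lam=0.1)                   # first-stage fit
wts = 1.0 / np.maximum([np.linalg.norm(beta1[3*j:3*j+3]) for j in range(5)], 1e-6)
beta2 = group_lasso(groups, y - y.mean(), lam=0.1, weights=wts)      # adaptive refit
print("selected characteristics:",
      [j for j in range(5) if np.linalg.norm(beta2[3*j:3*j+3]) > 1e-8])
```

Groups whose first-stage norm is small receive a large adaptive weight and are dropped in the second stage, which is the selection mechanism the abstract relies on.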

100 citations


Journal ArticleDOI
TL;DR: In this paper, the sum of ranking difference (SRD) algorithm was applied to create a nonparametric partial least squares–discriminant analysis (PLS-DA) model.
Abstract: Identifying tea grades is crucial to providing consumers with tea and ensuring consumer rights. Partial least squares–discriminant analysis (PLS-DA) is a simple and traditional classification algorithm for analyzing e-tongue data. However, the number of latent variables (LVs) in a PLS-DA model needs to be determined, and cross-validation is the most common way to identify the optimal latent variables. To overcome this obstacle, the sum of ranking difference (SRD) algorithm was applied to create a non-parametric PLS-DA-SRD model. The performance of the PLS-DA and PLS-DA-SRD models was then compared, and a significant improvement in terms of accuracy, sensitivity, and specificity was obtained when SRD was combined with the PLS-DA algorithm. Moreover, no training phase was needed to identify the optimal LVs for PLS-DA, making the classification calculation rapid and concise. The PLS-DA-SRD method demonstrated its efficiency and capability by successfully identifying the tea sample grade.

84 citations


Journal ArticleDOI
01 Jan 2020
TL;DR: The basic aim of the present paper is to explore an order-statistics-based nonparametric method to estimate the number of samples required to generate realizations of the uncertain random parameters, which in turn facilitates establishing the tolerance limits.
Abstract: Measurements are always associated with a certain degree of uncertainty. To achieve high-precision measurement in the presence of uncertainty, efficient computation is desired. Statistically, the precision of any measurement is defined as one standard deviation divided by the square root of the sample size taken for the measurements. Accordingly, tolerance limits are statistical in nature. Therefore, measurements must be repeated a large number of times to obtain better precision. Hence, the target is to establish tolerance limits in the presence of uncertainty in computer and communication systems. A nonparametric method is applied to establish the tolerance limits when uncertainty is present in the measurements. The basic aim of the present paper is to explore an order-statistics-based nonparametric method to estimate the number of samples required to generate realizations of the uncertain random parameters, which in turn facilitates establishing the tolerance limits. A case study of a solute transport model is presented, in which tolerance limits of the solute concentration at any spatial location and any temporal moment are established. Results based on the nonparametric simulation are compared with those obtained by the traditional method of setting tolerance limits using Monte Carlo simulations on computer and communication systems.
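The order-statistics route to distribution-free tolerance limits has a classical closed form (Wilks): the confidence that the interval between the sample minimum and maximum covers at least a fraction p of the population depends only on the sample size. A short sketch of the resulting sample-size search (my own illustration, not the paper's code):

```python
def two_sided_coverage_confidence(n, p):
    """P(the interval [min, max] of an i.i.d. sample of size n covers
    at least a fraction p of the population), by Wilks' formula."""
    return 1.0 - n * p**(n - 1) + (n - 1) * p**n

def required_sample_size(p=0.95, gamma=0.95):
    """Smallest n so that [min, max] is a (p, gamma) distribution-free tolerance interval."""
    n = 2
    while two_sided_coverage_confidence(n, p) < gamma:
        n += 1
    return n

print(required_sample_size(0.95, 0.95))   # classical answer: 93 samples
```

Because the formula holds for any continuous distribution, the same sample size applies regardless of how the uncertain parameters are actually distributed.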

63 citations


Journal Article
TL;DR: It turns out that under mild conditions, function estimation consistency and convergence may be pursued in modal regression as in vanilla regression protocols; however, modal regression outperforms these regression models in terms of robustness, as shown in the study from a re-descending M-estimation view.
Abstract: This paper studies the nonparametric modal regression problem systematically from a statistical learning view. Originally motivated by pursuing a theoretical understanding of the maximum correntropy criterion based regression (MCCR), our study reveals that MCCR with a tending-to-zero scale parameter is essentially modal regression. We show that the nonparametric modal regression problem can be approached via classical empirical risk minimization. Some efforts are then made to develop a framework for analyzing and implementing modal regression. For instance, the modal regression function is described, the modal regression risk is defined explicitly and its Bayes rule is characterized; for the sake of computational tractability, the surrogate modal regression risk, which is termed the generalization risk in our study, is introduced. On the theoretical side, the excess modal regression risk, the excess generalization risk, the function estimation error, and the relations among these three quantities are studied rigorously. It turns out that under mild conditions, function estimation consistency and convergence may be pursued in modal regression as in vanilla regression protocols, such as mean regression, median regression, and quantile regression. However, modal regression outperforms these regression models in terms of robustness, as shown in our study from a re-descending M-estimation view. This coincides with and in turn explains the merits of MCCR with respect to robustness. On the practical side, the implementation issues of modal regression, including the computational algorithm and tuning parameter selection, are discussed. Numerical assessments of modal regression are also conducted to verify our findings empirically.
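A minimal sketch of the link the paper starts from: linear regression under the maximum correntropy criterion, fitted by half-quadratic iteratively reweighted least squares; as the scale parameter sigma shrinks this approaches (linear) modal regression. The data, sigma, and linear specification below are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def mcc_regression(X, y, sigma=0.5, n_iter=50):
    """Maximize sum_i exp(-(y_i - x_i'b)^2 / (2 sigma^2)) via half-quadratic IRLS."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]          # ordinary least squares start
    for _ in range(n_iter):
        r = y - X1 @ beta
        w = np.exp(-r**2 / (2 * sigma**2))                # re-descending weights
        WX = X1 * w[:, None]
        beta = np.linalg.solve(X1.T @ WX, WX.T @ y)       # weighted least squares update
    return beta

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = 1.0 + 2.0 * x + 0.2 * rng.standard_normal(300)
y[:60] += 8.0                                             # gross outliers
print(mcc_regression(x[:, None], y))                      # close to [1, 2] despite the outliers
```

The exponential weights vanish for large residuals, which is the re-descending behavior the abstract credits for the robustness of MCCR and modal regression.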

59 citations


Posted Content
TL;DR: This work proposes a convex procedure that controls the worst-case performance over all subpopulations of a given size and comes with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation.
Abstract: While modern large-scale datasets often consist of heterogeneous subpopulations (for example, multiple demographic groups or multiple text corpora), the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations. We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size. Our procedure comes with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation. Empirically, we observe on lexical similarity, wine quality, and recidivism prediction tasks that our worst-case procedure learns models that do well against unseen subpopulations.
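One standard way to write "worst-case average loss over all subpopulations containing at least an alpha fraction of the data" is as a conditional value at risk of the per-example losses, which keeps the problem convex. A hedged numpy/scipy sketch for a toy linear model follows; the data, squared loss, and optimizer are my own choices, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import minimize

def cvar_objective(params, X, y, alpha):
    """eta + E[(loss - eta)_+] / alpha == worst-case average loss over
    subpopulations containing at least an alpha fraction of the data."""
    beta, eta = params[:-1], params[-1]
    losses = (y - X @ beta) ** 2
    return eta + np.mean(np.maximum(losses - eta, 0.0)) / alpha

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.standard_normal(500)])
group = rng.random(500) < 0.2                        # 20% minority subpopulation
y = X @ np.array([0.0, 1.0]) + 3.0 * group + 0.3 * rng.standard_normal(500)

res = minimize(cvar_objective, np.zeros(X.shape[1] + 1), args=(X, y, 0.2),
               method="Nelder-Mead", options={"maxiter": 20000})
beta_dro = res.x[:-1]
beta_erm = np.linalg.lstsq(X, y, rcond=None)[0]
print("average-loss fit:", beta_erm, " worst-case fit:", beta_dro)
```

The worst-case fit shifts toward the high-loss minority group, whereas the average-loss fit largely ignores it.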

54 citations


Posted Content
TL;DR: This paper proposed a nonparametric inference method for causal effects of continuous treatment variables, under unconfoundedness and in the presence of high-dimensional or non-parametric nuisance parameters.
Abstract: We propose a nonparametric inference method for causal effects of continuous treatment variables, under unconfoundedness and in the presence of high-dimensional or nonparametric nuisance parameters. Our double debiased machine learning (DML) estimators for the average dose-response function (or the average structural function) and the partial effects are asymptotically normal with nonparametric convergence rates. The nuisance estimators for the conditional expectation function and the conditional density can be nonparametric kernel or series estimators or ML methods. Using a kernel-based doubly robust influence function and cross-fitting, we give primitive conditions under which the nuisance estimators do not affect the first-order large sample distribution of the DML estimators. We further give low-level conditions for kernel and series estimators, as well as for modern ML methods, namely the generalized random forest and deep neural networks. We justify the use of a kernel to localize the continuous treatment at a given value via the Gateaux derivative. We implement various ML methods in Monte Carlo simulations and an empirical application on a job training program evaluation.

51 citations


Journal ArticleDOI
TL;DR: The spectral angle mapper gave the highest accuracy, while maximum likelihood classification gave the lowest for the allocation and spatial disagreement indices, whereas the artificial neural network performed better in land-use–land-cover classification studies.
Abstract: In the face of rapid urbanization, monitoring urban expansion has gained importance for sustainably managing land resources and minimizing the impact on the environment. Monitoring urban growth using satellite data involves computing the state of land use–land cover and its change over time. A number of computing methods have been developed to process and interpret satellite results for an urban environment. However, due to the large number of parametric and nonparametric algorithms used for land-use–land-cover classification, there is uncertainty regarding choosing the best algorithm to measure urban processes. In this study, several parametric (maximum likelihood) and nonparametric (support vector machine, spectral angle mapper, artificial neural network and decision tree) algorithms were used. The study was aimed at finding the best available classification technique for land-use–land-cover classification and change detection. Freely available satellite data from Landsat 8, the latest in the Landsat series, and from Landsat 7 and 5 were used. Due to the redundancy reported for the traditional kappa-based indices, we applied modern disagreement indices to assess the accuracy of the classification process. The artificial neural network had the highest kappa coefficient for the Landsat 8 image, while the spectral angle mapper had the highest overall agreement (97%) and the lowest quantity allocation error (1%). The spectral angle mapper gave the highest accuracy, while maximum likelihood classification gave the lowest for the allocation and spatial disagreement indices. We found that the spectral angle mapper gave the best results for land-use–land-cover change analysis in terms of the lowest omission and commission errors (2.5% each) and the highest overall agreement, whereas the artificial neural network performed better in land-use–land-cover classification studies.
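The algorithm comparison described here can be mimicked on any labeled sample of pixels with scikit-learn, using quadratic discriminant analysis as a stand-in for Gaussian maximum-likelihood classification; everything below (the synthetic "bands", the classifier settings, and the reported metrics) is illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import cohen_kappa_score, accuracy_score

# stand-in for per-pixel spectral bands with land-cover class labels
X, y = make_classification(n_samples=3000, n_features=7, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "maximum likelihood (QDA)": QuadraticDiscriminantAnalysis(),
    "support vector machine": SVC(kernel="rbf", gamma="scale"),
    "artificial neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    "decision tree": DecisionTreeClassifier(max_depth=8, random_state=0),
}
for name, clf in classifiers.items():
    pred = clf.fit(Xtr, ytr).predict(Xte)
    print(f"{name:28s} overall agreement={accuracy_score(yte, pred):.3f} "
          f"kappa={cohen_kappa_score(yte, pred):.3f}")
```

The disagreement indices used in the study would replace or complement the kappa and agreement scores printed here.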

49 citations


Journal ArticleDOI
TL;DR: A nonparametric accelerated failure time model that can be used to analyze heterogeneous treatment effects (HTE) when patient outcomes are time-to-event and requires little user input in terms of model specification for treatment covariate interactions or for tuning parameter selection.
Abstract: Individuals often respond differently to identical treatments, and characterizing such variability in treatment response is an important aim in the practice of personalized medicine. In this article, we describe a nonparametric accelerated failure time model that can be used to analyze heterogeneous treatment effects (HTE) when patient outcomes are time-to-event. By utilizing Bayesian additive regression trees and a mean-constrained Dirichlet process mixture model, our approach offers a flexible model for the regression function while placing few restrictions on the baseline hazard. Our nonparametric method leads to natural estimates of individual treatment effect and has the flexibility to address many major goals of HTE assessment. Moreover, our method requires little user input in terms of model specification for treatment covariate interactions or for tuning parameter selection. Our procedure shows strong predictive performance while also exhibiting good frequentist properties in terms of parameter coverage and mitigation of spurious findings of HTE. We illustrate the merits of our proposed approach with a detailed analysis of two large clinical trials (N = 6769) for the prevention and treatment of congestive heart failure using an angiotensin-converting enzyme inhibitor. The analysis revealed considerable evidence for the presence of HTE in both trials as demonstrated by substantial estimated variation in treatment effect and by high proportions of patients exhibiting strong evidence of having treatment effects which differ from the overall treatment effect.

48 citations


Posted Content
TL;DR: In this article, the authors review some of the main Bayesian approaches that have been employed to define probability models where the complete response distribution may vary flexibly with predictors, and some extensions have been proposed to tackle this general problem using nonparametric approaches.
Abstract: Standard regression approaches assume that some finite number of the response distribution characteristics, such as location and scale, change as a (parametric or nonparametric) function of predictors. However, it is not always appropriate to assume a location/scale representation, where the error distribution has unchanging shape over the predictor space. In fact, it often happens in applied research that the distribution of responses under study changes with predictors in ways that cannot be reasonably represented by a finite dimensional functional form. This can seriously affect the answers to the scientific questions of interest, and therefore more general approaches are indeed needed. This gives rise to the study of fully nonparametric regression models. We review some of the main Bayesian approaches that have been employed to define probability models where the complete response distribution may vary flexibly with predictors. We focus on developments based on modifications of the Dirichlet process, historically termed dependent Dirichlet processes, and some of the extensions that have been proposed to tackle this general problem using nonparametric approaches.
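The dependent processes reviewed here all build on the plain Dirichlet process, whose stick-breaking representation is easy to sketch; the concentration parameter, base measure, and truncation level below are illustrative assumptions, and no covariate dependence is modeled in this toy version.

```python
import numpy as np

def dp_stick_breaking(alpha, base_sampler, n_atoms=200, rng=None):
    """Truncated stick-breaking draw from a Dirichlet process DP(alpha, G0)."""
    rng = rng or np.random.default_rng()
    v = rng.beta(1.0, alpha, size=n_atoms)
    w = v * np.concatenate([[1.0], np.cumprod(1 - v)[:-1]])   # stick-breaking weights
    atoms = base_sampler(n_atoms, rng)                        # atom locations from G0
    return w, atoms

rng = np.random.default_rng(0)
w, atoms = dp_stick_breaking(alpha=2.0,
                             base_sampler=lambda n, r: r.standard_normal(n), rng=rng)
sample = rng.choice(atoms, size=1000, p=w / w.sum())          # draws from the random measure
print("distinct atoms appearing in 1000 draws:", len(np.unique(sample)))
```

Dependent Dirichlet processes, as reviewed in the paper, let the weights and/or the atoms vary with predictors instead of being fixed as above.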

Journal ArticleDOI
TL;DR: It is explained that non-parametric tests have clear drawbacks in medical research, and, that's the good news, they are often not necessary.
Abstract: When statistically comparing outcomes between two groups, researchers have to decide whether to use parametric methods, such as the t-test, or non-parametric methods, like the Mann-Whitney test. In endocrinology, for example, many studies compare hormone levels between groups, or at different points in time. Many papers apply non-parametric tests to compare groups. We will explain that non-parametric tests have clear drawbacks in medical research, and, that's the good news, they are often not necessary.
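The two tests being contrasted are one line each in SciPy; a minimal sketch on made-up, hormone-like (skewed) data:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(42)
group_a = rng.lognormal(mean=1.0, sigma=0.4, size=40)   # skewed, e.g. hormone levels
group_b = rng.lognormal(mean=1.2, sigma=0.4, size=40)

t_stat, t_p = ttest_ind(group_a, group_b)                               # parametric: compares means
u_stat, u_p = mannwhitneyu(group_a, group_b, alternative="two-sided")   # nonparametric: compares ranks
print(f"t-test p={t_p:.4f}   Mann-Whitney p={u_p:.4f}")
```

The paper's point is about when the parametric version is preferable; the snippet only shows the mechanics of running both.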

Proceedings Article
01 Jun 2020
TL;DR: A 'tripod' of theorems is established that connects three notions of uncertainty quantification (calibration, confidence intervals, and prediction sets) for binary classification in the distribution-free setting, that is, without making any distributional assumptions on the data.
Abstract: We study three notions of uncertainty quantification (calibration, confidence intervals, and prediction sets) for binary classification in the distribution-free setting, that is, without making any distributional assumptions on the data. With a focus on calibration, we establish a 'tripod' of theorems that connect these three notions for score-based classifiers. A direct implication is that distribution-free calibration is only possible, even asymptotically, using a scoring function whose level sets partition the feature space into at most countably many sets. Parametric calibration schemes such as variants of Platt scaling do not satisfy this requirement, while nonparametric schemes based on binning do. To close the loop, we derive distribution-free confidence intervals for binned probabilities for both fixed-width and uniform-mass binning. As a consequence of our 'tripod' theorems, these confidence intervals for binned probabilities lead to distribution-free calibration. We also derive extensions to settings with streaming data and covariate shift.
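A hedged sketch of the nonparametric ingredient discussed here: uniform-mass binning of classifier scores with a simple per-bin normal-approximation confidence interval. The score distribution and the CI construction are illustrative stand-ins, not the paper's exact finite-sample bounds.

```python
import numpy as np

def uniform_mass_bins(scores, n_bins):
    """Bin edges chosen so that each bin holds roughly the same number of points."""
    return np.quantile(scores, np.linspace(0, 1, n_bins + 1))

def binned_calibration(scores, labels, edges, z=1.96):
    """Empirical positive rate per bin plus a normal-approximation 95% CI."""
    which = np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, len(edges) - 2)
    out = []
    for b in range(len(edges) - 1):
        lab = labels[which == b]
        p = lab.mean()
        half = z * np.sqrt(p * (1 - p) / len(lab))
        out.append((p, p - half, p + half))
    return out

rng = np.random.default_rng(0)
scores = rng.random(5000)                                   # stand-in for classifier scores
labels = (rng.random(5000) < scores**1.3).astype(float)     # slightly miscalibrated labels
edges = uniform_mass_bins(scores, n_bins=10)
for b, (p, lo, hi) in enumerate(binned_calibration(scores, labels, edges)):
    print(f"bin {b}: positive rate {p:.3f}  CI [{lo:.3f}, {hi:.3f}]")
```

Recalibration then maps each score to its bin's empirical positive rate, which is the binning scheme the theorems single out as compatible with distribution-free guarantees.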

Journal ArticleDOI
TL;DR: A semi-parametric approach is developed to relax the parametric assumption implicit in BSL to an extent and maintain the computational advantages of BSL without any additional tuning and can be significantly more robust than BSL and another approach in the literature.
Abstract: Bayesian synthetic likelihood (BSL) is now a well-established method for performing approximate Bayesian parameter estimation for simulation-based models that do not possess a tractable likelihood function. BSL approximates an intractable likelihood function of a carefully chosen summary statistic at a parameter value with a multivariate normal distribution. The mean and covariance matrix of this normal distribution are estimated from independent simulations of the model. Due to the parametric assumption implicit in BSL, it can be preferred to its nonparametric competitor, approximate Bayesian computation, in certain applications where a high-dimensional summary statistic is of interest. However, despite several successful applications of BSL, its widespread use in scientific fields may be hindered by the strong normality assumption. In this paper, we develop a semi-parametric approach to relax this assumption to an extent and maintain the computational advantages of BSL without any additional tuning. We test our new method, semiBSL, on several challenging examples involving simulated and real data and demonstrate that semiBSL can be significantly more robust than BSL and another approach in the literature.
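The core BSL quantity is straightforward to sketch: simulate the model repeatedly at a candidate parameter, summarize, and evaluate the observed summary under a fitted multivariate normal. The simulator and summary statistic below are made up for illustration, and this is the plain Gaussian version rather than the semiparametric semiBSL variant.

```python
import numpy as np
from scipy.stats import multivariate_normal

def synthetic_loglik(theta, observed_summary, simulate, summarize, n_sims=200, rng=None):
    """Gaussian synthetic log-likelihood of the observed summary statistic at theta."""
    rng = rng or np.random.default_rng()
    sims = np.array([summarize(simulate(theta, rng)) for _ in range(n_sims)])
    mu, cov = sims.mean(axis=0), np.cov(sims, rowvar=False)
    return multivariate_normal(mean=mu, cov=cov, allow_singular=True).logpdf(observed_summary)

# toy example: location/scale of a skewed model, summarized by mean and standard deviation
simulate = lambda th, rng: th[0] + th[1] * rng.gamma(shape=2.0, size=500)
summarize = lambda x: np.array([np.mean(x), np.std(x)])

rng = np.random.default_rng(1)
observed = summarize(simulate((1.0, 0.5), rng))
for theta in [(1.0, 0.5), (0.0, 1.0)]:
    print(theta, synthetic_loglik(theta, observed, simulate, summarize, rng=rng))
# the parameter that generated the data scores far higher
```

In practice this log-likelihood is plugged into an MCMC sampler over theta; semiBSL replaces the joint normal with a more flexible semiparametric density.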

Journal ArticleDOI
TL;DR: A nonparametric identification method based on ν ('nu')-support vector regression (ν-SVR) is proposed to establish robust models of ship maneuvering motion in an easy-to-operate way, and the effectiveness of the method is verified.
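ν-SVR itself is available in scikit-learn; a hedged sketch of fitting a ν-SVR surrogate to generic input-output data (the regressors below are placeholders, not the paper's ship-motion variables):

```python
import numpy as np
from sklearn.svm import NuSVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# placeholder regressors (think surge/sway velocities, yaw rate, rudder angle)
X = rng.uniform(-1, 1, size=(500, 4))
y = 0.8 * X[:, 0] - 1.5 * X[:, 1] * np.abs(X[:, 1]) + 0.3 * X[:, 3] \
    + 0.02 * rng.standard_normal(500)

model = make_pipeline(StandardScaler(), NuSVR(nu=0.5, C=10.0, kernel="rbf"))
model.fit(X[:400], y[:400])
print("held-out R^2:", model.score(X[400:], y[400:]))
```

The nu parameter bounds the fraction of support vectors, which is the knob that makes this variant convenient to tune.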

Journal ArticleDOI
TL;DR: A chance constrained extreme learning machine (CCELM) model is developed to generate quality nonparametric proportion-free PIs of wind power generation, which minimizes the expected interval width subject to the PI coverage probability constraint.
Abstract: Confronted with the considerable intermittence and variability of wind power, prediction intervals (PIs) serve as a crucial tool to assist power system decision-making under uncertainty. Conventional PIs rely on predetermining the lower and upper quantile proportions and therefore suffer from conservative interval width. This paper develops a chance constrained extreme learning machine (CCELM) model to generate quality nonparametric proportion-free PIs of wind power generation, which minimizes the expected interval width subject to the PI coverage probability constraint. Because it does not depend on preset PI bound proportions, the proposed CCELM model offers high adaptivity and taps the latent potential for PI shortening. The convexity of the extreme learning machine renders the sample average approximation counterpart of the stochastic CCELM model equivalent to a parameter-searching task in a parametric optimization problem with a polyhedral feasible region. A novel difference-of-convex-functions optimization based bisection search (DCBS) algorithm is proposed to efficiently construct the CCELM model, which realizes the learning by solving linear programming problems sequentially. Comprehensive numerical experiments based on actual wind farm data demonstrate the effectiveness and efficiency of the developed CCELM model and DCBS algorithm.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed new nonparametric estimators for the reduced dimensional conditional average treatment effect (CATE) function, given the unconfoundedness assumption, and showed that these estimators are robust to nuisance functions.
Abstract: Given the unconfoundedness assumption, we propose new nonparametric estimators for the reduced dimensional conditional average treatment effect (CATE) function. In the first stage, the nuisance functions...

Journal ArticleDOI
TL;DR: In this article, the results of the t-test and U-test are compared under different skewness values, and the results showed that using skewness values alone to decide about the normality of a dataset may not be enough.
Abstract: Checking the normality assumption is necessary to decide whether a parametric or non-parametric test needs to be used. Different ways of checking normality are suggested in the literature. Skewness and kurtosis values are one of them. However, there is no consensus on which values indicate a normal distribution. Therefore, the effects of different criteria in terms of skewness values were simulated in this study. Specifically, the results of the t-test and U-test are compared under different skewness values. The results showed that the t-test and U-test give different results when the data are skewed. Based on the results, using skewness values alone to decide about the normality of a dataset may not be enough. Therefore, the use of non-parametric tests might be inevitable.
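The kind of simulation described here is short to reproduce in outline: draw skewed samples under the null, apply both tests, and compare rejection rates; the gamma data-generating process, sample size, and number of replications are my own choices.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu, skew

rng = np.random.default_rng(0)
n, reps, alpha = 25, 5000, 0.05
reject_t = reject_u = 0
skews = []
for _ in range(reps):
    a = rng.gamma(shape=1.5, size=n)          # both groups from the same skewed population
    b = rng.gamma(shape=1.5, size=n)
    skews.append(skew(np.concatenate([a, b])))
    reject_t += ttest_ind(a, b).pvalue < alpha
    reject_u += mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha
print(f"mean skewness {np.mean(skews):.2f}: "
      f"t-test rejection rate {reject_t/reps:.3f}, U-test {reject_u/reps:.3f}")
```

Varying the gamma shape parameter changes the skewness level, which is how the different criteria in the study can be probed.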

Journal ArticleDOI
TL;DR: In this article, a semiparametric Extended Generalized Pareto Distribution (EGPD) model is proposed to model the transition function in a nonparametric fashion, based on Bernstein polynomial approximations.
Abstract: Precipitation amounts at daily or hourly scales are skewed to the right, and heavy rainfall is poorly modeled by a simple gamma distribution. An important, yet challenging topic in hydrometeorology is to find a probability distribution that is able to model low, moderate and heavy rainfall well. To address this issue, we present a semiparametric distribution suitable for modeling the entire range of rainfall amounts. This model is based on a recent parametric statistical model called the class of Extended Generalized Pareto Distributions (EGPD). The EGPD family is in compliance with Extreme Value Theory for both small and large values, while it keeps a smooth transition between these tails and bypasses the hurdle of selecting thresholds to define extremes. In particular, return levels beyond the largest observation can be inferred. To add flexibility to this EGPD class, we propose to model the transition function in a nonparametric fashion. A fast and efficient nonparametric scheme based on Bernstein polynomial approximations is investigated. We perform simulation studies to assess the performance of our approach. It is compared to two parametric models: a parametric EGPD and the classical Generalized Pareto Distribution (GPD), the latter being fitted only to excesses above a high threshold. We also apply our semiparametric version of EGPD to daily rainfall data recorded at the Mont-Aigoual weather station in France.
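For the classical benchmark mentioned at the end (a GPD fitted only to excesses above a high threshold), the fit and a return-level calculation take a few lines with SciPy; the synthetic rainfall series, threshold choice, and return period below are illustrative, not the study's data.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
rain = rng.gamma(shape=0.4, scale=8.0, size=20000)      # stand-in for daily rainfall (mm)
u = np.quantile(rain, 0.95)                             # high threshold
excesses = rain[rain > u] - u

xi, _, sigma = genpareto.fit(excesses, floc=0.0)        # GPD fit to threshold excesses
zeta_u = np.mean(rain > u)                              # exceedance probability P(X > u)
m = 1000                                                # 1000-observation return level
if abs(xi) > 1e-8:
    return_level = u + (sigma / xi) * ((m * zeta_u) ** xi - 1)
else:
    return_level = u + sigma * np.log(m * zeta_u)
print(f"threshold={u:.1f}  xi={xi:.3f}  sigma={sigma:.2f}  return level={return_level:.1f}")
```

The EGPD approach avoids having to pick the threshold u while still reproducing this tail behavior.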

Posted Content
TL;DR: DoSE, the density of states estimator, is proposed and its state-of-the-art performance against other unsupervised OOD detectors on previously established "hard" benchmarks is demonstrated.
Abstract: Perhaps surprisingly, recent studies have shown probabilistic model likelihoods have poor specificity for out-of-distribution (OOD) detection and often assign higher likelihoods to OOD data than in-distribution data. To ameliorate this issue we propose DoSE, the density of states estimator. Drawing on the statistical physics notion of "density of states," the DoSE decision rule avoids direct comparison of model probabilities, and instead utilizes the "probability of the model probability," or indeed the frequency of any reasonable statistic. The frequency is calculated using nonparametric density estimators (e.g., KDE and one-class SVM) which measure the typicality of various model statistics given the training data and from which we can flag test points with low typicality as anomalous. Unlike many other methods, DoSE requires neither labeled data nor OOD examples. DoSE is modular and can be trivially applied to any existing, trained model. We demonstrate DoSE's state-of-the-art performance against other unsupervised OOD detectors on previously established "hard" benchmarks.
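A hedged sketch of the decision rule: compute a per-example statistic from a trained model (here simply a Gaussian log-likelihood as a placeholder), fit a KDE to that statistic on training data, and flag test points whose statistic is atypical. All model and data choices below are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal, gaussian_kde

rng = np.random.default_rng(0)
train = rng.standard_normal((2000, 5))                        # in-distribution training data
model = multivariate_normal(mean=train.mean(0), cov=np.cov(train, rowvar=False))

stat = lambda x: model.logpdf(x)                              # the "model statistic"
dose = gaussian_kde(stat(train))                              # density of that statistic

in_dist = rng.standard_normal((5, 5))
out_dist = rng.standard_normal((5, 5)) + 4.0                  # shifted OOD batch
print("typicality (in):  ", dose(stat(in_dist)).round(4))
print("typicality (OOD): ", dose(stat(out_dist)).round(6))    # much smaller values
```

Because the rule scores the statistic's typicality rather than its raw value, it also catches OOD points whose likelihood is suspiciously high, not only suspiciously low.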

Journal ArticleDOI
TL;DR: In this paper, the authors generalize the Cramér–von Mises statistic via projection averaging to obtain a robust test for the multivariate two-sample problem, which is consistent against all fixed alternatives, robust to heavy-tailed data and minimax rate optimal against a certain class of alternatives.
Abstract: In this work, we generalize the Cramér–von Mises statistic via projection averaging to obtain a robust test for the multivariate two-sample problem. The proposed test is consistent against all fixed alternatives, robust to heavy-tailed data and minimax rate optimal against a certain class of alternatives. Our test statistic is completely free of tuning parameters and is computationally efficient even in high dimensions. When the dimension tends to infinity, the proposed test is shown to have comparable power to the existing high-dimensional mean tests under certain location models. As a by-product of our approach, we introduce a new metric called the angular distance, which can be thought of as a robust alternative to the Euclidean distance. Using the angular distance, we connect the proposed method to the reproducing kernel Hilbert space approach. In addition to the Cramér–von Mises statistic, we demonstrate that the projection-averaging technique can be used to define robust multivariate tests in many other problems.
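A Monte Carlo sketch of the projection-averaging idea: average the univariate two-sample Cramér–von Mises statistic over random directions and calibrate by permutation. The paper averages over projections analytically and derives the null distribution; the random-direction and permutation shortcuts here are my own simplifications.

```python
import numpy as np
from scipy.stats import cramervonmises_2samp

def proj_avg_cvm(x, y, n_dirs=50, rng=None):
    """Average CvM statistic of the projected samples over random unit directions."""
    rng = rng or np.random.default_rng()
    dirs = rng.standard_normal((n_dirs, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return np.mean([cramervonmises_2samp(x @ u, y @ u).statistic for u in dirs])

def permutation_pvalue(x, y, n_perm=100, rng=None):
    rng = rng or np.random.default_rng(0)
    obs = proj_avg_cvm(x, y, rng=rng)
    pooled = np.vstack([x, y])
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        xp, yp = pooled[idx[:len(x)]], pooled[idx[len(x):]]
        count += proj_avg_cvm(xp, yp, rng=rng) >= obs
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((60, 5))
y = rng.standard_normal((60, 5)) + 0.6        # mean-shifted alternative
print("p-value:", permutation_pvalue(x, y, rng=rng))
```

Working with ranks along each projection is what gives the procedure its robustness to heavy tails.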

Journal ArticleDOI
15 Jul 2020-Energy
TL;DR: The proposed load range discretization method yields more reliable and sharper load probability distributions, which can be beneficial to various decision-making activities in power systems.

Journal ArticleDOI
TL;DR: While the main focus lies on robust regression estimation, robust bandwidth selection and conditional scale estimation are discussed as well, and popular nonparametric models such as additive and varying‐coefficient models are summarized.
Abstract: Nonparametric regression methods provide an alternative approach to parametric estimation that requires only weak identification assumptions and thus minimizes the risk of model misspecification. In this article, we survey some nonparametric regression techniques, with an emphasis on kernel‐based estimation, that are additionally robust to atypical and outlying observations. While the main focus lies on robust regression estimation, robust bandwidth selection and conditional scale estimation are discussed as well. Robust estimation in popular nonparametric models such as additive and varying‐coefficient models is summarized too. The performance of the main methods is demonstrated on a real dataset.
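A minimal sketch of one such kernel-based robust estimator: a local-constant fit in which the kernel-weighted mean at each evaluation point is replaced by a Huber M-estimate computed via iteratively reweighted least squares. The bandwidth, tuning constant, and toy data are illustrative choices, not any specific method from the survey.

```python
import numpy as np

def robust_local_fit(x_eval, x, y, h=0.2, c=1.345, n_iter=20):
    """Local-constant kernel M-estimate with Huber weights at a single point x_eval."""
    k = np.exp(-0.5 * ((x - x_eval) / h) ** 2)                 # Gaussian kernel weights
    mu = np.sum(k * y) / np.sum(k)                             # start at the Nadaraya-Watson estimate
    for _ in range(n_iter):
        r = y - mu
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12   # robust scale (MAD)
        w = np.minimum(1.0, c / (np.abs(r) / s + 1e-12))       # Huber weights
        mu = np.sum(k * w * y) / np.sum(k * w)                 # reweighted local mean
    return mu

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(300)
y[::15] += 5.0                                                 # outliers
grid = np.linspace(0.05, 0.95, 5)
print([round(robust_local_fit(g, x, y), 3) for g in grid])     # tracks sin(2*pi*x) despite outliers
```

Robust bandwidth selection, as discussed in the survey, would then choose h using a criterion that is itself insensitive to the outliers.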

Journal ArticleDOI
TL;DR: This work investigates the forecasting performance of several models for the 1-day-ahead prediction of demand and prices on four electricity markets (APX Power-UK, Nord Pool, PJM and IPEX) with particular emphasis on the functional approach, that models the whole daily profile as a single functional observation.
Abstract: Efficient modeling and forecasting of electricity demand and prices is an important issue in competitive electricity markets. This work investigates the forecasting performance of several models for the 1-day-ahead prediction of demand and prices on four electricity markets (APX Power-UK, Nord Pool, PJM and IPEX). All the models are based on two steps: a nonparametric estimation of some deterministic components, followed by the choice of a suitable model for the residual stochastic component. This latter step includes univariate and multivariate as well as parametric and nonparametric models, with particular emphasis on the functional approach, which models the whole daily profile as a single functional observation. More specifically, the models involved are: a linear model and a nonlinear (nonparametric) autoregressive model, a vector autoregressive model and four autoregressive functional specifications. Prediction covers a whole year. Comparisons are based both on descriptive statistics and on statistical tests of equal forecasting accuracy. Though results partly depend on specific markets, a double functional model always proved to be the best model, or no different from the best, highlighting the effectiveness of the functional approach.

Journal ArticleDOI
TL;DR: A polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture to be predicted and improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.
Abstract: In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.

Journal ArticleDOI
TL;DR: A data-driven nonparametric chance-constrained optimization for microgrid energy management that imposes no assumption on probability density and distribution functions of solar generation and load.
Abstract: In this article, we present a data-driven nonparametric chance-constrained optimization for microgrid energy management. The proposed approach imposes no assumption on the probability density and distribution functions of solar generation and load. An adaptive kernel density estimator is utilized to construct a confidence set for each random parameter based on the historical data. The constructed confidence sets encompass the ambiguous true distribution and density functions. The concept of φ-divergence tolerance is applied to compute the distance between the estimated and true probability distribution functions (PDFs). The estimated distributions are used to formulate a set of data-driven nonparametric chance constraints and model system/component restrictions. To account for the impact of errors in the forecast distributions on system economics and security, confidence levels of the chance constraints are adjusted with respect to pointwise errors of the estimated PDFs. This adjustment ensures that the microgrid chance constraints are satisfied with a predetermined confidence level even if the true realizations of solar generation and load do not exactly fit the estimated PDFs. The chance constraints are converted into algebraic constraints. Numerical results show the effectiveness of the proposed approach for microgrid management.
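The basic mechanics (a kernel density estimate of a random resource turning a chance constraint into a deterministic quantile condition) can be sketched briefly; the solar data, confidence level, and single-constraint toy microgrid below are my own assumptions, not the paper's adaptive-KDE and φ-divergence machinery.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
solar_hist = np.clip(rng.normal(3.0, 1.2, 2000), 0, None)    # historical solar output (MW)

kde = gaussian_kde(solar_hist)                                # nonparametric density estimate
grid = np.linspace(0, solar_hist.max(), 2000)
cdf = np.cumsum(kde(grid)); cdf /= cdf[-1]                    # numerical CDF of solar output

eps = 0.05
solar_quantile = grid[np.searchsorted(cdf, eps)]              # P(solar <= q) = eps
load, dispatchable_capacity = 6.0, 4.0
# chance constraint P(dispatchable >= load - solar) >= 1 - eps becomes a deterministic check:
required = load - solar_quantile
print(f"{eps:.0%}-quantile of solar = {solar_quantile:.2f} MW -> need {required:.2f} MW "
      f"dispatchable (capacity {dispatchable_capacity} MW): "
      f"{'feasible' if required <= dispatchable_capacity else 'infeasible'}")
```

Adjusting eps to account for estimation error in the KDE is the role of the confidence-level adjustment described in the abstract.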

Journal ArticleDOI
23 May 2020
TL;DR: In this paper, a forecasting procedure based on components estimation technique to forecast medium-term electricity consumption is proposed, where the electricity consumption series is divided into two major components: deterministic and stochastic.
Abstract: The increasing shortage of electricity in Pakistan disturbs almost all sectors of its economy. As precise and efficient forecasts of electricity consumption are vital for accurate policy formulation, this paper implements a forecasting procedure based on a components estimation technique to forecast medium-term electricity consumption. To this end, the electricity consumption series is divided into two major components: deterministic and stochastic. For the estimation of the deterministic component, we use parametric and nonparametric models. The stochastic component is modeled using four different univariate time series models, including parametric AutoRegressive (AR), nonparametric AutoRegressive (NPAR), Smooth Transition AutoRegressive (STAR), and Autoregressive Moving Average (ARMA) models. The proposed methodology was applied to Pakistan electricity consumption data ranging from January 1990 to December 2015. To assess one-month-ahead post-sample forecasting accuracy, three standard error measures, namely Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE), were calculated. The results show that the proposed component-based estimation procedure is very effective at predicting electricity consumption. Moreover, ARMA models outperform the other models, while the NPAR model is competitive. Finally, our forecasting results are comparatively better than those cited in other works.

Journal ArticleDOI
TL;DR: In this paper, the authors propose an approach to estimate a mixture cure model when covariates are present and the lifetime is subject to random right censoring, which is based on an inversion which allows them to write the survival function as a function of the distribution of the observable variables.
Abstract: In survival analysis it often happens that some subjects under study do not experience the event of interest; they are considered to be “cured.” The population is thus a mixture of two subpopulations, one of cured subjects and one of “susceptible” subjects. We propose a novel approach to estimate a mixture cure model when covariates are present and the lifetime is subject to random right censoring. We work with a parametric model for the cure proportion, while the conditional survival function of the uncured subjects is unspecified. The approach is based on an inversion which allows us to write the survival function as a function of the distribution of the observable variables. This leads to a very general class of models which allows a flexible and rich modeling of the conditional survival function. We show the identifiability of the proposed model as well as the consistency and the asymptotic normality of the model parameters. We also consider in more detail the case where kernel estimators are used for the nonparametric part of the model. The new estimators are compared with the estimators from a Cox mixture cure model via simulations. Finally, we apply the new model on a medical data set.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a nonparametric rank-based method for ordinal and metric data with covariance heterogeneity, based on a quadratic form in multivariate rank effect estimators, and critical values are obtained by bootstrap techniques.
Abstract: Multivariate analysis of variance (MANOVA) is a powerful and versatile method to infer and quantify main and interaction effects in metric multivariate multi-factor data. It is, however, neither robust against change in units nor meaningful for ordinal data. Thus, we propose a novel nonparametric MANOVA. Contrary to existing rank-based procedures, we infer hypotheses formulated in terms of meaningful Mann–Whitney-type effects in lieu of distribution functions. The tests are based on a quadratic form in multivariate rank effect estimators, and critical values are obtained by bootstrap techniques. The newly developed procedures provide asymptotically exact and consistent inference for general models such as the nonparametric Behrens–Fisher problem and multivariate one-, two-, and higher-way crossed layouts. Computer simulations in small samples confirm the reliability of the developed method for ordinal and metric data with covariance heterogeneity. Finally, an analysis of a real data example illustrates the applicability and correct interpretation of the results.