
Showing papers on "Nonparametric statistics published in 2011"


Journal ArticleDOI
TL;DR: The basics are discussed and a survey is given of a complete set of nonparametric procedures developed to perform both pairwise and multiple comparisons for multi-problem analysis.
Abstract: The interest in nonparametric statistical analysis has grown recently in the field of computational intelligence. In many experimental studies, the lack of the required properties for a proper application of parametric procedures - independence, normality, and homoscedasticity - yields to nonparametric ones the task of performing a rigorous comparison among algorithms. In this paper, we will discuss the basics and give a survey of a complete set of nonparametric procedures developed to perform both pairwise and multiple comparisons, for multi-problem analysis. The test problems of the CEC'2005 special session on real parameter optimization will help to illustrate the use of the tests throughout this tutorial, analyzing the results of a set of well-known evolutionary and swarm intelligence algorithms. This tutorial is concluded with a compilation of considerations and recommendations, which will guide practitioners when using these tests to contrast their experimental results.
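For readers who want a concrete starting point, the sketch below (hypothetical accuracy scores, standard SciPy calls, not the authors' own code) runs a Friedman omnibus test across algorithms over several problems and then pairwise Wilcoxon signed-rank tests with Holm's step-down correction, the general kind of workflow the tutorial surveys.

```python
# Illustrative workflow (not the authors' code): Friedman omnibus test across three
# algorithms over six benchmark problems, then pairwise Wilcoxon signed-rank tests
# with Holm's step-down correction. The accuracy scores below are hypothetical.
import itertools
import numpy as np
from scipy import stats

# rows = benchmark problems, columns = algorithms A, B, C
scores = np.array([
    [0.91, 0.88, 0.84],
    [0.75, 0.72, 0.70],
    [0.62, 0.66, 0.58],
    [0.83, 0.80, 0.79],
    [0.95, 0.93, 0.90],
    [0.70, 0.69, 0.65],
])
names = ["A", "B", "C"]

stat, p = stats.friedmanchisquare(*scores.T)   # omnibus multiple-comparison test
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")

# Pairwise post hoc comparisons, Holm-adjusted
pairs = list(itertools.combinations(range(len(names)), 2))
pvals = [stats.wilcoxon(scores[:, i], scores[:, j]).pvalue for i, j in pairs]
order = np.argsort(pvals)
adjusted, running = np.empty(len(pairs)), 0.0
for rank, idx in enumerate(order):
    running = max(running, pvals[idx] * (len(pairs) - rank))
    adjusted[idx] = min(1.0, running)
for (i, j), p_raw, p_adj in zip(pairs, pvals, adjusted):
    print(f"{names[i]} vs {names[j]}: raw p = {p_raw:.4f}, Holm-adjusted p = {p_adj:.4f}")
```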

3,832 citations


Journal ArticleDOI
TL;DR: MatchIt implements a wide range of sophisticated matching methods, making it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions.
Abstract: MatchIt implements the suggestions of Ho, Imai, King, and Stuart (2007) for improving parametric statistical models by preprocessing data with nonparametric matching methods. MatchIt implements a wide range of sophisticated matching methods, making it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions. The software also easily fits into existing research practices since, after preprocessing data with MatchIt, researchers can use whatever parametric model they would have used without MatchIt, but produce inferences with substantially more robustness and less sensitivity to modeling assumptions. MatchIt is an R program, and also works seamlessly with Zelig.
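As a rough illustration of the preprocessing idea (in Python rather than R, and not MatchIt's actual implementation), the sketch below estimates propensity scores, performs greedy 1:1 nearest-neighbor matching, and then analyzes only the matched sample; all data and names are hypothetical.

```python
# Hedged sketch of matching as preprocessing (not MatchIt itself): estimate propensity
# scores, pair each treated unit with its nearest unused control, then analyze only the
# matched sample with whatever model you would have used anyway. Data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                                  # covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))          # treatment depends on X
y = 2.0 * treat + X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]   # propensity scores

treated = np.flatnonzero(treat == 1)
available = set(np.flatnonzero(treat == 0))
matched_pairs = []
for t in treated:                                            # greedy 1:1 matching
    if not available:
        break
    best = min(available, key=lambda c: abs(ps[c] - ps[t]))
    matched_pairs.append((t, best))
    available.remove(best)

effect = np.mean([y[t] - y[c] for t, c in matched_pairs])    # matched-pairs estimate
print("estimated treatment effect on the matched sample:", round(effect, 3))
```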

3,012 citations


Proceedings ArticleDOI
07 May 2011
TL;DR: This work presents the Aligned Rank Transform (ART) for nonparametric factorial data analysis in HCI, and a re-examination of some published HCI results exhibits advantages of the ART.
Abstract: Nonparametric data from multi-factor experiments arise often in human-computer interaction (HCI). Examples may include error counts, Likert responses, and preference tallies. But because multiple factors are involved, common nonparametric tests (e.g., Friedman) are inadequate, as they are unable to examine interaction effects. While some statistical techniques exist to handle such data, these techniques are not widely available and are complex. To address these concerns, we present the Aligned Rank Transform (ART) for nonparametric factorial data analysis in HCI. The ART relies on a preprocessing step that "aligns" data before applying averaged ranks, after which point common ANOVA procedures can be used, making the ART accessible to anyone familiar with the F-test. Unlike most articles on the ART, which only address two factors, we generalize the ART to N factors. We also provide ARTool and ARTweb, desktop and Web-based programs for aligning and ranking data. Our re-examination of some published HCI results exhibits advantages of the ART.
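The alignment-and-rank step is simple enough to sketch; the code below (hypothetical data, Python rather than the cited ARTool/ARTweb tools) aligns a two-factor response for one main effect and ranks it, after which an ordinary factorial ANOVA on the ranks would be run, reading off only the F test for the aligned-for effect.

```python
# Hedged sketch of the alignment-and-rank step (hypothetical data; not the cited
# ARTool/ARTweb code). Alignment for the main effect of A: remove all estimated
# effects via the cell means, then add back the marginal effect of A only.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "A": np.repeat(["a1", "a2"], 20),
    "B": np.tile(np.repeat(["b1", "b2"], 10), 2),
})
df["y"] = rng.poisson(lam=3 + (df["A"] == "a2").to_numpy() * 2)  # hypothetical counts

grand = df["y"].mean()
cell_mean = df.groupby(["A", "B"])["y"].transform("mean")
marg_A = df.groupby("A")["y"].transform("mean")

df["y_aligned_A"] = (df["y"] - cell_mean) + (marg_A - grand)  # aligned response
df["rank_A"] = stats.rankdata(df["y_aligned_A"])              # mid-ranks for ties

# These ranks would now go into an ordinary factorial ANOVA (e.g., a C(A)*C(B)
# model), interpreting only the F test for A from this aligned-and-ranked copy.
print(df.groupby("A")["rank_A"].mean())
```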

1,620 citations


Reference EntryDOI
16 May 2011
TL;DR: In many natural and physical sciences the measurements are directions, either in two or three dimensions; the Fisher-von Mises distribution is introduced and nonparametric methods such as goodness-of-fit tests are discussed.
Abstract: In many natural and physical sciences the measurements are directions—either in two or three dimensions. This chapter briefly introduces this novel area of statistics, and provides a good starting point for further exploration. A basic parametric model called the Fisher-von Mises distribution is introduced and nonparametric methods such as goodness of fit tests are discussed. Further references are given for exploring related topics such as correlation and regression. Keywords: Directional data; rotational invariance; circular and spherical models; Langevin; Fisher-von Mises distribution; angular correlation; circular regression; nonparametric methods
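A minimal numerical illustration of working with directions rather than ordinary numbers (hypothetical angles, generic formulas, nothing from this chapter): the circular mean direction and the mean resultant length, the basic summaries on which such parametric and nonparametric methods build.

```python
# Circular mean and mean resultant length R-bar for a sample of angles drawn from a
# von Mises distribution. Parameters and data are hypothetical.
import numpy as np
from scipy import stats

theta = stats.vonmises.rvs(kappa=4.0, loc=np.pi / 3, size=200, random_state=0)

C, S = np.cos(theta).mean(), np.sin(theta).mean()
circ_mean = np.arctan2(S, C)       # circular mean direction (radians)
R_bar = np.hypot(C, S)             # mean resultant length in [0, 1]
print(f"circular mean = {circ_mean:.3f} rad, R-bar = {R_bar:.3f}")
# Larger R-bar means the directions concentrate more tightly around the mean direction.
```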

1,213 citations


Journal ArticleDOI
TL;DR: In this article, a variable screening procedure via correlation learning was proposed to reduce dimensionality in sparse ultra-high-dimensional models, and the extent to which the dimensionality can be reduced by independence screening is quantified.
Abstract: A variable screening procedure via correlation learning was proposed by Fan and Lv (2008) to reduce dimensionality in sparse ultra-high-dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening (NIS) is a specific type of sure independence screening. We propose several closely related variable screening procedures. We show that with general nonparametric models, under some mild technical conditions, the proposed independence screening methods have a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, we also propose a data-driven thresholding and an iterative nonparametric independence screening (INIS) method to enhance the finite-sample performance for fitting sparse additive models. The simulation resul...
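A toy version of the screening idea, not the paper's exact NIS estimator: rank every predictor by how much a marginal nonparametric (here, cubic-polynomial) fit of the response on that single predictor explains, then keep only the top-ranked ones. The data and the screening size below are hypothetical.

```python
# Illustrative marginal screening sketch (not the paper's NIS estimator, which uses
# spline-based marginal fits and data-driven thresholds).
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 1000
X = rng.normal(size=(n, p))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * rng.normal(size=n)  # only X0, X1 matter

def marginal_r2(xj, y):
    # least-squares fit of y on (1, xj, xj^2, xj^3) as a crude nonparametric surrogate
    design = np.vander(xj, 4)
    fitted = design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return 1.0 - np.var(y - fitted) / np.var(y)

scores = np.array([marginal_r2(X[:, j], y) for j in range(p)])
keep = np.argsort(scores)[::-1][: int(n / np.log(n))]   # an often-used screening size
print("top screened indices:", sorted(keep[:5]))
```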

535 citations


Book
05 Aug 2011
TL;DR: The purpose of these lecture notes is to provide an introduction to the general theory of empirical risk minimization with an emphasis on excess risk bounds and oracle inequalities in penalized problems.
Abstract: The purpose of these lecture notes is to provide an introduction to the general theory of empirical risk minimization with an emphasis on excess risk bounds and oracle inequalities in penalized problems. In recent years, there have been new developments in this area motivated by the study of new classes of methods in machine learning such as large margin classification methods (boosting, kernel machines). The main probabilistic tools involved in the analysis of these problems are concentration and deviation inequalities by Talagrand along with other methods of empirical processes theory (symmetrization inequalities, contraction inequality for Rademacher sums, entropy and generic chaining bounds). Sparse recovery based on l_1-type penalization and low rank matrix recovery based on the nuclear norm penalization are other active areas of research, where the main problems can be stated in the framework of penalized empirical risk minimization, and concentration inequalities and empirical processes tools have proved to be very useful.
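In generic notation (not necessarily the notes' own symbols), the objects under study look as follows: a penalized empirical risk minimizer and the typical shape of an oracle inequality for its excess risk.

```latex
% Generic notation; a schematic of penalized ERM and an oracle inequality, not the notes' own statement.
\[
\hat{f} = \operatorname*{arg\,min}_{f \in \mathcal{F}}
  \Bigl\{\tfrac{1}{n}\sum_{i=1}^{n}\ell\bigl(f(X_i), Y_i\bigr) + \operatorname{pen}(f)\Bigr\},
\qquad
\mathcal{E}(\hat{f}) \le C\,\inf_{f \in \mathcal{F}}
  \bigl\{\mathcal{E}(f) + \operatorname{pen}(f)\bigr\} + r_n(\delta).
\]
```

Here E(f) denotes the excess risk R(f) − inf_g R(g) and r_n(δ) is a remainder term holding with probability at least 1 − δ; concentration and empirical process tools of the kind listed above are what control that remainder.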

458 citations


Book ChapterDOI
01 Jan 2011
TL;DR: The phenomenon of self-organized criticality (SOC) can be identified from many observations in the universe, by sampling statistical distributions of physical parameters, such as the distributions of time scales, spatial scales, or energies, for a set of events.
Abstract: The phenomenon of self-organized criticality (SOC) can be identified from many observations in the universe, by sampling statistical distributions of physical parameters, such as the distributions of time scales, spatial scales, or energies, for a set of events. SOC manifests itself in the statistics of nonlinear processes.

382 citations


Posted Content
TL;DR: In this article, the authors study the nonparametric estimation of an instrumental regression function f defined by conditional moment restrictions that stem from a structural econometric model E[Y − f(Z) | W] = 0 and involve endogenous variables Y and Z and instruments W.
Abstract: The focus of this paper is the nonparametric estimation of an instrumental regression function f defined by conditional moment restrictions that stem from a structural econometric model E[Y − f (Z) | W] = 0, and involve endogenous variables Y and Z and instruments W. The function f is the solution of an ill-posed inverse problem and we propose an estimation procedure based on Tikhonov regularization. The paper analyzes identification and overidentification of this model, and presents asymptotic properties of the estimated nonparametric instrumental regression function.
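Written schematically (operator notation added here for illustration; T, r, and α are not necessarily the paper's symbols), the problem and a Tikhonov-regularized estimator take the form:

```latex
% Schematic of the ill-posed inverse problem and a Tikhonov-regularized solution.
\[
(Tf)(w) = \mathbb{E}\bigl[f(Z)\mid W=w\bigr], \qquad
r(w) = \mathbb{E}\bigl[Y\mid W=w\bigr], \qquad Tf = r,
\qquad
\hat{f}_{\alpha} = \bigl(\alpha I + \hat{T}^{\ast}\hat{T}\bigr)^{-1}\hat{T}^{\ast}\hat{r}.
\]
```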

330 citations


Journal ArticleDOI
TL;DR: In this article, the authors explore the robustness of tests for choice inconsistencies based on parameter restrictions in parametric models, focusing on tests proposed by Ketcham, Kuminoff, and Powers.
Abstract: We explore the in- and out-of-sample robustness of tests for choice inconsistencies based on parameter restrictions in parametric models, focusing on tests proposed by Ketcham, Kuminoff, and Powers (2016). We argue that their nonparametric alternatives are inherently conservative with respect to detecting mistakes. We then show that our parametric model is robust to KKP's suggested specification checks, and that comprehensive goodness of fit measures perform better with our model than the expected utility model. Finally, we explore the robustness of our 2011 results to alternative normative assumptions, highlighting the role of brand fixed effects and unobservable characteristics.

318 citations


Book ChapterDOI
01 Jan 2011
TL;DR: In this paper, some aspects of the estimation of the density function of a univariate probability distribution are discussed, and the asymptotic mean square error of a particular class of estimates is evaluated.
Abstract: This note discusses some aspects of the estimation of the density function of a univariate probability distribution. All estimates of the density function satisfying relatively mild conditions are shown to be biased. The asymptotic mean square error of a particular class of estimates is evaluated.
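For reference, the standard kernel estimator and the usual asymptotic mean square error expansion at a point x, in generic notation rather than the chapter's own:

```latex
% Standard kernel density estimator and its pointwise asymptotic mean square error.
\[
\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\Bigl(\frac{x - X_i}{h}\Bigr),
\qquad
\mathrm{AMSE}\bigl\{\hat{f}_h(x)\bigr\}
  = \frac{f(x)\,R(K)}{nh} + \frac{h^{4}}{4}\,\mu_2(K)^{2}\,f''(x)^{2},
\]
\[
R(K) = \int K^{2}(u)\,du, \qquad \mu_2(K) = \int u^{2}K(u)\,du.
\]
```

The variance term shrinks as the bandwidth h grows while the squared-bias term grows with h, which is the trade-off behind the bias result discussed in the note.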

304 citations


Journal ArticleDOI
TL;DR: A revision of R’s ks.test() function and a new cvm.test() function are offered that fill this need in the R language for two of the most popular nonparametric goodness-of-fit tests.
Abstract: Methodology extending nonparametric goodness-of-fit tests to discrete null distributions has existed for several decades. However, modern statistical software has generally failed to provide this methodology to users. We offer a revision of R’s ks.test() function and a new cvm.test() function that fill this need in the R language for two of the most popular nonparametric goodness-of-fit tests. This paper describes these contributions and provides examples of their usage. Particular attention is given to various numerical issues that arise in their implementation.
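To make the underlying issue concrete (in Python rather than the R functions named above), the sketch below computes a Kolmogorov-Smirnov statistic against a discrete null and calibrates its p-value by Monte Carlo simulation under that null, so the discreteness is respected instead of borrowing the continuous-null distribution. The Poisson null and the sample are hypothetical.

```python
# KS statistic against a discrete null with a Monte Carlo p-value (illustration only,
# not the revised ks.test()/cvm.test() implementations described in the paper).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
null = stats.poisson(mu=4)            # hypothesized discrete null distribution
x = rng.poisson(lam=4, size=60)       # observed sample

def ks_stat_discrete(sample, dist):
    grid = np.arange(0, sample.max() + 1)
    cdf_null = dist.cdf(grid)
    ecdf = np.searchsorted(np.sort(sample), grid, side="right") / sample.size
    return np.max(np.abs(ecdf - cdf_null))

d_obs = ks_stat_discrete(x, null)
# Monte Carlo null distribution of the statistic under the discrete null
d_sim = np.array([ks_stat_discrete(null.rvs(size=x.size, random_state=r), null)
                  for r in range(2000)])
p_value = (1 + np.sum(d_sim >= d_obs)) / (1 + d_sim.size)
print(f"D = {d_obs:.3f}, Monte Carlo p = {p_value:.3f}")
```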

Journal ArticleDOI
TL;DR: This work proposes adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and proves that the methods possess the oracle property.
Abstract: The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve nice efficiency properties, we further develop a semiparametric composite quantile regression procedure. We establish the asymptotic normality of proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the best convergence rate. Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it only loses a small amount of efficiency for normal errors. In addition, it is shown that the loss in efficiency is at most 11.1% for estimating varying coefficient functions and is no greater than 13.6% for estimating parametric components. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new methods to analyze the plasma beta-carotene level data.
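For orientation, the model in question can be written generically (symbols illustrative) as

```latex
% The semiparametric varying-coefficient partially linear model in generic notation.
\[
Y = \boldsymbol{\alpha}(U)^{\top}\mathbf{X} + \boldsymbol{\beta}^{\top}\mathbf{Z} + \varepsilon,
\]
```

where the coefficient functions α(·) of the index U form the nonparametric part, estimated here by quantile and composite quantile regression, and β is the finite-dimensional parametric component targeted by the adaptive penalties.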

Journal ArticleDOI
TL;DR: A SAS® macro implementation of a multiple comparison test based on significant Kruskal-Wallis results from the SAS NPAR1WAY procedure, designed for up to 20 groups at a user-specified alpha significance level.
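A rough Python analogue of the workflow such a macro automates (illustration only, not the SAS code): an omnibus Kruskal-Wallis test followed by pairwise Mann-Whitney comparisons at a Bonferroni-adjusted level. The group data are hypothetical.

```python
# Kruskal-Wallis omnibus test with simple pairwise follow-up comparisons.
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
groups = {f"g{k}": rng.normal(loc=k * 0.4, size=25) for k in range(4)}  # hypothetical data

H, p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis H = {H:.3f}, p = {p:.4f}")

alpha = 0.05
pairs = list(itertools.combinations(groups, 2))
if p < alpha:
    for a, b in pairs:
        u, p_ab = stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        flag = "*" if p_ab < alpha / len(pairs) else " "   # Bonferroni-adjusted level
        print(f"{a} vs {b}: p = {p_ab:.4f} {flag}")
```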

Journal ArticleDOI
TL;DR: A novel Bayesian paradigm for the identification of output error models is applied to the design of optimal predictors and discrete-time models based on prediction error minimization by interpreting the predictor impulse responses as realizations of Gaussian processes.

Journal ArticleDOI
TL;DR: In this article, a nonparametric test for nonlinear causality up to the Kth conditional moment is proposed, allowing the dependence between series to be nonlinear and to act through conditional moments beyond the conditional mean.

Book
25 Jan 2011
TL;DR: This book presents a unified treatment of functional regression modelling, including functional linear regression, kernel regression estimation, linear processes for functional data, and functional principal component analysis.
Abstract:
List of illustrations
List of datasets
PART I: REGRESSION MODELLING FOR FDA
1. Unifying presentation for functional regression modelling
2. Functional linear regression
3. Linear processes for functional data
4. Kernel regression estimation for functional data
5. Nonparametric methods for alpha-mixing functional data
6. Functional coefficient models for economics and financial data
PART II: BENCHMARK METHODS FOR FDA
7. Resampling methods for functional data
8. Functional principal component analysis
9. Curve registration
10. Classification methods for functional data
11. Sparse functional data analysis
PART III: TOWARDS STOCHASTIC BACKGROUND IN INFINITE-DIMENSIONAL SPACES
12. Vector integration in Banach spaces
13. Operator geometry in Statistics
14. On Bernstein type and maximal inequalities for dependent Banach-valued random vectors and applications
15. On spectral and random measures associated to a stationary process
16. An invitation to operator-based Statistics
Index

Journal ArticleDOI
TL;DR: This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage.
Abstract: Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
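As a small, self-contained illustration of the Dirichlet process prior that underlies several of these models (plain NumPy, unrelated to DPpackage's compiled code), the sketch below draws a truncated stick-breaking representation and samples from the resulting random measure; the precision and truncation level are arbitrary.

```python
# Truncated stick-breaking draw from a Dirichlet process prior with a standard-normal
# base measure. Illustration only; DPpackage's samplers are far more general.
import numpy as np

rng = np.random.default_rng(5)
alpha, K = 2.0, 200                      # precision parameter, truncation level

v = rng.beta(1.0, alpha, size=K)         # stick-breaking proportions
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))   # weights w_k
theta = rng.normal(size=K)               # atoms drawn from the base measure G0

# A draw from the random measure: sample from the discrete distribution (w, theta)
samples = rng.choice(theta, size=1000, p=w / w.sum())
print("effective number of atoms used:", np.unique(samples).size)
```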

Journal ArticleDOI
TL;DR: Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses.
Abstract: Summary
Background—Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem.
Objectives—The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities.
Methods—Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians.
Results—Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software.
Conclusions—Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
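A hedged sketch of the core idea using scikit-learn rather than the R packages the paper points to: treat the 0/1 response as a regression target, so a consistent nonparametric regression machine returns probability estimates directly. Data and tuning values are hypothetical.

```python
# Probability estimation for a binary response via nonparametric regression machines.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)
n = 2000
X = rng.normal(size=(n, 5))
p_true = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] ** 2 - 0.5)))
y = rng.binomial(1, p_true)              # binary response coded 0/1

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
knn = KNeighborsRegressor(n_neighbors=50).fit(X, y)

X_new = rng.normal(size=(5, 5))
print("RF probability estimates: ", np.round(rf.predict(X_new), 3))
print("kNN probability estimates:", np.round(knn.predict(X_new), 3))
```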

Journal ArticleDOI
TL;DR: In this paper, the authors explore what can be learned when the function of interest is identified through an instrumental variable but is not assumed to be known up to finitely many parameters.
Abstract: Instrumental variables are widely used in applied econometrics to achieve identification and carry out estimation and inference in models that contain endogenous explanatory variables. In most applications, the function of interest (e.g., an Engel curve or demand function) is assumed to be known up to finitely many parameters (e.g., a linear model), and instrumental variables are used to identify and estimate these parameters. However, linear and other finite-dimensional parametric models make strong assumptions about the population being modeled that are rarely if ever justified by economic theory or other a priori reasoning and can lead to seriously erroneous conclusions if they are incorrect. This paper explores what can be learned when the function of interest is identified through an instrumental variable but is not assumed to be known up to finitely many parameters. The paper explains the differences between parametric and nonparametric estimators that are important for applied research, describes an easily implemented nonparametric instrumental variables estimator, and presents empirical examples in which nonparametric methods lead to substantive conclusions that are quite different from those obtained using standard, parametric estimators.

Journal ArticleDOI
TL;DR: In this article, a class of penalized sieve minimum distance (PSMD) estimators are proposed, which are minimizers of a penalized empirical minimum distance criterion over a collection of sieve spaces that are dense in the infinite dimensional function parameter space.
Abstract: This paper studies nonparametric estimation of conditional moment restrictions in which the generalized residual functions can be nonsmooth in the unknown functions of endogenous variables. This is a nonparametric nonlinear instrumental variables (IV) problem. We propose a class of penalized sieve minimum distance (PSMD) estimators, which are minimizers of a penalized empirical minimum distance criterion over a collection of sieve spaces that are dense in the infinite dimensional function parameter space. Some of the PSMD procedures use slowly growing finite dimensional sieves with flexible penalties or without any penalty; others use large dimensional sieves with lower semicompact and/or convex penalties. We establish their consistency and the convergence rates in Banach space norms (such as a sup-norm or a root mean squared norm), allowing for possibly non-compact infinite dimensional parameter spaces. For both mildly and severely ill-posed nonlinear inverse problems, our convergence rates in Hilbert space norms (such as a root mean squared norm) achieve the known minimax optimal rate for the nonparametric mean IV regression. We illustrate the theory with a nonparametric additive quantile IV regression. We present a simulation study and an empirical application of estimating nonparametric quantile IV Engel curves.

Journal ArticleDOI
TL;DR: In this paper, a two-stage estimation procedure is proposed to estimate the link function for the single index and the parameters in the single indices, as well as the linear component of the model, and asymptotic normality is established for both parametric components.
Abstract: In this paper, we study the estimation for a partial-linear single-index model. A two-stage estimation procedure is proposed to estimate the link function for the single index and the parameters in the single index, as well as the parameters in the linear component of the model. Asymptotic normality is established for both parametric components. For the index, a constrained estimating equation leads to an asymptotically more efficient estimator than existing estimators in the sense that it is of a smaller limiting variance. The estimator of the nonparametric link function achieves optimal convergence rates, and the structural error variance is obtained. In addition, the results facilitate the construction of confidence regions and hypothesis testing for the unknown parameters. A simulation study is performed and an application to a real dataset is illustrated. The extension to multiple indices is briefly sketched.
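For orientation, the model can be written generically (symbols illustrative; the unit-norm constraint is one standard identifiability choice) as

```latex
% The partial-linear single-index model in generic notation.
\[
Y = \eta\bigl(\boldsymbol{\theta}^{\top}\mathbf{X}\bigr) + \boldsymbol{\beta}^{\top}\mathbf{Z} + \varepsilon,
\qquad \lVert\boldsymbol{\theta}\rVert = 1,
\]
```

where the link η is the unknown function estimated nonparametrically in one stage and the index and linear parameters (θ, β) in the other.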

Journal ArticleDOI
Xavier D'Haultfoeuille
TL;DR: In this paper, a nonparametric model between the two variables with additive separability and a large support condition is considered, and different versions of completeness are obtained, depending on which regularity conditions are imposed.
Abstract: The notion of completeness between two random elements has been considered recently to provide identification in nonparametric instrumental problems. This condition is quite abstract, however, and characterizations have been obtained only in special cases. This paper considers a nonparametric model between the two variables with an additive separability and a large support condition. In this framework, different versions of completeness are obtained, depending on which regularity conditions are imposed. This result allows one to establish identification in an instrumental nonparametric regression with limited endogenous regressor, a case where the control variate approach breaks down.

Journal ArticleDOI
TL;DR: In this paper, a nonparametric kernel density estimation method for wind speed probability distribution is proposed, which is more accurate and has better adaptability than any conventional parametric distribution.
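An illustrative sketch of the comparison being made, on hypothetical data: a Gaussian kernel density estimate of wind speed set against a fitted Weibull density, a common parametric choice in wind resource assessment. This is not the paper's estimator or data.

```python
# Kernel density estimate of wind speed versus a fitted Weibull benchmark.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
wind = rng.weibull(2.0, size=500) * 8 + rng.normal(0, 0.3, size=500)  # hypothetical speeds
wind = wind[wind > 0]

kde = stats.gaussian_kde(wind)                           # nonparametric estimate
shape, loc, scale = stats.weibull_min.fit(wind, floc=0)  # parametric benchmark

grid = np.linspace(0, wind.max(), 200)
f_kde = kde(grid)
f_wbl = stats.weibull_min.pdf(grid, shape, loc, scale)
print("max absolute difference between the two density estimates:",
      float(np.max(np.abs(f_kde - f_wbl))))
```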

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a more robust modeling approach by considering the model for the nonresponding part as an exponential tilting of the model of the responding part, which can be justified under the assumption that the response probability can be expressed as a semiparametric logistic regression model.
Abstract: Parameter estimation with nonignorable missing data is a challenging problem in statistics. The fully parametric approach for joint modeling of the response model and the population model can produce results that are quite sensitive to the failure of the assumed model. We propose a more robust modeling approach by considering the model for the nonresponding part as an exponential tilting of the model for the responding part. The exponential tilting model can be justified under the assumption that the response probability can be expressed as a semiparametric logistic regression model. In this paper, based on the exponential tilting model, we propose a semiparametric estimation method of mean functionals with nonignorable missing data. A semiparametric logistic regression model is assumed for the response probability and a nonparametric regression approach for missing data discussed in Cheng (1994) is used in the estimator. By adopting nonparametric components for the model, the estimation method can be mad...
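The tilting assumption can be stated generically as follows (δ is the response indicator and γ the tilting parameter; the notation is illustrative rather than the paper's own):

```latex
% Generic statement of an exponential tilting link between respondents and nonrespondents.
\[
f\bigl(y \mid \delta = 0\bigr) \;\propto\; f\bigl(y \mid \delta = 1\bigr)\,\exp(\gamma\, y),
\]
```

so the distribution of the nonresponding part is modeled as an exponentially tilted version of the observed, responding part, which is what ties the approach to the semiparametric logistic response model mentioned above.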

Journal ArticleDOI
TL;DR: A new multivariate SPC methodology for monitoring location parameters is developed based on adapting a powerful multivariate sign test to online sequential monitoring, which results in a nonparametric counterpart of the classical multivariate EWMA (MEWMA).
Abstract: Nonparametric control charts are useful in statistical process control (SPC) when there is a lack of or limited knowledge about the underlying process distribution, especially when the process measurement is multivariate. This article develops a new multivariate SPC methodology for monitoring location parameters. It is based on adapting a powerful multivariate sign test to online sequential monitoring. The weighted version of the sign test is used to formulate the charting statistic by incorporating the exponentially weighted moving average control (EWMA) scheme, which results in a nonparametric counterpart of the classical multivariate EWMA (MEWMA). It is affine-invariant and has a strictly distribution-free property over a broad class of population models. That is, the in-control (IC) run length distribution can attain (or is always very close to) the nominal one when using the same control limit designed for a multivariate normal distribution. Moreover, when the process distribution comes from the elli...

Book
25 Feb 2011
TL;DR: This book introduces IBM SPSS Statistics and covers describing data, testing hypotheses, and examining relationships, with step-by-step procedures such as how to obtain a one-sample t test.
Abstract:
PART 1. GETTING STARTED WITH IBM SPSS STATISTICS
1. Introduction About This Book Getting Started with IBM SPSS Statistics Describing Data Testing Hypotheses Examining Relationships Lets Get Started
2. An Introductory Tour of IBM SPSS Statistics Starting IBM SPSS Statistics Help Is Always at Hand Copying the Data Files Opening a Data File Statistical Procedures The Viewer Window Viewer Objects The Data Editor Window Entering Non-Numeric Data Clearing the Data Editor without Saving Changes The IBM SPSS Statistics Online Tutorial The IBM SPSS Statistics Toolbar The IBM SPSS Statistics Help System Contextual Help What's Next?
3. Sources of Data Know Your Data Survey Data Asking the Question Measuring Time Selecting Participants Selecting a Sample General Social Survey Random-Digit Dialing Internet Surveys Designing Experiments Random Assignment Minimizing Bias Summary What's Next? Exercises
PART 2. DESCRIBING DATA
4. Counting Responses Describing Variables A Simple Frequency Table Sorting Frequency Tables Pie Charts Bar Charts Summarizing Internet Time Histograms Mode and Median Percentiles Summary What's Next? How to Obtain a Frequency Table Format: Appearance of the Frequency Table Statistics: Univariate Statistics Charts: Bar Charts, Pie Charts, and Histograms Exercises
5. Computing Descriptive Statistics Summarizing Data Scales of Measurement Mode, Median, and Arithmetic Average Comparing Mean and Median Summarizing Time Spent Online Measures of Variability Range Variance and Standard Deviation The Coefficient of Variation Standard Scores Summary What's Next? How to Obtain Univariate Descriptive Statistics Options: Choosing Statistics and Sorting Variables Exercises
6. Comparing Groups Age, Education, and Internet Use Plotting Means Layers: Defining Subgroups by More than One Variable Summary What's Next? How to Obtain Subgroup Means Layers: Defining Subgroups by More than One Variable Options: Additional Statistics and Display of Labels Exercises
7. Looking at Distributions Marathon Completion Times Age and Gender Marathon Times for Mature Runners Summary What's Next? How to Explore Distributions Explore Statistics Graphical Displays Options Exercises
8. Counting Responses for Combinations of Variables Library Use and Education Row and Column Percentages Bar Charts Adding Control Variables Library Use and the Internet Summary What's Next? How to Obtain a Crosstabulation Layers: Three or More Variables at Once Cells: Percentages, Expected Counts, and Residuals Bivariate Statistics Format: Adjusting the Table Format Exercises
9. Plotting Data Examining Population Indicators Simple Scatterplots Scatterplot Matrices Overlay Plots Three-Dimensional Plots Identifying Unusual Points Rotating 3-D Scatterplots Summary What's Next? How to Obtain a Scatterplot Obtaining a Simple Scatterplot Obtaining an Overlay Scatterplot Obtaining a Scatterplot Matrix Obtaining a 3-D Scatterplot Editing a Scatterplot Exercises
PART 3. TESTING HYPOTHESES
10. Evaluating Results from Samples From Sample to Population A Computer Model The Effect of Sample Size The Binomial Test Summary What's Next? Exercises
11. The Normal Distribution The Normal Distribution Samples from a Normal Distribution Means from a Normal Population Are the Sample Results Unlikely? Testing a Hypothesis Means from Non-Normal Distributions Means from a Uniform Distribution Summary What's Next? Exercises
12. Testing a Hypothesis about a Single Mean Examining the Data The T Distribution Calculating the T Statistic Confidence Intervals Other Confidence Levels Confidence Interval for a Difference Confidence Intervals and Hypothesis Tests Null Hypotheses and Alternative Hypotheses Rejecting the Null Hypothesis Summary What's Next? How to Obtain a One-Sample T Test Options: Confidence Level and Missing Data Exercises
13. Testing a Hypothesis about Two Related Means Marathon Runners in Paired Designs Looking at Differences Is the Mean Difference Zero? Two Approaches The Paired-Samples T Test Are You Positive? Some Possible Problems Examining Normality Summary What's Next? How to Obtain a Paired-Samples T Test Options: Confidence Level and Missing Data Exercises
14. Testing a Hypothesis about Two Independent Means Examining Television Viewing Distribution of Differences Standard Error of the Mean Difference Computing the T Statistic Output from the Two-Independent-Samples T Test Confidence Intervals for the Mean Difference Testing the Equality of Variances Effect of Outliers Introducing Education Can You Prove the Null Hypothesis? Interpreting the Observed Significance Level Power Monitoring Death Rates Does Significant Mean Important? Summary What's Next? How to Obtain an Independent-Samples T Test Define Groups: Specifying the Subgroups Options: Confidence Level and Missing Data Exercises
15. One-Way Analysis of Variance Hours in a Work Week Describing the Data Confidence Intervals for the Group Means Testing the Null Hypothesis Assumptions Needed for Analysis of Variance Analyzing the Variability Comparing the Two Estimates of Variability The Analysis-of-Variance Table Multiple Comparison Procedures Television Viewing, Education, and Internet Use Summary What's Next? How to Obtain a One-Way Analysis of Variance Post Hoc Multiple Comparisons: Finding the Difference Options: Statistics and Missing Data Exercises
16. Two-Way Analysis of Variance The Design Examining the Data Testing Hypotheses Degree and Gender Interaction Necessary Assumptions Analysis-of-Variance Table Testing the Degree-by-Gender Interaction Testing the Main Effects Removing the Interaction Effect Where Are the Differences? Multiple Comparison Results Checking Assumptions A Look at Television Extensions Summary What's Next? How to Obtain a GLM Univariate Analysis GLM Univariate: Model GLM Univariate: Plots GLM Univariate: Post Hoc GLM Univariate: Options GLM Univariate: Save Exercises
17. Comparing Observed and Expected Counts Freedom or Manners? Observed and Expected Counts The Chi-Square Statistic A Larger Table Does College Open Doors? A One-Sample Chi-Square Test Power Concerns Summary What's Next? Exercises
18. Nonparametric Tests Nonparametric Tests for Paired Data Sign Test Wilcoxon Test Who's Sending E-mail? Mann-Whitney Test Kruskal-Wallis Test Friedman Test Summary How to Obtain Nonparametric Tests Chi-Square Test Binomial Test Two-Independent-Samples Tests Several-Independent-Samples Tests Two-Related-Samples Tests Several-Related-Samples Tests Options: Descriptive Statistics and Missing Values Exercises
PART 4. EXAMINING RELATIONSHIPS
19. Measuring Association Components of the Justice System Proportional Reduction in Error Measures of Association for Ordinal Variables Concordant and Discordant Pairs Measures Based on Concordant and Discordant Pairs Evaluating the Components Measuring Agreement Correlation-Based Measures Measures Based on the Chi-Square Statistic Summary What's Next? Exercises
20. Linear Regression and Correlation Life Expectancy and Birthrate Choosing the Best Line Calculating the Least-Squares Line Calculating Predicted Values and Residuals Determining How Well the Line Fits Explaining Variability Some Warnings Summary What's Next? How to Obtain a Linear Regression Statistics: Further Information on the Model Residual Plots: Basic Residual Analysis Linear Regression Save: Creating New Variables Linear Regression Options Exercises
21. Testing Regression Hypotheses The Population Regression Line Assumptions Needed for Testing Hypotheses Testing Hypotheses Testing that the Slope Is Zero Confidence Intervals for the Slope and Intercept Predicting Life Expectancy Predicting Means and Individual Observations Standard Error of the Predicted Mean Confidence Intervals for the Predicted Means Prediction Intervals for Individual Cases Summary What's Next? How to Obtain a Bivariate Correlation Options: Additional Statistics and Missing Data How to Obtain a Partial Correlation Options: Additional Statistics and Missing Data Exercises
22. Analyzing Residuals Residuals Standardized Residuals Studentized Residuals Checking for Normality Checking for Constant Variance Checking Linearity Checking Independence A Final Comment on Assumptions Looking for Influential Points Studentized Deleted Residuals Summary What's Next? Exercises
23. Building Multiple Regression Models Predicting Life Expectancy The Model Assumptions for Multiple Regression Examining the Variables Looking at How Well the Model Fits Examining the Coefficients Interpreting the Partial Regression Coefficients Changing the Model Partial Correlation Coefficients Tolerance and Multicollinearity Beta Coefficients Building a Regression Model Methods for Selecting Variables Summary What's Next? How to Obtain a Multiple Linear Regression Options: Variable Selection Criteria Exercises
24. Multiple Regression Diagnostics Examining Normality Scatterplots of Residuals Leverage Changes in the Coefficients Cook's Distance Plots against Independent Variables Partial Regression Plot Why Bother? Summary Exercises
Appendices
A. Obtaining Charts in IBM SPSS Statistics Overview Creating Bar Charts Creating a Bar Chart for Single Variable Creating a Clustered Bar Chart Creating a Chart with Multiple Variables Modifying Charts Collapsing Pie Chart Slices Changing the Scale of Histogram Saving Chart Files
B. Transforming and Selecting Data Data Transformations Transformations at a Glance Saving Changes Delaying Processing of Transformations Recoding Values Computing Variables The Calculator Pad Automatic Recoding Conditional Transformations Case Selection Temporary or Permanent Selection Other Selection Methods
C. The T Distribution
D. Areas under the Normal Curve
E. Descriptions of Data Files
F. Answers to Selected Exercises

Journal ArticleDOI
TL;DR: In this paper, the authors provide a probabilistic framework in which these methods are shown to be valid for statistics comprised of functions of DEA or FDH estimators and examine a simple, data-based rule for selecting m suggested by Politis et al.
Abstract: It is well-known that the naive bootstrap yields inconsistent inference in the context of data envelopment analysis (DEA) or free disposal hull (FDH) estimators in nonparametric frontier models. For inference about efficiency of a single, fixed point, drawing bootstrap pseudo-samples of size m < n provides consistent inference, although coverages are quite sensitive to the choice of subsample size m. We provide a probabilistic framework in which these methods are shown to be valid for statistics comprised of functions of DEA or FDH estimators. We examine a simple, data-based rule for selecting m suggested by Politis et al. (Stat Sin 11:1105–1124, 2001), and provide Monte Carlo evidence on the size and power of our tests. Our methods (i) allow for heterogeneity in the inefficiency process, and unlike previous methods, (ii) do not require multivariate kernel smoothing, and (iii) avoid the need for solutions of intermediate linear programs.
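A heavily simplified sketch of the m-out-of-n idea for a frontier estimator (an output-oriented FDH score for one fixed firm, hypothetical data, an illustrative rule for m, and no rate rescaling, which the paper's formal inference requires):

```python
# m < n subsample draws around an FDH efficiency score; illustration only, not the
# paper's inference procedure (which also rescales by the estimator's convergence rate).
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(1, 10, size=n)                     # single input
y = x ** 0.6 * rng.uniform(0.5, 1.0, size=n)       # single output with inefficiency

def fdh_output_score(x0, y0, x_ref, y_ref):
    # largest observed output ratio among units that use no more input than (x0, y0)
    dominating = x_ref <= x0
    return float(np.max(y_ref[dominating] / y0)) if dominating.any() else 1.0

x0, y0 = x[0], y[0]
theta_hat = fdh_output_score(x0, y0, x, y)

m = int(n ** 0.6)                                  # subsample size m < n (illustrative rule)
boot = []
for _ in range(1000):
    idx = rng.choice(n, size=m, replace=True)
    boot.append(fdh_output_score(x0, y0, x[idx], y[idx]))
boot = np.asarray(boot)
print(f"FDH score = {theta_hat:.3f}, spread of subsample scores = "
      f"[{np.quantile(boot, 0.025):.3f}, {np.quantile(boot, 0.975):.3f}]")
```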

Journal ArticleDOI
TL;DR: This work proposes a new class of nonparametric adaptive data-driven policies for stochastic inventory control problems on the distribution-free newsvendor model with censored demands and obtains new results on the asymptotic consistency of the Kaplan-Meier estimator for discrete random variables that extend existing work in statistics.
Abstract: Using the well-known product-limit form of the Kaplan-Meier estimator from statistics, we propose a new class of nonparametric adaptive data-driven policies for stochastic inventory control problems. We focus on the distribution-free newsvendor model with censored demands. The assumption is that the demand distribution is not known and there are only sales data available. We study the theoretical performance of the new policies and show that for discrete demand distributions they converge almost surely to the set of optimal solutions. Computational experiments suggest that the new policies converge for general demand distributions, not necessarily discrete, and demonstrate that they are significantly more robust than previously known policies. As a by-product of the theoretical analysis, we obtain new results on the asymptotic consistency of the Kaplan-Meier estimator for discrete random variables that extend existing work in statistics. To the best of our knowledge, this is the first application of the Kaplan-Meier estimator within an adaptive optimization algorithm, in particular, the first application to stochastic inventory control models. We believe that this work will lead to additional applications in other domains.
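A small sketch of the product-limit construction from censored sales data (hypothetical demand, Python rather than anything from the paper): days that stock out only reveal that demand was at least the stocking level, which is exactly a right-censored observation.

```python
# Kaplan-Meier product-limit estimate of the demand survival function from sales data
# that are censored at the stocking level. All quantities are hypothetical.
import numpy as np

rng = np.random.default_rng(9)
demand = rng.poisson(20, size=300)        # true (unobserved) demand
stock = np.full(300, 22)                  # stocking decisions
sales = np.minimum(demand, stock)         # observed sales
censored = demand >= stock                # stock-out days: demand only known to be >= sales

times = np.unique(sales[~censored])       # levels at which exact demands were observed
S = 1.0
survival = {}
for t in np.sort(times):
    at_risk = np.sum(sales >= t)                      # observations still "at risk" at level t
    events = np.sum((sales == t) & (~censored))       # exact demands observed at t
    S *= 1.0 - events / at_risk
    survival[t] = S

t_star = np.sort(times)[len(times) // 2]
print(f"estimated P(demand > {t_star}):", round(survival[t_star], 3))
```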

Journal ArticleDOI
TL;DR: This work establishes inequalities that describe how close approximate pinball risk minimizers are to the corresponding conditional quantile, and uses them to establish an oracle inequality for support vector machines that use the pinball loss.
Abstract: The so-called pinball loss for estimating conditional quantiles is a well-known tool in both statistics and machine learning. So far, however, only little work has been done to quantify the efficiency of this tool for nonparametric approaches. We fill this gap by establishing inequalities that describe how close approximate pinball risk minimizers are to the corresponding conditional quantile. These inequalities, which hold under mild assumptions on the data-generating distribution, are then used to establish so-called variance bounds, which recently turned out to play an important role in the statistical analysis of (regularized) empirical risk minimization approaches. Finally, we use both types of inequalities to establish an oracle inequality for support vector machines that use the pinball loss. The resulting learning rates are min–max optimal under some standard regularity assumptions on the conditional quantile.
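Stated generically, the loss and the property that makes it useful are:

```latex
% The pinball (check) loss for the tau-th quantile and its defining minimization property.
\[
L_{\tau}(y, t) =
\begin{cases}
\tau\,(y - t), & y \ge t,\\
(1 - \tau)\,(t - y), & y < t,
\end{cases}
\qquad
f^{\ast}_{\tau}(x) \in \operatorname*{arg\,min}_{t}\,
  \mathbb{E}\bigl[L_{\tau}(Y, t)\mid X = x\bigr].
\]
```

Minimizing the expected pinball risk at each x therefore recovers the conditional τ-quantile; the paper's inequalities bound how far approximate risk minimizers can be from it.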