
Showing papers on "Nonparametric statistics published in 2004"


Book
22 Mar 2004
TL;DR: This book introduces nonparametric density estimation (histograms and kernel methods), nonparametric regression, and semiparametric extensions including single index models, generalized partial linear models, and (generalized) additive models.
Abstract: Introduction.- Histogram.- Nonparametric Density Estimation.- Nonparametric Regression.- Semiparametric and Generalized Regression Models.- Single Index Models.- Generalized Partial Linear Models.- Additive Models and Marginal Effects.- Generalized Additive Models.

789 citations


Journal ArticleDOI
TL;DR: The authors describe a method and provide a simple worked example using inverse probability weights (IPW) to create adjusted survival curves when the weights are non-parametrically estimated, equivalent to direct standardization of the survival curves to the combined study population.
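
A minimal sketch of the idea on simulated data is given below: treatment probabilities are estimated (here with a logistic regression, as a simple stand-in for the nonparametric weight estimation discussed in the paper), converted to inverse probability weights, and fed into a weighted Kaplan-Meier estimator. All variable names and data are invented for illustration.

```python
# Sketch of IPW-adjusted survival curves; the weights are estimated with a logistic
# regression here purely as a stand-in, and all data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_kaplan_meier(time, event, weights):
    """Weighted Kaplan-Meier estimate evaluated at each distinct event time."""
    order = np.argsort(time)
    time, event, weights = time[order], event[order], weights[order]
    surv, s = [], 1.0
    for t in np.unique(time[event == 1]):
        at_risk = weights[time >= t].sum()            # weighted risk set
        deaths = weights[(time == t) & (event == 1)].sum()
        s *= 1.0 - deaths / at_risk
        surv.append((t, s))
    return np.array(surv)

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))                           # confounders
treat = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))   # treatment depends on x
time = rng.exponential(scale=np.exp(0.5 * treat - 0.3 * x[:, 0]), size=n)
event = rng.binomial(1, 0.8, size=n)                  # some censoring

ps = LogisticRegression().fit(x, treat).predict_proba(x)[:, 1]
w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))        # inverse probability weights

for grp in (0, 1):
    m = treat == grp
    curve = weighted_kaplan_meier(time[m], event[m], w[m])
    print(f"group {grp}: adjusted S(t) at last event time = {curve[-1, 1]:.3f}")
```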

662 citations


Journal ArticleDOI
TL;DR: A method for nonparametric regression which admits continuous and categorical data in a natural manner using the method of kernels is proposed, and the asymptotic normality of the estimator is established.

640 citations


Journal ArticleDOI
21 Oct 2004-Nature
TL;DR: It is shown that maximum likelihood and BMCMC can become strongly biased and statistically inconsistent when the rates at which sequence sites evolve change non-identically over time.
Abstract: All inferences in comparative biology depend on accurate estimates of evolutionary relationships. Recent phylogenetic analyses have turned away from maximum parsimony towards the probabilistic techniques of maximum likelihood and Bayesian Markov chain Monte Carlo (BMCMC). These probabilistic techniques represent a parametric approach to statistical phylogenetics, because their criterion for evaluating a topology--the probability of the data, given the tree--is calculated with reference to an explicit evolutionary model from which the data are assumed to be identically distributed. Maximum parsimony can be considered nonparametric, because trees are evaluated on the basis of a general metric--the minimum number of character state changes required to generate the data on a given tree--without assuming a specific distribution. The shift to parametric methods was spurred, in large part, by studies showing that although both approaches perform well most of the time, maximum parsimony is strongly biased towards recovering an incorrect tree under certain combinations of branch lengths, whereas maximum likelihood is not. All these evaluations simulated sequences by a largely homogeneous evolutionary process in which data are identically distributed. There is ample evidence, however, that real-world gene sequences evolve heterogeneously and are not identically distributed. Here we show that maximum likelihood and BMCMC can become strongly biased and statistically inconsistent when the rates at which sequence sites evolve change non-identically over time. Maximum parsimony performs substantially better than current parametric methods over a wide range of conditions tested, including moderate heterogeneity and phylogenetic problems not normally considered difficult.

574 citations


Posted Content
TL;DR: In this paper, a class of pseudo maximum likelihood (PML) estimators is proposed to deal with the indeterminacy problem associated with the existence of multiple equilibria and the computational burden in the solution of the game.
Abstract: This paper studies the estimation of dynamic discrete games of incomplete information. Two main econometric issues appear in the estimation of these models: the indeterminacy problem associated with the existence of multiple equilibria, and the computational burden in the solution of the game. We propose a class of pseudo maximum likelihood (PML) estimators that deals with these problems and we study the asymptotic and finite sample properties of several estimators in this class. We first focus on two-step PML estimators which, though attractive for their computational simplicity, have some important limitations: they are seriously biased in small samples; they require consistent nonparametric estimators of players' choice probabilities in the first step, which are not always feasible for some models and data; and they are asymptotically inefficient. Second, we show that a recursive extension of the two-step PML, which we call nested pseudo likelihood (NPL), addresses those drawbacks at a relatively small additional computational cost. The NPL estimator is particularly useful in applications where consistent nonparametric estimates of choice probabilities are either not available or very imprecise, e.g., models with permanent unobserved heterogeneity. Finally, we illustrate these methods in Monte Carlo experiments and in an empirical application to a model of firm entry and exit in oligopoly markets using Chilean data from several retail industries.

571 citations


Journal ArticleDOI
TL;DR: This article shows that cross-validation produces asymptotically optimal smoothing for relevant components, while eliminating irrelevant components by oversmoothing in the problem of nonparametric estimation of a conditional density.
Abstract: Many practical problems, especially some connected with forecasting, require nonparametric estimation of conditional densities from mixed data. For example, given an explanatory data vector X for a prospective customer, with components that could include the customer's salary, occupation, age, sex, marital status, and address, a company might wish to estimate the density of the expenditure, Y, that could be made by that person, basing the inference on observations of (X, Y) for previous clients. Choosing appropriate smoothing parameters for this problem can be tricky, not least because plug-in rules take a particularly complex form in the case of mixed data. An obvious difficulty is that there exists no general formula for the optimal smoothing parameters. More insidiously, and more seriously, it can be difficult to determine which components of X are relevant to the problem of conditional inference. For example, if the jth component of X is independent of Y, then that component is irrelevant to es...
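
The sketch below illustrates the kind of estimator involved: a kernel conditional density estimator with a product kernel (Gaussian for the continuous predictor, Aitchison-Aitken for the categorical one), with smoothing parameters chosen by leave-one-out likelihood cross-validation rather than the least-squares cross-validation analysed in the article. The data, bandwidth grid, and variable names are all invented.

```python
# Kernel conditional density estimation with mixed predictors; bandwidths chosen by
# leave-one-out likelihood cross-validation over a small grid. Illustrative only.
import numpy as np
from itertools import product

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def aitchison_aitken(xi, xj, lam, n_cat):
    # categorical kernel: 1 - lam if levels match, lam / (n_cat - 1) otherwise
    return np.where(xi == xj, 1.0 - lam, lam / (n_cat - 1))

def cond_density(y0, xc0, xd0, y, xc, xd, hy, hx, lam, n_cat):
    wx = gauss((xc - xc0) / hx) / hx * aitchison_aitken(xd, xd0, lam, n_cat)
    wy = gauss((y - y0) / hy) / hy
    return (wx * wy).sum() / wx.sum()

def loo_score(y, xc, xd, hy, hx, lam, n_cat):
    idx = np.arange(len(y))
    ll = 0.0
    for i in idx:
        keep = idx != i
        f = cond_density(y[i], xc[i], xd[i], y[keep], xc[keep], xd[keep],
                         hy, hx, lam, n_cat)
        ll += np.log(max(f, 1e-300))
    return ll

rng = np.random.default_rng(1)
n, n_cat = 200, 3
xc = rng.normal(size=n)                       # continuous predictor (relevant)
xd = rng.integers(0, n_cat, size=n)           # categorical predictor (irrelevant)
y = 0.8 * xc + rng.normal(scale=0.5, size=n)

grid = product([0.2, 0.4, 0.8], [0.2, 0.5, 1.0, 5.0], [0.1, 0.4, 0.66])
best = max(grid, key=lambda p: loo_score(y, xc, xd, *p, n_cat))
print("chosen (hy, hx, lambda):", best)
# a lambda near its upper bound (n_cat - 1) / n_cat makes the categorical kernel
# uniform, effectively smoothing the irrelevant component away
```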

534 citations


Journal ArticleDOI
TL;DR: A Rao–Blackwell type of relation is derived in which nonparametric methods such as cross-validation are seen to be randomized versions of their covariance penalty counterparts.
Abstract: Having constructed a data-based estimation rule, perhaps a logistic regression or a classification tree, the statistician would like to know its performance as a predictor of future cases. There are two main theories concerning prediction error: (1) penalty methods such as Cp, Akaike's information criterion, and Stein's unbiased risk estimate that depend on the covariance between data points and their corresponding predictions; and (2) cross-validation and related nonparametric bootstrap techniques. This article concerns the connection between the two theories. A Rao–Blackwell type of relation is derived in which nonparametric methods such as cross-validation are seen to be randomized versions of their covariance penalty counterparts. The model-based penalty methods offer substantially better accuracy, assuming that the model is believable.
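
A toy comparison of the two theories for an ordinary linear smoother is sketched below: the covariance-penalty (Mallows' Cp) estimate of prediction error uses sigma-squared times the trace of the hat matrix, while leave-one-out cross-validation is computed with the standard shortcut for linear fits. The simulated data and noise level are illustrative.

```python
# Covariance-penalty (Cp) estimate of prediction error versus leave-one-out CV
# for the same least-squares fit; data, sigma and design are made up.
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 100, 5, 1.0
X = rng.normal(size=(n, p))
beta = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y = X @ beta + rng.normal(scale=sigma, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix of the linear smoother
yhat = H @ y
rss = np.sum((y - yhat) ** 2)

# covariance penalty: sum_i cov(yhat_i, y_i) = sigma^2 * trace(H) for a linear fit
cp_estimate = rss / n + 2 * sigma**2 * np.trace(H) / n

# leave-one-out CV via the standard shortcut for linear smoothers
loo = np.mean(((y - yhat) / (1 - np.diag(H))) ** 2)

print(f"Cp estimate of prediction error : {cp_estimate:.3f}")
print(f"Leave-one-out CV estimate       : {loo:.3f}")
```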

465 citations


01 Jan 2004
TL;DR: This article studies the theoretical properties of cross-validated smoothing parameter selection for the local linear kernel estimator, deriving the rate of convergence of the cross-validated bandwidths to their optimal benchmark values and establishing the asymptotic normality of the resulting estimator.
Abstract: Local linear kernel methods have been shown to dominate local constant methods for the nonparametric estimation of regression functions. In this paper we study the theoretical properties of cross-validated smoothing parameter selection for the local linear kernel estimator. We derive the rate of convergence of the cross-validated smoothing parameters to their optimal benchmark values, and we establish the asymptotic normality of the resulting nonparametric estimator. We then generalize our result to the mixed categorical and continuous regressor case which is frequently encountered in applied settings. Monte Carlo simulation results are reported to examine the finite sample performance of the local-linear based cross-validation smoothing parameter selector. We relate the theoretical and simulation results to a corrected AIC method (termed AICc) proposed by Hurvich, Simonoff and Tsai (1998) and find that AICc has impressive finite-sample properties.
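
A simple version of the procedure can be sketched as follows: fit a local linear regression with a Gaussian kernel and pick the bandwidth that minimizes the leave-one-out cross-validation criterion. The data, kernel, and bandwidth grid below are illustrative choices, not those of the paper.

```python
# Local linear kernel regression with a leave-one-out cross-validated bandwidth.
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    A = X.T @ (w[:, None] * X)
    b = X.T @ (w * y)
    beta = np.linalg.lstsq(A, b, rcond=None)[0]   # lstsq guards against near-singularity
    return beta[0]

def loo_cv(x, y, h):
    errs = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        errs.append((y[i] - local_linear(x[i], x[keep], y[keep], h)) ** 2)
    return np.mean(errs)

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(-2, 2, n)
y = np.sin(2 * x) + rng.normal(scale=0.3, size=n)

bandwidths = np.linspace(0.05, 1.0, 20)
scores = [loo_cv(x, y, h) for h in bandwidths]
h_cv = bandwidths[int(np.argmin(scores))]
print(f"cross-validated bandwidth: {h_cv:.3f}")
```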

444 citations


Journal ArticleDOI
TL;DR: The new method provides greater weight to samples near the expected decision boundary, which tends to provide for increased classification accuracy and to reduce the effect of the singularity problem.
Abstract: In this paper, a new nonparametric feature extraction method is proposed for high-dimensional multiclass pattern recognition problems. It is based on a nonparametric extension of scatter matrices. There are at least two advantages to using the proposed nonparametric scatter matrices. First, they are generally of full rank. This provides the ability to specify the number of extracted features desired and to reduce the effect of the singularity problem. This is in contrast to parametric discriminant analysis, which usually only can extract L-1 (number of classes minus one) features. In a real situation, this may not be enough. Second, the nonparametric nature of scatter matrices reduces the effects of outliers and works well even for nonnormal datasets. The new method provides greater weight to samples near the expected decision boundary. This tends to provide for increased classification accuracy.

429 citations


Journal ArticleDOI
TL;DR: This paper illustrates the nonparametric analysis of ordinal data obtained from two-way factorial designs, including a repeated measures design, and shows how to quantify the effects of experimental factors on ratings through estimated relative marginal effects.
Abstract: Plant disease severity often is assessed using an ordinal rating scale rather than a continuous scale of measurement. Although such data usually should be analyzed with nonparametric methods, and not with the typical parametric techniques (such as analysis of variance), limitations in the statistical methodology available had meant that experimental designs generally could not be more complicated than a one-way layout. Very recent advancements in the theoretical formulation of hypotheses and associated test statistics within a nonparametric framework, together with development of software for implementing the methods, have made it possible for plant pathologists to analyze properly ordinal data from more complicated designs using nonparametric techniques. In this paper, we illustrate the nonparametric analysis of ordinal data obtained from two-way factorial designs, including a repeated measures design, and show how to quantify the effects of experimental factors on ratings through estimated relative marginal effects.
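
For a flavour of the quantity being estimated, the sketch below computes relative marginal effects from mid-ranks for a simple one-way layout; the ordinal severity ratings and treatment labels are invented, and the factorial and repeated-measures machinery of the paper is not reproduced.

```python
# Estimated relative marginal effects from mid-ranks for ordinal severity ratings.
import numpy as np
from scipy.stats import rankdata

ratings = {                                    # ordinal disease-severity scores (1-5)
    "control":     [3, 4, 4, 5, 3, 4, 5, 5],
    "fungicide A": [1, 2, 2, 3, 1, 2, 2, 3],
    "fungicide B": [2, 3, 2, 3, 3, 2, 4, 3],
}

pooled = np.concatenate(list(ratings.values()))
midranks = rankdata(pooled)                    # mid-ranks handle the many ties
N = len(pooled)

start = 0
for name, vals in ratings.items():
    r = midranks[start:start + len(vals)]
    start += len(vals)
    p_hat = (r.mean() - 0.5) / N               # estimated relative marginal effect
    print(f"{name:12s} relative effect = {p_hat:.3f}")
# values below 0.5 indicate stochastically lower severity than the pooled sample
```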

420 citations


Book
03 Sep 2004
TL;DR: This book presents nonparametric, parametric, and modular/connectionist approaches to modeling physiological systems, including Volterra and Wiener models, the Laguerre-Volterra network, and the VWM model.
Abstract: Prologue. 1 Introduction. 1.1 Purpose of this Book. 1.2 Advocated Approach. 1.3 The Problem of System Modeling in Physiology. 1.4 Types of Nonlinear Models of Physiological Systems. 2 Nonparametric Modeling. 2.1 Volterra Models. 2.2 Wiener Models. 2.3 Efficient Volterra Kernel Estimation. 2.4 Analysis of Estimation Errors. 3 Parametric Modeling. 3.1 Basic Parametric Model Forms and Estimation Procedures. 3.2 Volterra Kernels of Nonlinear Differential Equations. 3.3 Discrete-Time Volterra Kernels of NARMAX Models. 3.4 From Volterra Kernel Measurements to Parametric Models. 3.5 Equivalence Between Continuous and Discrete Parametric Models. 4 Modular and Connectionist Modeling. 4.1 Modular Form of Nonparametric Models. 4.2 Connectionist Models. 4.3 The Laguerre-Volterra Network. 4.4 The VWM Model. 5 A Practitioner's Guide. 5.1 Practical Considerations and Experimental Requirements. 5.2 Preliminary Tests and Data Preparation. 5.3 Model Specification and Estimation. 5.4 Model Validation and Interpretation. 5.5 Outline of Step-by-Step Procedure. 6 Selected Applications. 6.1 Neurosensory Systems. 6.2 Cardiovascular System. 6.3 Renal System. 6.4 Metabolic-Endocrine System. 7 Modeling of Multiinput/Multioutput Systems. 7.1 The Two-Input Case. 7.2 Applications of Two-Input Modeling to Physiological Systems. 7.3 The Multiinput Case. 7.4 Spatiotemporal and Spectrotemporal Modeling. 8 Modeling of Neuronal Systems. 8.1 A General Model of Membrane and Synaptic Dynamics. 8.2 Functional Integration in the Single Neuron. 8.3 Neuronal Systems with Point-Process Inputs. 8.4 Modeling of Neuronal Ensembles. 9 Modeling of Nonstationary Systems. 9.1 Quasistationary and Recursive Tracking Methods. 9.2 Kernel Expansion Method. 9.3 Network-Based Methods. 9.4 Applications to Nonstationary Physiological Systems. 10 Modeling of Closed-Loop Systems. 10.1 Autoregressive Form of Closed-Loop Model. 10.2 Network Model Form of Closed-Loop Systems. Appendix I: Function Expansions. Appendix II: Gaussian White Noise. Appendix III: Construction of the Wiener Series. Appendix IV: Stationarity, Ergodicity, and Autocorrelation Functions of Random Processes. References. Index.
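
As a small taste of the nonparametric (Volterra/Wiener) modeling discussed in the early chapters, the sketch below estimates a first-order kernel by cross-correlating a Gaussian white-noise input with the system output (the classical Lee-Schetzen idea). The simulated system, memory length, and record length are invented for illustration.

```python
# First-order Wiener/Volterra kernel estimation by input-output cross-correlation
# with a Gaussian white-noise input; the "system" is a made-up linear-plus-quadratic
# example, not one from the book.
import numpy as np

rng = np.random.default_rng(10)
N, M = 20000, 30                       # record length and kernel memory (samples)
P = 1.0                                # input power (variance of the white noise)
x = rng.normal(0.0, np.sqrt(P), N)

true_k1 = np.exp(-np.arange(M) / 5.0)  # true first-order kernel
lin = np.convolve(x, true_k1)[:N]
y = lin + 0.3 * lin**2                 # mildly nonlinear system output

# first-order kernel estimate: k1(m) = E[y(n) x(n-m)] / P
k1_hat = np.array([np.mean(y[m:] * x[:N - m]) for m in range(M)]) / P
print("true k1[:5]     :", np.round(true_k1[:5], 3))
print("estimated k1[:5]:", np.round(k1_hat[:5], 3))
```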

Journal ArticleDOI
TL;DR: This book discusses the statistical properties of the sample mean, weighted averages, sample variance, and sample median under various distributional assumptions about the underlying independent and identically distributed data (normal, binomial, and a specialized PDF proportional to the square of a sinc function).
Abstract: gain function (i.e., the absolute value of the transfer function) is here termed the “modulation transfer function.” Chapter 9, the first chapter on statistical theory, discusses the statistical properties of the sample mean, weighted averages, sample variance, and sample median under various distributional assumptions about the underlying independent and identically distributed (iid) data (normal, binomial, and a specialized PDF proportional to the square of a sinc function). The discussions are generally good (particularly for the squared sinc function), although the one on weighted averages might prove to be too abstract for those with little experience in data analysis. Chapter 10 is supposedly a discussion of the estimation of probability laws. Ignoring its confusing introduction, this chapter starts promisingly enough with a section on the orthogonal function approach to estimating PDFs from iid data. A follow-up section discusses a variation involving a Karhunen–Loeve expansion, in which the author fails to point out the practical limitation that knowledge of the unknown PDF is needed to construct the orthogonal functions. After this section, the chapter takes off on a major departure. The data are no longer assumed to be iid realizations of RVs, but rather are taken to be perfectly known functions of the unknown PDF (e.g., our “data” are the mean and variance of the unknown PDF). Under this framework, the author discusses a Bayesian-like approach called the “principle of maximum probability” and relates it to the maximum entropy approach. From a purely mathematical standpoint, this material is fascinating and should be required reading for all advocates of the maximum entropy principle; unfortunately, because of the bizarre notion of what constitutes data, it is only marginally relevant to the statistical problem of PDF estimation. Chapters 11, 12, and 13 provide fairly standard discussions on the chi-squared test for significant departures from a prespecified distribution, the one-sample t test for a mean and the F test for equality of the variances. Chapters 14 and 15 are bare bones introductions to least squares theory and principal components analysis, whereas Chapter 16 is basically a discussion of Bayesian statistical theory in the context of assessing whether or not coin flips are biased. The book’s final chapter, 17, “Introduction to Estimation Methods,” starts with a nice overview of maximum likelihood estimators, the Cramer–Rao and Bhattacharyya lower bounds, and Bayesian estimation theory (although purists will object because of the lack of statements about conditions needed for various results to hold). However, as in Chapter 10, the text then takes a major departure away from what most Technometrics readers would consider statistical estimation theory. The author discusses a “Fisher information-based approach” that “aims to find the true probability law describing a physical phenomenon by deriving a wave equation that defines the law.” This portion of the chapter (which at times has the flavor of a philosophical discussion) is evidently a synopsis of another book by the same author (Frieden 1998). Missing from this book are many topics that have received considerable attention over the last 20 years in Technometrics (bootstrapping and nonparametric regression being two prominent examples). This bias toward older techniques is probably explained by the fact that this book is the third edition of a text that first appeared in 1983.
The reference lists for Chapters 1–16 include only a smattering of papers and books that have appeared after 1990. (Indeed, the author has not even bothered to update the references for several books that themselves are now out in newer editions.) Chapter 17 does have numerous references from the last decade and seems to be the main motivation for this new edition, but owners of previous editions might look carefully at the Preface to this edition before purchasing. I am reluctant to recommend Probability, Statistical Optics and Data Testing as a textbook for beginning students because of its lack of coverage of important new topics in statistics and because of too many quirky nonstandard definitions that will make it difficult for students to then jump into the bulk of the statistical and engineering literature. There are, however, some real gems contained in the optical applications and the numerous exercises that the author provides. I would encourage instructors and students of the physical sciences to seek out this book for some challenging applications and statistical problems. Alas, in keeping with the generally dated flavor of the book, the author states that “answers to selected problems may be obtained by writing to the author directly, and enclosing a stamped, self-addressed envelope (about 8½ × 11 in) for return”—an odd approach in an age of e-mail permitting bulky attachments.

Journal ArticleDOI
TL;DR: In this article, the authors define the Bernstein copula and study its statistical properties in terms of both distributions and densities, and develop a theory of approximation for multivariate distributions.
Abstract: We define the Bernstein copula and study its statistical properties in terms of both distributions and densities. We also develop a theory of approximation for multivariate distributions in terms of Bernstein copulas. Rates of consistency when the Bernstein copula density is estimated empirically are given. In order of magnitude, this estimator has variance equal to the square root of the variance of common nonparametric estimators, e.g., kernel smoothers, but it is biased as a histogram estimator. We thank Mark Salmon for interesting us in the copula function and Peter Phillips, an associate editor, and the referees for many valuable comments. All remaining errors are our sole responsibility.
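
A rough sketch of the empirical Bernstein copula density estimator is given below: the empirical copula is evaluated on an m-by-m grid and its block increments are smoothed with Bernstein (binomial) weights. The simulated Gaussian-copula data and the polynomial order m are illustrative choices.

```python
# Empirical Bernstein copula density estimate on pseudo-observations.
import numpy as np
from scipy.stats import binom, rankdata

def empirical_copula(u, v, U, V):
    return np.mean((U <= u) & (V <= v))

def bernstein_copula_density(u, v, U, V, m):
    j = np.arange(m)
    # empirical copula on the (m+1) x (m+1) grid, then block increments
    grid = np.array([[empirical_copula(a / m, b / m, U, V)
                      for b in range(m + 1)] for a in range(m + 1)])
    delta = grid[1:, 1:] - grid[:-1, 1:] - grid[1:, :-1] + grid[:-1, :-1]
    pu = binom.pmf(j, m - 1, u)        # Bernstein weights in u
    pv = binom.pmf(j, m - 1, v)        # Bernstein weights in v
    return m * m * pu @ delta @ pv

rng = np.random.default_rng(4)
n = 500
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n)
U = rankdata(z[:, 0]) / (n + 1)        # pseudo-observations
V = rankdata(z[:, 1]) / (n + 1)

for point in [(0.25, 0.25), (0.25, 0.75), (0.5, 0.5)]:
    c = bernstein_copula_density(*point, U, V, m=10)
    print(f"c{point} = {c:.2f}")
```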

Journal ArticleDOI
TL;DR: An introduction to Nonparametric Statistics and its applications to Science and Engineering, and a guide to Statistical Limit Theory and nonparametric estimation.
Abstract: Nonparametric Statistical Process Control; An Introduction to Modern Nonparametric Statistics; Robust Nonparametric Statistical Methods; Nonparametric Statistics with Applications to Science and Engineering; Nonparametric Statistical Inference; A SAS Companion for Nonparametric Statistics; Deconvolution Problems in Nonparametric Statistics; Modern Statistics with R; Practical Nonparametric Statistics; Nonparametric Statistical Methods; An Introduction to Statistical Learning; A Parametric Approach to Nonparametric Statistics; All of Statistics; Statistical Methods for Ranking Data; OpenIntro Statistics; Introduction to Nonparametric Statistics for the Biological Sciences Using R; Statistical Methods; Applying Contemporary Statistical Techniques; Studyguide for Introduction to Modern Nonparametric Statistics by Higgins, James J., ISBN 9780534387754; Nonparametric Goodness-of-Fit Testing Under Gaussian Models; Nonparametric Statistical Methods Using R; Modern Statistical Methods for HCI; Applied Nonparametric Statistics in Reliability; An Introduction to Nonparametric Statistics; Introduction to High-Dimensional Statistics; Nonparametric Statistics for Non-Statisticians; All of Nonparametric Statistics; Nonparametric Statistics; Concepts in Probability and Stochastic Modeling; Modern Applied U-Statistics; Nonparametric Functional Data Analysis; Textbook of Parametric and Nonparametric Statistics; Introduction to Statistical Limit Theory; Modern Multivariate Statistical Techniques; Advances in Contemporary Statistics and Econometrics; Basics of Modern Mathematical Statistics; Nonparametric Statistics; A Distribution-Free Theory of Nonparametric Regression; Nonparametric Statistics for Social and Behavioral Sciences; Introduction to Nonparametric Estimation

Journal ArticleDOI
TL;DR: The authors discuss the applicability of nonparametric item response theory (IRT) models to the construction and psychometric analysis of personality and psychopathology scales, and they contrast these models with parametric IRT models.
Abstract: The authors discuss the applicability of nonparametric item response theory (IRT) models to the construction and psychometric analysis of personality and psychopathology scales, and they contrast these models with parametric IRT models. They describe the fit of nonparametric IRT to the Depression content scale of the Minnesota Multiphasic Personality Inventory—2 (J. N. Butcher, W. G. Dahlstrom, J. R. Graham, A. Tellegen, & B. Kaemmer, 1989). They also show how nonparametric IRT models can easily be applied and how misleading results from parametric IRT models can be avoided. They recommend the use of nonparametric IRT modeling prior to using parametric logistic models when investigating personality data.

Journal ArticleDOI
TL;DR: A novel independent component analysis algorithm, which is truly blind to the particular underlying distribution of the mixed signals, is introduced, which consistently outperformed all state-of-the-art ICA methods and demonstrated the following properties.
Abstract: In this paper, we introduce a novel independent component analysis (ICA) algorithm, which is truly blind to the particular underlying distribution of the mixed signals. Using a nonparametric kernel density estimation technique, the algorithm performs simultaneously the estimation of the unknown probability density functions of the source signals and the estimation of the unmixing matrix. Following the proposed approach, the blind signal separation framework can be posed as a nonlinear optimization problem, where a closed form expression of the cost function is available, and only the elements of the unmixing matrix appear as unknowns. We conducted a series of Monte Carlo simulations, involving linear mixtures of various source signals with different statistical characteristics and sample sizes. The new algorithm not only consistently outperformed all state-of-the-art ICA methods, but also demonstrated the following properties: 1) Only a flexible model, capable of learning the source statistics, can consistently achieve an accurate separation of all the mixed signals. 2) Adopting a suitably designed optimization framework, it is possible to derive a flexible ICA algorithm that matches the stability and convergence properties of conventional algorithms. 3) A nonparametric approach does not necessarily require large sample sizes in order to outperform methods with fixed or partially adaptive contrast functions.

Proceedings ArticleDOI
04 Jul 2004
TL;DR: A mean-field variational approach to approximate inference for the Dirichlet process, where the approximate posterior is based on the truncated stick-breaking construction (Ishwaran & James, 2001).
Abstract: Variational inference methods, including mean field methods and loopy belief propagation, have been widely used for approximate probabilistic inference in graphical models. While often less accurate than MCMC, variational methods provide a fast deterministic approximation to marginal and conditional probabilities. Such approximations can be particularly useful in high dimensional problems where sampling methods are too slow to be effective. A limitation of current methods, however, is that they are restricted to parametric probabilistic models. MCMC does not have such a limitation; indeed, MCMC samplers have been developed for the Dirichlet process (DP), a nonparametric distribution on distributions (Ferguson, 1973) that is the cornerstone of Bayesian nonparametric statistics (Escobar & West, 1995; Neal, 2000). In this paper, we develop a mean-field variational approach to approximate inference for the Dirichlet process, where the approximate posterior is based on the truncated stick-breaking construction (Ishwaran & James, 2001). We compare our approach to DP samplers for Gaussian DP mixture models.
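
The truncated stick-breaking construction underlying the variational family can be sketched in a few lines; the code below draws a truncated random measure and samples from the induced Gaussian DP mixture. The truncation level, concentration parameter, and base measure are illustrative, and the variational optimization itself is not reproduced.

```python
# Truncated stick-breaking construction of a Dirichlet process draw, plus samples
# from the induced Gaussian DP mixture.
import numpy as np

rng = np.random.default_rng(5)
alpha, T = 2.0, 20                       # concentration and truncation level

v = rng.beta(1.0, alpha, size=T)         # stick-breaking proportions
v[-1] = 1.0                              # truncation: the last stick takes the rest
pieces = np.concatenate([[1.0], np.cumprod(1 - v[:-1])])
weights = v * pieces                     # pi_k = v_k * prod_{j<k} (1 - v_j)

atoms = rng.normal(0.0, 3.0, size=T)     # atoms drawn from the base measure G0

print("weights sum to", weights.sum())   # exactly 1 because of the truncation
z = rng.choice(T, p=weights, size=10)    # component indicators
x = rng.normal(atoms[z], 1.0)            # draws from the Gaussian DP mixture
print("mixture draws:", np.round(x, 2))
```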

Journal ArticleDOI
TL;DR: This review introduces nonparametric methods for testing differences between more than two groups or treatments, and three of the more common tests are described in detail.
Abstract: This review introduces nonparametric methods for testing differences between more than two groups or treatments. Three of the more common tests are described in detail, together with multiple comparison procedures for identifying specific differences between pairs of groups.
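
A typical workflow of this kind is sketched below: a Kruskal-Wallis test across three groups followed by Dunn-style pairwise comparisons of mean ranks with a Bonferroni correction. The three samples are invented, and the review itself may cover different tests and adjustment procedures.

```python
# Kruskal-Wallis test followed by Dunn-style pairwise comparisons (Bonferroni).
import numpy as np
from itertools import combinations
from scipy.stats import kruskal, rankdata, norm

groups = {
    "A": [12.1, 14.3, 11.8, 13.0, 12.7, 15.2],
    "B": [16.4, 15.9, 17.1, 16.8, 18.0, 15.5],
    "C": [13.2, 12.9, 14.1, 13.8, 12.5, 14.6],
}

H, p = kruskal(*groups.values())
print(f"Kruskal-Wallis: H = {H:.2f}, p = {p:.4f}")

# Dunn's test: compare mean ranks of each pair on the pooled ranking
pooled = np.concatenate(list(groups.values()))
ranks = rankdata(pooled)
N = len(pooled)
sizes = {k: len(v) for k, v in groups.items()}
mean_rank, start = {}, 0
for k, v in groups.items():
    mean_rank[k] = ranks[start:start + len(v)].mean()
    start += len(v)

pairs = list(combinations(groups, 2))
for a, b in pairs:
    se = np.sqrt(N * (N + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
    z = (mean_rank[a] - mean_rank[b]) / se
    p_adj = min(1.0, 2 * norm.sf(abs(z)) * len(pairs))   # Bonferroni adjustment
    print(f"{a} vs {b}: z = {z:+.2f}, adjusted p = {p_adj:.4f}")
```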

Journal ArticleDOI
TL;DR: A Markov chain Monte Carlo scheme is developed to allow efficient implementation of full posterior inference in the given model and includes a regression at the level of the nonparametric model.
Abstract: Summary. We consider the problem of combining inference in related nonparametric Bayes models. Analogous to parametric hierarchical models, the hierarchical extension formalizes borrowing strength across the related submodels. In the nonparametric context, modelling is complicated by the fact that the random quantities over which we define the hierarchy are infinite dimensional. We discuss a formal definition of such a hierarchical model. The approach includes a regression at the level of the nonparametric model. For the special case of Dirichlet process mixtures, we develop a Markov chain Monte Carlo scheme to allow efficient implementation of full posterior inference in the given model.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of estimating the distribution of payoffs in a discrete dynamic game, focusing on models where the goal is to learn about the distributions of firms' entry and exit costs.
Abstract: This paper considers the problem of estimating the distribution of payoffs in a discrete dynamic game, focusing on models where the goal is to learn about the distribution of firms' entry and exit costs. The idea is to begin with nonparametric first stage estimates of entry and continuation values obtained by computing sample averages of the realized continuation values of entrants who do enter and incumbents who do continue. Under certain assumptions these values are linear functions of the parameters of the problem, and hence are not computationally burdensome to use. Attention is given to the small sample problem of estimation error in the nonparametric estimates and this leads to a preference for use of particularly simple estimates of continuation values and moments.

Journal ArticleDOI
TL;DR: This paper embeds A(n)-based inference into the theory of interval probability, by showing that the corresponding bounds are totally monotone F-probability and coherent.

Journal ArticleDOI
TL;DR: The general approach to performing distribution fitting with maximum likelihood (ML) and a method based on quantiles (quantile maximum probability, QMP) are reviewed and it is shown that QMP has both small bias and good efficiency when used with common distribution functions.
Abstract: The most powerful tests of response time (RT) models often involve the whole shape of the RT distribution, thus avoiding mimicking that can occur at the level of RT means and variances. Nonparametric distribution estimation is, in principle, the most appropriate approach, but such estimators are sometimes difficult to obtain. On the other hand, distribution fitting, given an algebraic function, is both easy and compact. We review the general approach to performing distribution fitting with maximum likelihood (ML) and a method based on quantiles (quantile maximum probability, QMP). We show that QMP has both small bias and good efficiency when used with common distribution functions (the ex-Gaussian, Gumbel, lognormal, Wald, and Weibull distributions). In addition, we review some software packages performing ML (PASTIS, QMPE, DISFIT, and MATHEMATICA) and compare their results. In general, the differences between packages have little influence on the optimal solution found, but the form of the distribution function has: Both the lognormal and the Wald distributions have non-linear dependencies between the parameter estimates that tend to increase the overall bias in parameter recovery and to decrease efficiency. We conclude by laying out a few pointers on how to relate descriptive models of RT to cognitive models of RT. A program that generated the random deviates used in our studies may be downloaded from www.psychonomic.org/archive/.
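
A minimal example of maximum-likelihood distribution fitting for RT data is sketched below using scipy's exponnorm, which parameterizes the ex-Gaussian with tau = K * sigma; the simulated response times are illustrative, and the code does not reproduce the QMP estimator or the reviewed software packages.

```python
# Maximum-likelihood fit of an ex-Gaussian to simulated response times, with a
# quantile check in the spirit of quantile-based fitting.
import numpy as np
from scipy.stats import exponnorm

rng = np.random.default_rng(6)
mu, sigma, tau = 0.40, 0.05, 0.20                 # seconds
rt = rng.normal(mu, sigma, 1000) + rng.exponential(tau, 1000)

K, loc, scale = exponnorm.fit(rt)                 # ML fit of (K, mu, sigma)
print(f"recovered mu    = {loc:.3f}")
print(f"recovered sigma = {scale:.3f}")
print(f"recovered tau   = {K * scale:.3f}")

# compare fitted and empirical deciles
qs = np.linspace(0.1, 0.9, 9)
print("empirical deciles:", np.round(np.quantile(rt, qs), 3))
print("fitted deciles   :", np.round(exponnorm.ppf(qs, K, loc, scale), 3))
```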

Journal ArticleDOI
TL;DR: In this article, the authors consider a class of semiparametric regression models which are one-parameter extensions of the Cox [J. Roy. Statist. Ser. B 34 (1972) 187-220] model for right-censored univariate failure times.
Abstract: We consider a class of semiparametric regression models which are one-parameter extensions of the Cox [J. Roy. Statist. Soc. Ser. B 34 (1972) 187–220] model for right-censored univariate failure times. These models assume that the hazard given the covariates and a random frailty unique to each individual has the proportional hazards form multiplied by the frailty. The frailty is assumed to have mean 1 within a known one-parameter family of distributions. Inference is based on a nonparametric likelihood. The behavior of the likelihood maximizer is studied under general conditions where the fitted model may be misspecified. The joint estimator of the regression and frailty parameters as well as the baseline hazard is shown to be uniformly consistent for the pseudo-value maximizing the asymptotic limit of the likelihood. Appropriately standardized, the estimator converges weakly to a Gaussian process. When the model is correctly specified, the procedure is semiparametric efficient, achieving the semiparametric information bound for all parameter components. It is also proved that the bootstrap gives valid inferences for all parameters, even under misspecification. We demonstrate analytically the importance of the robust inference in several examples. In a randomized clinical trial, a valid test of the treatment effect is possible when other prognostic factors and the frailty distribution are both misspecified. Under certain conditions on the covariates, the ratios of the regression parameters are still identifiable. The practical utility of the procedure is illustrated on a non-Hodgkin’s lymphoma dataset.

Report SeriesDOI
TL;DR: In this paper, the additive components of a nonparametric additive quantile regression model are estimated with a rate of convergence in probability of n^(-r/(2r+1)) when the additive components are r-times continuously differentiable for some r ≥ 2.
Abstract: This article is concerned with estimating the additive components of a nonparametric additive quantile regression model. We develop an estimator that is asymptotically normally distributed with a rate of convergence in probability of n^(-r/(2r+1)) when the additive components are r-times continuously differentiable for some r ≥ 2. This result holds regardless of the dimension of the covariates, and thus the new estimator has no curse of dimensionality. In addition, the estimator has an oracle property and is easily extended to a generalized additive quantile regression model with a link function. The numerical performance and usefulness of the estimator are illustrated by Monte Carlo experiments and an empirical example.
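
The sketch below fits an additive quantile regression by representing each additive component with a small spline basis and running a linear quantile regression on the stacked basis. This is a simple series approximation, not the oracle-efficient estimator developed in the article; the simulated data, knots, and quantile level are illustrative.

```python
# Additive quantile regression via a truncated-power spline basis per component
# and linear quantile regression on the stacked design.
import numpy as np
import statsmodels.api as sm

def power_basis(x, knots):
    # simple truncated-power spline basis for one additive component
    cols = [x, x**2, x**3] + [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(9)
n = 500
x1, x2 = rng.uniform(-2, 2, n), rng.uniform(-2, 2, n)
y = np.sin(x1) + 0.5 * x2**2 + rng.standard_t(df=3, size=n)   # heavy-tailed noise

knots = np.linspace(-1.5, 1.5, 4)
X = np.column_stack([np.ones(n), power_basis(x1, knots), power_basis(x2, knots)])

tau = 0.75
fit = sm.QuantReg(y, X).fit(q=tau)
x_new = np.concatenate([[1.0], power_basis(np.array([1.0]), knots)[0],
                        power_basis(np.array([0.0]), knots)[0]])
print(f"fitted {tau}-quantile at (x1, x2) = (1, 0): {x_new @ fit.params:.2f}")
```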

01 Jan 2004
TL;DR: This article describes the application of histograms and nonparametric kernel methods to exploring data, covering the details of theory, computation, visualization, and presentation.
Abstract: Modern data analysis requires a number of tools to uncover hidden structure. For initial exploration of data, animated scatter diagrams and nonparametric density estimation in many forms and varieties are the techniques of choice. This article focuses on the application of histograms and nonparametric kernel methods to explore data. The details of theory, computation, visualization, and presentation are all described.
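
A minimal sketch of the two estimators discussed, applied to the same simulated sample, might look as follows (the bin count and bandwidth rule are illustrative defaults):

```python
# Histogram and Gaussian kernel density estimates of the same simulated data.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

counts, edges = np.histogram(x, bins=30, density=True)   # histogram estimate
kde = gaussian_kde(x)                                     # Scott's rule bandwidth

grid = np.linspace(-4, 4, 5)
print("KDE on grid          :", np.round(kde(grid), 3))
print("histogram bin width  :", round(edges[1] - edges[0], 3))
print("first 5 bin heights  :", np.round(counts[:5], 3))
```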

Journal ArticleDOI
TL;DR: A semiparametric model for functional data where the warping functions are assumed to be linear combinations of q common components, which are estimated from the data (hence the name ‘self‐modelling’).
Abstract: The paper introduces a semiparametric model for functional data. The warping functions are assumed to be linear combinations of q common components, which are estimated from the data (hence the name ‘self-modelling’). Even small values of q provide remarkable model flexibility, comparable with nonparametric methods. At the same time, this approach avoids overfitting because the common components are estimated by combining data across individuals. As a convenient by-product, component scores are often interpretable and can be used for statistical inference (an example of classification based on scores is given).

Journal ArticleDOI
TL;DR: In this paper, affine-invariant spatial sign and spatial rank vectors are used for multivariate nonparametric statistical tests of hypotheses for the one-sample location problem, the several sample location problem and the problem of testing independence between pairs of vectors.
Abstract: Multivariate nonparametric statistical tests of hypotheses are described for the one-sample location problem, the several-sample location problem and the problem of testing independence between pairs of vectors. These methods are based on affine-invariant spatial sign and spatial rank vectors. They provide affine-invariant multivariate generalizations of the univariate sign test, signed-rank test, Wilcoxon rank sum test, Kruskal–Wallis test, and the Kendall and Spearman correlation tests. While the emphasis is on tests of hypotheses, certain references to associated affine-equivariant estimators are included. Pitman asymptotic efficiencies demonstrate the excellent performance of these methods, particularly in heavy-tailed population settings. Moreover, these methods are easy to compute for data in common dimensions.
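
As a small illustration, the sketch below computes the one-sample spatial sign test: observations are replaced by their spatial signs (unit vectors) and the squared norm of the average sign is referred to a chi-square distribution. The affine-invariant version described in the paper adds an inner standardization step that is omitted here; the heavy-tailed simulated data are illustrative.

```python
# One-sample spatial sign test (non-affine-invariant version) on heavy-tailed data.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
n, d = 100, 3
x = rng.standard_t(df=3, size=(n, d)) + 0.4     # heavy tails, true location shift 0.4

signs = x / np.linalg.norm(x, axis=1, keepdims=True)   # spatial signs U(x_i)
ubar = signs.mean(axis=0)
Q = n * d * np.dot(ubar, ubar)                         # test statistic, ~ chi2(d) under H0
p = chi2.sf(Q, df=d)
print(f"Q = {Q:.2f}, p-value (H0: location = 0) = {p:.4f}")
```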

Journal ArticleDOI
TL;DR: In this article, a CRRA risk aversion transformation and a statistical calibration are used to adjust from risk-neutral (RN) to real-world (RW) densities derived from option prices and risk assumptions, and compared with historical densities obtained from time series.
Abstract: Risk-neutral (RN) and real-world (RW) densities are derived from option prices and risk assumptions, and are compared with historical densities obtained from time series. Two parametric methods that adjust from RN to RW densities are developed, firstly a CRRA risk aversion transformation and secondly a statistical calibration. Both risk transformations are estimated using likelihood techniques, for two flexible but tractable parametric density families. Results for the FTSE-100 index show that parametric densities derived from option prices have more explanatory power than historical densities. The parametric densities also have higher likelihoods than nonparametric densities estimated by spline methods. Furthermore, the pricing kernel between RN & historical densities is only incompatible with a risk averse representative agent when spline methods provide the RN densities.
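
The CRRA adjustment from a risk-neutral to a real-world density can be sketched directly: under power utility the real-world density is proportional to x^gamma times the risk-neutral density. The lognormal risk-neutral density and the value of gamma below are illustrative, not estimates for the FTSE-100.

```python
# CRRA risk-aversion transformation of a risk-neutral density: f_P(x) ~ x**gamma * f_Q(x).
import numpy as np
from scipy.stats import lognorm
from scipy.integrate import simpson

gamma = 3.0                                   # coefficient of relative risk aversion
x = np.linspace(50, 200, 2000)                # index levels
f_q = lognorm.pdf(x, s=0.2, scale=100)        # risk-neutral density (lognormal)

f_p = x**gamma * f_q
f_p /= simpson(f_p, x=x)                      # renormalise to a proper density

mean_q = simpson(x * f_q, x=x)
mean_p = simpson(x * f_p, x=x)
print(f"risk-neutral mean = {mean_q:.2f}, real-world mean = {mean_p:.2f}")
# the real-world mean exceeds the risk-neutral mean: a positive risk premium
```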

Journal ArticleDOI
TL;DR: Adjustments are introduced for two of the characteristic values produced by a progressive scrambling analysis -- the deprecated predictivity (Q*²_s) and standard error of prediction (SDEP*_s) -- that correct for the effect of introduced perturbation.
Abstract: The two methods most often used to evaluate the robustness and predictivity of partial least squares (PLS) models are cross-validation and response randomization. Both methods may be overly optimistic for data sets that contain redundant observations, however. The kinds of perturbation analysis widely used for evaluating model stability in the context of ordinary least squares regression are only applicable when the descriptors are independent of each other and errors are independent and normally distributed; neither assumption holds for QSAR in general and for PLS in particular. Progressive scrambling is a novel, nonparametric approach to perturbing models in the response space in a way that does not disturb the underlying covariance structure of the data. Here, we introduce adjustments for two of the characteristic values produced by a progressive scrambling analysis - the deprecated predictivity (Q*²_s) and standard error of prediction (SDEP*_s) - that correct for the effect of introduced perturbation. We also explore the statistical behavior of the adjusted values (Q*²_0 and SDEP*_0) and the sensitivity to perturbation (dq²/dr²_yy′). It is shown that the three statistics are all robust for stable PLS models, in terms of the stochastic component of their determination and of their variation due to sampling effects involved in training set selection.

Posted Content
TL;DR: In this article, employment density functions are estimated for 62 large metropolitan areas and the results serve as a warning that functional form misspecification causes spatial autocorrelation, and the LM test statistics fall dramatically when the models are estimated using flexible parametric and nonparametric methods.
Abstract: Employment density functions are estimated for 62 large metropolitan areas. Estimated gradients are statistically significant for distance from the nearest subcenter as well as for distance from the traditional central business district. Lagrange Multiplier (LM) tests imply significant spatial autocorrelation under highly restrictive ordinary least squares (OLS) specifications. The LM test statistics fall dramatically when the models are estimated using flexible parametric and nonparametric methods. The results serve as a warning that functional form misspecification causes spatial autocorrelation.