
Showing papers on "Nonparametric statistics" published in 2001


Journal ArticleDOI
TL;DR: In this article, a new non-parametric method for multivariate analysis of variance, based on sums of squared distances and permutation tests, is proposed as an alternative to traditional multivariate analogues of ANOVA, whose assumptions are too stringent for most ecological multivariate data sets.
Abstract: Hypothesis-testing methods for multivariate data are needed to make rigorous probability statements about the effects of factors and their interactions in experiments. Analysis of variance is particularly powerful for the analysis of univariate data. The traditional multivariate analogues, however, are too stringent in their assumptions for most ecological multivariate data sets. Non-parametric methods, based on permutation tests, are preferable. This paper describes a new non-parametric method for multivariate analysis of variance, after McArdle and Anderson (in press). It is given here, with several applications in ecology, to provide an alternative and perhaps more intuitive formulation for ANOVA (based on sums of squared distances) to complement the description provided by McArdle and Anderson (in press) for the analysis of any linear model. It is an improvement on previous non-parametric methods because it allows a direct additive partitioning of variation for complex models. It does this while maintaining the flexibility and lack of formal assumptions of other non-parametric methods. The test statistic is a multivariate analogue to Fisher's F-ratio and is calculated directly from any symmetric distance or dissimilarity matrix. P-values are then obtained using permutations. Some examples of the method are given for tests involving several factors, including factorial and hierarchical (nested) designs and tests of interactions.
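For orientation, here is a minimal sketch of the distance-based partitioning and pseudo-F ratio described above, for the balanced one-way case with a groups of n observations each (N = an); the notation is illustrative rather than quoted from the paper.

```latex
% d_{ij}: the chosen symmetric distance or dissimilarity between observations i and j.
% \varepsilon_{ij} = 1 if observations i and j belong to the same group, 0 otherwise.
\[
SS_T = \frac{1}{N}\sum_{i<j} d_{ij}^{2}, \qquad
SS_W = \frac{1}{n}\sum_{i<j} d_{ij}^{2}\,\varepsilon_{ij}, \qquad
SS_A = SS_T - SS_W,
\]
\[
F = \frac{SS_A/(a-1)}{SS_W/(N-a)} .
\]
% The P-value is obtained by recomputing F under random permutations of the
% observations among the groups, rather than from the F distribution.
```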

12,328 citations


Journal ArticleDOI
01 Jan 2001-Ecology
TL;DR: Distance-based redundancy analysis (db-RDA) partitions the variability in ecological data according to a complex design or model using permutation tests; this paper shows that the same partitioning can be obtained directly from the distance matrix, without eigenanalysis or the negative-eigenvalue correction that db-RDA requires.
Abstract: Nonparametric multivariate analysis of ecological data using permutation tests has two main challenges: (1) to partition the variability in the data according to a complex design or model, as is often required in ecological experiments, and (2) to base the analysis on a multivariate distance measure (such as the semimetric Bray-Curtis measure) that is reasonable for ecological data sets. Previous nonparametric methods have succeeded in one or other of these areas, but not in both. A recent contribution to Ecological Monographs by Legendre and Anderson, called distance-based redundancy analysis (db-RDA), does achieve both. It does this by calculating principal coordinates and subsequently correcting for negative eigenvalues, if they are present, by adding a constant to squared distances. We show here that such a correction is not necessary. Partitioning can be achieved directly from the distance matrix itself, with no corrections and no eigenanalysis, even if the distance measure used is semimetric. An ecological example is given to show the differences in these statistical methods. Empirical simulations, based on parameters estimated from real ecological species abundance data, showed that db-RDA done on multifactorial designs (using the correction) does not have type 1 error consistent with the significance level chosen for the analysis (i.e., does not provide an exact test), whereas the direct method described and advocated here does.
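The "directly from the distance matrix" partitioning can be stated compactly; the following is a sketch of that construction under the usual Gower centering, with symbols chosen here for illustration rather than taken from the paper.

```latex
% A = (a_{ij}) with a_{ij} = -\tfrac{1}{2} d_{ij}^{2};  C = I - \tfrac{1}{N}\mathbf{1}\mathbf{1}' (centring matrix);
% G = C A C (Gower's centred matrix);  H = X(X'X)^{-1}X' (hat matrix of the design matrix X).
\[
SS_{\text{model}} = \operatorname{tr}(HGH), \qquad
SS_{\text{residual}} = \operatorname{tr}\big[(I-H)\,G\,(I-H)\big],
\]
% A pseudo-F ratio is formed from these on their respective degrees of freedom, with
% P-values obtained by permutation. No eigenanalysis and no correction for negative
% eigenvalues is needed, even for semimetric measures such as Bray-Curtis.
```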

3,468 citations


Journal ArticleDOI
TL;DR: Data Analysis by Resampling is a useful and clear introduction to resampling that would make an ambitious second course in statistics or a good third or later course, and it is quite well suited for self-study by an individual with just a few previous statistics courses.
Abstract: described and related to one another and to the different resampling methods is also notable. This is especially useful for the book’s target audience, for whom such concepts may not yet have taken root. On the computational side, the book may be a little less satisfying. Step-by-step computational algorithms are at some times inefficient and at other times cryptic so that an individual with little programming experience might have difficulty applying them. This problem is substantially offset by the presence of numerous detailed examples solved using existing software, providing readers roughly equal exposure to S-PLUS, SC, and Resampling Stats. Unfortunately, these examples often require large, complex programs, demonstrating as much as anything a need for better resampling software. On the whole, Data Analysis by Resampling is a useful and clear introduction to resampling. It would make an ambitious second course in statistics or a good third or later course. It is quite well suited for self-study by an individual with just a few previous statistics courses. Although it would be miscast as a graduate-level textbook or as a research reference—for one thing it lacks a thorough bibliography to make up for its surface treatment of many of the topics it covers—it is a very nice book for any reader seeking an introductory book on resampling.

1,840 citations


Book
03 May 2001
TL;DR: A book on circular statistics covering circular probability distributions, estimation of parameters, tests for mean direction, concentration, and uniformity, nonparametric testing procedures, circular correlation and regression, outlier and change-point problems, some facts on Bessel functions, and how to use the CircStats package.
Abstract: Circular Probability Distributions Some Sampling Distributions Estimation of Parameters Tests for Mean Direction and Concentration Tests for Uniformity Nonparametric Testing Procedures Circular Correlation and Regression Predictive Inference for Directional Data Outliers and Related Problems Change-Point Problems Miscellaneous Topics Some Facts on Bessel Functions How to Use the CircStats Package.

1,301 citations


Journal ArticleDOI
TL;DR: A review of Nonparametric Statistical Methods, 2nd Ed., published in the Journal of Quality Technology.
Abstract: (2001). Nonparametric Statistical Methods, 2nd Ed. Journal of Quality Technology: Vol. 33, No. 2, pp. 259-259.

1,051 citations


Journal ArticleDOI
TL;DR: Some usability and interpretability issues for single-strategy cognitive assessment models are considered and an example shows that these models can be sensitive to cognitive attributes, even in data designed to well fit the Rasch model.
Abstract: Some usability and interpretability issues for single-strategy cognitive assessment models are considered. These models posit a stochastic conjunctive relationship between a set of cognitive attributes to be assessed and performance on particular items/tasks in the assessment. The models considered make few assumptions about the relationship between latent attributes and task performance beyond a simple conjunctive structure. An example shows that these models can be sensitive to cognitive attributes, even in data designed to well fit the Rasch model. Several stochastic ordering and monotonicity properties are considered that enhance the interpretability of the models. Simple data summaries are identified that inform about the presence or absence of cognitive attributes when the full computational power needed to estimate the models is not available.

836 citations


Journal ArticleDOI
TL;DR: A review of Practical Nonparametric Statistics, 3rd Ed., published in the Journal of Quality Technology.
Abstract: (2001). Practical Nonparametric Statistics, 3rd Ed. Journal of Quality Technology: Vol. 33, No. 2, pp. 260-260.

791 citations


BookDOI
01 Sep 2001
TL;DR: A book on event history analysis covering nonparametric descriptive methods, exponential and piecewise constant exponential transition rate models, parametric models of time-dependence, methods for checking parametric assumptions, and semi-parametric transition rate models, with implementation in the TDA software.
Abstract: Contents: Preface. Introduction. Event History Data Structures. Nonparametric Descriptive Methods. Exponential Transition Rate Models. Piecewise Constant Exponential Models. Exponential Models With Time-Dependent Covariates. Parametric Models of Time-Dependence. Methods to Check Parametric Assumptions. Semi-Parametric Transition Rate Models. Problems of Model Specification. Appendix: Basic Information About TDA.

785 citations


Journal ArticleDOI
TL;DR: The generalized likelihood ratio statistics are shown to be general and powerful for nonparametric testing problems based on function estimation and can even be adaptively optimal in the sense of Spokoiny by using a simple choice of adaptive smoothing parameter.
Abstract: Likelihood ratio theory has had tremendous success in parametric inference, due to the fundamental theory of Wilks. Yet, there is no generally applicable approach for nonparametric inferences based on function estimation. Maximum likelihood ratio test statistics in general may not exist in the nonparametric function estimation setting. Even if they exist, they are hard to find and cannot be optimal as shown in this paper. We introduce the generalized likelihood statistics to overcome the drawbacks of nonparametric maximum likelihood ratio statistics. A new Wilks phenomenon is unveiled. We demonstrate that a class of the generalized likelihood statistics based on some appropriate nonparametric estimators are asymptotically distribution free and follow chi-squared distributions under null hypotheses for a number of useful hypotheses and a variety of useful models including Gaussian white noise models, nonparametric regression models, varying coefficient models and generalized varying coefficient models. We further demonstrate that generalized likelihood ratio statistics are asymptotically optimal in the sense that they achieve optimal rates of convergence given by Ingster. They can even be adaptively optimal in the sense of Spokoiny by using a simple choice of adaptive smoothing parameter. Our work indicates that the generalized likelihood ratio statistics are indeed general and powerful for nonparametric testing problems based on function estimation.
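Schematically, the "Wilks phenomenon" referred to above can be written as follows; this is a sketch of the general shape of the result, not the paper's exact statement, and the constants involved are problem specific.

```latex
% \ell(\cdot) is the log-likelihood with the unknown function replaced by a suitable
% nonparametric estimator under the alternative and by the constrained fit under H_0.
\[
\lambda_n \;=\; \sup_{H_1}\ell \;-\; \sup_{H_0}\ell , \qquad
r\,\lambda_n \;\overset{a}{\sim}\; \chi^2_{a_n} \quad \text{under } H_0 ,
\]
% where r is a known rescaling constant and the effective degrees of freedom a_n grow
% with the sample size but do not depend on nuisance parameters or the design; this
% nuisance-free limiting distribution is what makes the statistics "distribution free".
```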

676 citations


Report SeriesDOI
01 Nov 2001
TL;DR: In this article, the authors consider the nonparametric and semiparametric methods for estimating regression models with continuous endogenous regressors and identify the "average structural function" as a parameter of central interest.
Abstract: This paper considers the nonparametric and semiparametric methods for estimating regression models with continuous endogenous regressors. We list a number of different generalizations of the linear structural equation model, and discuss how two common estimation approaches for linear equations — the "instrumental variables" and "control function" approaches — may be extended to nonparametric generalizations of the linear model and to their semiparametric variants. We consider the identification and estimation of the "Average Structural Function" and argue that this is a parameter of central interest in the analysis of semiparametric and nonparametric models with endogenous regressors. We consider a particular semiparametric model, the binary response model with linear index function and nonparametric error distribution, and describe in detail how estimation of the parameters of interest can be constructed using the "control function" approach. This estimator is applied to estimating the relation of labor force participation to nonlabor income, viewed as an endogenous regressor.
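As a rough illustration of the "control function" idea discussed above, here is a minimal two-stage sketch in a simplified additive setting; the data-generating process, the polynomial bases, and all names are illustrative assumptions, not the authors' estimator for the binary response model.

```python
# Sketch of a control-function estimator for an endogenous regressor (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                         # instrument
u = rng.normal(size=n)                         # unobserved heterogeneity
x = 0.8 * z + u + rng.normal(size=n)           # endogenous regressor
y = np.sin(x) + u + 0.3 * rng.normal(size=n)   # outcome depends on x and u

# Stage 1: flexible regression of x on the instrument; keep the residual v_hat,
# which acts as the control function for the endogeneity of x.
Z = np.vander(z, 4, increasing=True)           # low-order polynomial basis in z
v_hat = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Stage 2: regress y flexibly on x *and* v_hat; conditioning on v_hat
# breaks the dependence between x and the remaining error.
B = np.column_stack([np.vander(x, 4, increasing=True),
                     np.vander(v_hat, 4, increasing=True)[:, 1:]])
coef = np.linalg.lstsq(B, y, rcond=None)[0]

# The "Average Structural Function" at a point x0 is then approximated by averaging
# the fitted stage-2 surface over the empirical distribution of v_hat.
x0 = 1.0
B0 = np.column_stack([np.vander(np.full(n, x0), 4, increasing=True),
                      np.vander(v_hat, 4, increasing=True)[:, 1:]])
print("ASF estimate at x0=1:", (B0 @ coef).mean(), "truth:", np.sin(x0))
```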

578 citations


Journal ArticleDOI
TL;DR: The purpose of this paper is to provide a general approach to model selection via penalization for Gaussian regression and to develop the authors' point of view on this subject.
Abstract: Our purpose in this paper is to provide a general approach to model selection via penalization for Gaussian regression and to develop our point of view about this subject. The advantage and importance of model selection come from the fact that it provides a suitable approach to many different types of problems, starting from model selection per se (among a family of parametric models, which one is more suitable for the data at hand), which includes for instance variable selection in regression models, to nonparametric estimation, for which it provides a very powerful tool that allows adaptation under quite general circumstances. Our approach to model selection also provides a natural connection between the parametric and nonparametric points of view and copes naturally with the fact that a model is not necessarily true. The method is based on the penalization of a least squares criterion which can be viewed as a generalization of Mallows’ Cp. A large part of our efforts will be put on choosing properly the list of models and the penalty function for various estimation problems like classical variable selection or adaptive estimation for various types of ℓp-bodies.
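The penalized criterion can be written schematically as below, with Mallows' Cp as the classical special case mentioned in the abstract; the particular penalties studied in the paper are heavier and depend on the model collection, so this is only a sketch.

```latex
\[
\hat m \;=\; \arg\min_{m \in \mathcal{M}}
\Big\{ \|Y - \hat\mu_m\|^2 + \mathrm{pen}(m) \Big\},
\qquad
\mathrm{pen}_{C_p}(m) \;=\; 2\sigma^2 D_m ,
\]
% where \hat\mu_m is the least squares projection of Y onto model m, D_m is the dimension
% of model m, and \sigma^2 is the noise variance. Choosing the model list \mathcal{M} and
% the penalty pen(m) appropriately is what yields adaptive nonparametric estimators.
```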

Book
11 May 2001
TL;DR: A book on classical competing risks in survival analysis, covering parametric likelihood inference, latent failure time models and their identifiability problems, hazard-based methods including proportional hazards and partial likelihood, and martingale and counting process approaches.
Abstract: CONTINUOUS FAILURE TIMES AND THEIR CAUSES Basic Probability Functions Some Small Data Sets Hazard Functions Regression Models PARAMETRIC LIKELIHOOD INFERENCE The Likelihood for Competing Risks Model Checking Inference Some Examples Masked Systems LATENT FAILURE TIMES: PROBABILITY DISTRIBUTIONS Basic Probability Functions Some Examples Marginal vs. Sub-Distributions Independent Risks A Risk-Removal Model LIKELIHOOD FUNCTIONS FOR UNIVARIATE SURVIVAL DATA Discrete and Continuous Failure Times Discrete Failure Times: Estimation Continuous Failure Times: Random Samples Continuous Failure Times: Explanatory Variables Discrete Failure Times Again Time-Dependent Covariates DISCRETE FAILURE TIMES IN COMPETING RISKS Basic Probability Functions Latent Failure Times Some Examples Based on Bernoulli Trials Likelihood Functions HAZARD-BASED METHODS FOR CONTINUOUS FAILURE TIMES Latent Failure Times vs. Hazard Modelling Some Examples of Hazard Modelling Nonparametric Methods for Random Samples Proportional Hazards and Partial Likelihood LATENT FAILURE TIMES: IDENTIFIABILITY CRISES The Cox-Tsiatis Impasse More General Identifiability Results Specified Marginals Discrete Failure Times Regression Case Censoring of Survival Data Parametric Identifiability MARTINGALE COUNTING PROCESSES IN SURVIVAL DATA Introduction Back to Basics: Probability Spaces and Conditional Expectation Filtrations Martingales Counting Processes Product Integrals Survival Data Non-parametric Estimation Non-parametric Testing Regression Models Epilogue APPENDIX 1: Numerical Maximisation of Likelihood Functions APPENDIX 2: Bayesian Computation Bibliography Index

Journal ArticleDOI
01 Jun 2001
TL;DR: In this paper, it is shown that the mean magnitude of relative error (MMRE) and the number of predictions within 25% of the actual, pred(25), are measures of the spread and the kurtosis of the variable z, where z = estimate/actual.
Abstract: Provides the software estimation research community with a better understanding of the meaning of, and relationship between, two statistics that are often used to assess the accuracy of predictive models: the mean magnitude of relative error (MMRE) and the number of predictions within 25% of the actual, pred(25). It is demonstrated that MMRE and pred(25) are, respectively, measures of the spread and the kurtosis of the variable z, where z = estimate/actual. Thus, z is considered to be a measure of accuracy, and statistics such as MMRE and pred(25) to be measures of properties of the distribution of z. It is suggested that measures of the central location and skewness of z, as well as measures of spread and kurtosis, are necessary. Furthermore, since the distribution of z is non-normal, non-parametric measures of these properties may be needed. For this reason, box-plots of z are useful alternatives to simple summary metrics. It is also noted that the simple residuals are better behaved than the z variable, and could also be used as the basis for comparing prediction systems.
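For concreteness, a small Python sketch computing the two statistics from a vector of actuals and estimates; the function and variable names are mine, and the toy numbers are made up.

```python
# Minimal illustration of MMRE and pred(25), computed from actuals and estimates.
import numpy as np

def mmre(actual, estimate):
    """Mean magnitude of relative error: mean of |actual - estimate| / actual."""
    actual, estimate = np.asarray(actual, float), np.asarray(estimate, float)
    return np.mean(np.abs(actual - estimate) / actual)

def pred(actual, estimate, level=0.25):
    """Proportion of predictions whose relative error is within `level` of the actual."""
    actual, estimate = np.asarray(actual, float), np.asarray(estimate, float)
    mre = np.abs(actual - estimate) / actual
    return np.mean(mre <= level)

actual = np.array([100, 250, 80, 400, 150])
estimate = np.array([120, 240, 60, 500, 155])
z = estimate / actual        # the variable whose spread/kurtosis these statistics reflect
print(mmre(actual, estimate), pred(actual, estimate), z)
```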

Journal ArticleDOI
TL;DR: In this paper, the authors developed a new test of a parametric model of a conditional mean function against a nonparametric alternative, which adapts to the unknown smoothness of the alternative model and is uniformly consistent against alternatives whose distance from the parametric models converges to zero at the fastest possible rate.
Abstract: We develop a new test of a parametric model of a conditional mean function against a nonparametric alternative. The test adapts to the unknown smoothness of the alternative model and is uniformly consistent against alternatives whose distance from the parametric model converges to zero at the fastest possible rate. This rate is slower than n -1/2 . Some existing tests have nontrivial power against restricted classes of alternatives whose distance from the parametric model decreases at the rate n -1/2 . There are, however, sequences of alternatives against which these tests are inconsistent and ours is consistent. As a consequence, there are alternative models for which the finite-sample power of our test greatly exceeds that of existing tests. This conclusion is illustrated by the results of some Monte Carlo experiments.

Journal ArticleDOI
TL;DR: This paper discusses in detail wavelet methods in nonparametric regression, where the data are modelled as observations of a signal contaminated with additive Gaussian noise, and provides an extensive review of the vast literature of wavelet shrinkage and wavelet thresholding estimators developed to denoise such data.
Abstract: Wavelet analysis has been found to be a powerful tool for the nonparametric estimation of spatially-variable objects. We discuss in detail wavelet methods in nonparametric regression, where the data are modelled as observations of a signal contaminated with additive Gaussian noise, and provide an extensive review of the vast literature of wavelet shrinkage and wavelet thresholding estimators developed to denoise such data. These estimators arise from a wide range of classical and empirical Bayes methods treating either individual or blocks of wavelet coefficients. We compare various estimators in an extensive simulation study on a variety of sample sizes, test functions, signal-to-noise ratios and wavelet filters. Because there is no single criterion that can adequately summarise the behaviour of an estimator, we use various criteria to measure performance in finite sample situations. Insight into the performance of these estimators is obtained from graphical outputs and numerical tables. In order to provide some hints of how these estimators should be used to analyse real data sets, a detailed practical step-by-step illustration of a wavelet denoising analysis on electrical consumption is provided. Matlab codes are provided so that all figures and tables in this paper can be reproduced.
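A minimal Python sketch of the shrinkage recipe the review describes (decompose, threshold the detail coefficients, reconstruct), using PyWavelets; the 'db4' filter, the soft-thresholding rule, and the universal threshold sigma*sqrt(2 log n) are illustrative choices, and the paper's own code is Matlab rather than Python.

```python
# Wavelet denoising sketch: decompose, soft-threshold details, reconstruct.
import numpy as np
import pywt

rng = np.random.default_rng(1)
n = 1024
t = np.linspace(0, 1, n)
signal = np.sin(4 * np.pi * t) + (t > 0.5)          # smooth part plus a jump
noisy = signal + 0.3 * rng.normal(size=n)

coeffs = pywt.wavedec(noisy, 'db4', level=5)

# Estimate the noise level from the finest-scale details (median absolute deviation),
# then apply soft thresholding to every detail level.
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
thresh = sigma * np.sqrt(2 * np.log(n))
denoised_coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode='soft') for c in coeffs[1:]]

denoised = pywt.waverec(denoised_coeffs, 'db4')[:n]
print("RMSE noisy:   ", np.sqrt(np.mean((noisy - signal) ** 2)))
print("RMSE denoised:", np.sqrt(np.mean((denoised - signal) ** 2)))
```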

Proceedings ArticleDOI
01 Dec 2001
TL;DR: A new method for constructing genetic networks from gene expression data by using Bayesian networks is proposed, which uses nonparametric regression for capturing nonlinear relationships between genes and derives a new criterion for choosing the network in general situations.
Abstract: We propose a new method for constructing genetic networks from gene expression data by using Bayesian networks. We use nonparametric regression for capturing nonlinear relationships between genes and derive a new criterion for choosing the network in general situations. In a theoretical sense, our proposed theory and methodology include previous methods based on the Bayes approach. We applied the proposed method to the S. cerevisiae cell cycle data and showed the effectiveness of our method by comparing it with previous methods.

Journal ArticleDOI
TL;DR: Under mild conditions, it is shown that the squared L2 risk of the estimator based on ARM is basically bounded above by the risk of each candidate procedure plus a small penalty term of order 1/n, giving the automatically optimal rate of convergence for ARM.
Abstract: Adaptation over different procedures is of practical importance. Different procedures perform well under different conditions. In many practical situations, it is rather hard to assess which conditions are (approximately) satisfied so as to identify the best procedure for the data at hand. Thus automatic adaptation over various scenarios is desirable. A practically feasible method, named adaptive regression by mixing (ARM), is proposed to convexly combine general candidate regression procedures. Under mild conditions, the resulting estimator is theoretically shown to perform optimally in rates of convergence without knowing which of the original procedures work the best. Simulations are conducted in several settings, including comparing a parametric model with nonparametric alternatives, comparing a neural network with a projection pursuit in multidimensional regression, and combining bandwidths in kernel regression. The results clearly support the theoretical property of ARM. The ARM algorithm assigns we...
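A heavily simplified sketch of the combine-by-data-splitting idea: fit each candidate on one half of the data, weight by hold-out predictive performance, and average the predictions. The two candidates, the single split, and the exponential weighting rule below are illustrative simplifications, not the exact ARM algorithm.

```python
# Schematic "adaptive regression by mixing"-style combination of two candidate fits.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = np.sort(rng.uniform(-2, 2, n))
y = np.sin(2 * x) + 0.3 * rng.normal(size=n)

idx = rng.permutation(n)
train, hold = idx[: n // 2], idx[n // 2 :]

def fit_linear(xtr, ytr):
    a, b = np.polyfit(xtr, ytr, 1)
    return lambda x0: a * x0 + b

def fit_knn(xtr, ytr, k=15):
    def predict(x0):
        x0 = np.atleast_1d(x0)
        return np.array([ytr[np.argsort(np.abs(xtr - xi))[:k]].mean() for xi in x0])
    return predict

candidates = [fit_linear(x[train], y[train]), fit_knn(x[train], y[train])]

# Weight each candidate by its predictive performance on the held-out half:
# here simply exp(-SSE / (2 * sigma_hat^2)), normalised to sum to one.
sse = np.array([np.sum((y[hold] - f(x[hold])) ** 2) for f in candidates])
sigma2 = sse.min() / len(hold)
w = np.exp(-(sse - sse.min()) / (2 * sigma2))
w /= w.sum()

combined = lambda x0: sum(wi * f(x0) for wi, f in zip(w, candidates))
print("weights:", w, "combined prediction at 0.5:", combined(np.array([0.5])))
```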

Journal ArticleDOI
01 Feb 2001
TL;DR: A variable-string-length genetic algorithm is used for developing a novel nonparametric clustering technique when the number of clusters is not fixed a priori.
Abstract: A variable-string-length genetic algorithm (GA) is used for developing a novel nonparametric clustering technique when the number of clusters is not fixed a priori. Chromosomes in the same population may now have different lengths since they encode different numbers of clusters. The crossover operator is redefined to tackle the concept of variable string length. A cluster validity index is used as a measure of the fitness of a chromosome. The performance of several cluster validity indices, namely the Davies-Bouldin (1979) index, Dunn's (1973) index, two of its generalized versions and a recently developed index, in appropriately partitioning a data set, is compared.
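The fitness-evaluation step (scoring a candidate partition by a cluster validity index) can be illustrated in a few lines; here scikit-learn's Davies-Bouldin score stands in for the index computations, and the GA machinery itself (variable-length strings, the redefined crossover) is omitted. All names are illustrative.

```python
# Cluster-validity-index-as-fitness, the scoring step of the GA-based clustering above.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

def fitness(chromosome, X):
    """A chromosome encodes a variable number of cluster centres (flattened coordinates);
    fitness is the negated Davies-Bouldin index of the partition those centres induce."""
    centres = np.asarray(chromosome).reshape(-1, X.shape[1])
    labels = np.argmin(((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1), axis=1)
    if len(np.unique(labels)) < 2:            # degenerate partition
        return -np.inf
    return -davies_bouldin_score(X, labels)   # lower DB index = better partition

# Two candidate chromosomes encoding 3 and 4 centres respectively:
rng = np.random.default_rng(0)
print(fitness(X[rng.choice(300, 3)].ravel(), X),
      fitness(X[rng.choice(300, 4)].ravel(), X))
```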

Journal ArticleDOI
TL;DR: In this paper, different specifications of conditional expectations are compared with nonparametric techniques that make no assumptions about the distribution of the data, and the conditional mean and variance of the NYSE market return are examined.

Journal ArticleDOI
01 Jan 2001
TL;DR: In this article, the authors summarize recent developments in the analysis of nonparametric models in which the classical ANOVA models are generalized: the assumption of normality is relaxed, the structure of the designs is placed in a broader framework, and the concept of treatment effects is redefined.
Abstract: In this paper, we summarize some recent developments in the analysis of nonparametric models where the classical models of ANOVA are generalized in such a way that not only the assumption of normality is relaxed but also the structure of the designs is introduced in a broader framework and also the concept of treatment effects is redefined. The continuity of the distribution functions is not assumed so that not only data from continuous distributions but also data with ties are included in this general setup. In designs with independent observations as well as in repeated measures designs, the hypotheses are formulated by means of the distribution functions. The main results are given in a unified form. Some applications to special designs are considered, where in simple designs, some well known statistics (such as the Kruskal-Wallis statistic and the χ2-statistic for dichotomous data) come out as special cases. The general framework presented here enables the nonparametric analysis of data with continuous distribution functions as well as arbitrary discrete data such as count data, ordered categorical and dichotomous data.
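As the abstract notes, the Kruskal-Wallis statistic falls out as a special case (a one-way layout with independent observations); a minimal check with SciPy on made-up data:

```python
# Kruskal-Wallis test as a special case of the nonparametric one-way layout.
from scipy import stats

group_a = [12.1, 14.3, 11.8, 15.2, 13.0]
group_b = [16.4, 18.1, 15.9, 17.3, 16.8]
group_c = [12.9, 13.5, 14.1, 12.2, 13.8]

stat, p = stats.kruskal(group_a, group_b, group_c)
print(f"Kruskal-Wallis H = {stat:.3f}, p = {p:.4f}")
# scipy applies a tie correction, in line with the paper's point that the framework
# does not assume continuous distributions.
```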

Journal ArticleDOI
TL;DR: The authors make the case that nonparametric techniques need not be limited to use by econometricians; they discuss nonparametric density estimation as well as nonparametric regression, which concerns estimation of regression functions without the straightjacket of a specific functional form.
Abstract: Even a cursory look at the empirical literature in most fields of economics reveals that a majority of applications use simple parametric approaches such as ordinary least squares regression or two-stage least squares accompanied by simple descriptive statistics. The use of such methods has persisted despite the development of more general nonparametric techniques in the recent (and perhaps not-so-recent) statistics and econometrics literatures. At least two possible explanations for this come to mind. First, given the challenges— or lack of—provided by economic theories with empirical content, the parametric toolkit is more than sufficient. Where serious first-order problems in nonexperimental inference exist, they are in the inadequacy of the research design and data, not in the limitations of the parametric approach. Second, the predominant use of parametric approaches may reflect the lack of sufficient computational power or the difficulty of computation with off-the-shelf statistical software. Given the recent advances in computing power and software (as well as the development of the necessary theoretical foundation), only the first point remains an open question. The purpose of this article is to make the case that nonparametric techniques need not be limited to use by econometricians. Our discussion is divided into two parts. In the first part, we focus on “density estimation”— estimation of the entire distribution of a variable or set of variables. In the second part, we discuss nonparametric regression, which concerns estimation of regression functions without the straightjacket of a specific functional form.
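To illustrate the density-estimation half of the discussion, a short sketch using a Gaussian kernel density estimate from SciPy; the bimodal sample and all names are made up for illustration.

```python
# Kernel density estimate vs. a single-normal parametric fit on bimodal data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# A bimodal variable that a single parametric family would fit poorly:
data = np.concatenate([rng.normal(10, 1.5, 400), rng.normal(16, 1.0, 200)])

kde = stats.gaussian_kde(data)                               # nonparametric density estimate
grid = np.linspace(6, 20, 8)
parametric = stats.norm.pdf(grid, data.mean(), data.std())   # single-normal benchmark

print("KDE:       ", np.round(kde(grid), 4))
print("Normal fit:", np.round(parametric, 4))
```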

Journal ArticleDOI
TL;DR: The book provides a brief introduction to SAS, SPSS, and BMDP, along with their use in performing ANOVA, and is indeed an excellent source of reference for the ANOVA based on Ž xed, random, and mixed-effects models.
Abstract: the book provides a brief introduction to SAS, SPSS, and BMDP, along with their use in performing ANOVA. The book also has a chapter devoted to experimental designs and the corresponding ANOVA. In terms of coverage, a nice feature of the book is the inclusion of a chapter on finite population models—typically not found in books on experimental designs and ANOVA. Several appendixes are given at the end of the book discussing some of the standard distributions, the Satterthwaite approximation, rules for computing the sums of squares, degrees of freedom, expected mean squares, and so forth. The exercises at the end of each chapter contain a number of numerical problems. Some of my quibbles about the book are the following. At times, it simply gives expressions without adequate motivation or examples. A reader who is not already familiar with ANOVA techniques will wonder as to the relevance of some of the expressions. Just to give an example, the quantity “sum of squares due to a contrast” is defined on page 65. The algebraic property that the sums of squares due to a set of a − 1 orthogonal contrasts will add up to the sum of squares due to an effect having a − 1 df is then stated. Given the level of the book, discussion of such a property appears to be irrelevant. I did not see this property used anywhere in the book; neither did I see the sum of squares due to a contrast explicitly used or mentioned later in the book. Examples in which the one-way model is adequate are mentioned only after introducing the model and the assumptions, and the examples are buried inside the remarks (in small print) following the model. This is also the case with the two-way model with interaction (Chap. 4). The authors indicate in the preface that the remarks are mostly meant to include results to be kept out of the main body of the text. I believe that good examples should be the starting point for introducing ANOVA models. The authors present the analysis of fixed, random, and mixed models simultaneously. Motivating examples that distinguish between these scenarios should have been made the highlight of the presentation in each chapter rather than deferred to the later part of the chapter under “worked out examples” or buried within the remarks. The authors discuss transformations to correct lack of normality and lack of homoscedasticity (Sec. 2.22). However, these are not illustrated with any real examples. Regarding tests concerning the departure from the model assumptions, formal tests are presented in some detail; however, graphical procedures are only very briefly mentioned under a remark. I consider this to be a glaring omission. Consequently, I would be somewhat hesitant to recommend this book to anyone interested in actual data analysis using ANOVA unless the application is such that one of the standard models (along with the standard assumptions) is known to be adequate and diagnostic checks are not called for. Obviously, this is an unlikely scenario in most applications. The preceding criticisms aside, I can see myself consulting this book to refer to an ANOVA table, to look up an expected value or test statistic under a random or mixed-effects model, or to refer to the use of SAS, SPSS, or BMDP for performing ANOVA. The book is indeed an excellent source of reference for the ANOVA based on fixed, random, and mixed-effects models.

Journal ArticleDOI
David Scott1
TL;DR: This article investigates the use of integrated square error, or L2 distance, as a theoretical and practical estimation tool for a variety of parametric statistical models and demonstrates by example the well-known result that minimum distance estimators, including L2E, are inherently robust.
Abstract: The likelihood function plays a central role in parametric and Bayesian estimation, as well as in nonparametric function estimation via local polynomial modeling. However, integrated square error has enjoyed a long tradition as the goodness-of-fit criterion of choice in nonparametric density estimation. In this article, I investigate the use of integrated square error, or L2 distance, as a theoretical and practical estimation tool for a variety of parametric statistical models. I show that the asymptotic inefficiency of the parameters estimated by minimizing the integrated square error or L2 estimation (L2E) criterion versus the maximum likelihood estimator is roughly that of the median versus the mean. I demonstrate by example the well-known result that minimum distance estimators, including L2E, are inherently robust; however, L2E does not require specification of any tuning factors found in robust likelihood algorithms. L2E is particularly appropriate for analyzing massive datasets in which data cleanin...
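A small sketch of the L2E idea for a normal location-scale model: minimize the integrated squared error criterion ∫ f(x|θ)² dx − (2/n) Σᵢ f(xᵢ|θ) rather than maximizing the likelihood (for the normal density, ∫ φ(x; μ, σ)² dx = 1/(2σ√π)). The data, parameterization, and optimizer settings below are illustrative.

```python
# L2E fitting of a normal location-scale model, compared with the (non-robust) MLE.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(4)
# Clean data plus a clump of gross outliers, to show the robustness mentioned above:
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(15, 1, 20)])

def l2e_criterion(params, data):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    # integral of the squared normal density minus twice the mean fitted density
    return 1.0 / (2 * sigma * np.sqrt(np.pi)) - 2 * np.mean(stats.norm.pdf(data, mu, sigma))

res = optimize.minimize(l2e_criterion, x0=[np.median(x), 0.0], args=(x,))
mu_l2e, sigma_l2e = res.x[0], np.exp(res.x[1])
print("L2E estimates:        ", round(mu_l2e, 3), round(sigma_l2e, 3))
print("MLE (sample mean/sd): ", round(x.mean(), 3), round(x.std(), 3))
```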

Journal ArticleDOI
TL;DR: The authors developed two fully Bayesian modeling approaches, employing mixture models, for the errors in a median regression model and associated families of error distributions allow for increased variability, skewness, and flexible tail behavior.
Abstract: Median regression models become an attractive alternative to mean regression models when employing flexible families of distributions for the errors. Classical approaches are typically algorithmic with desirable properties emerging asymptotically. However, nonparametric error models may be most attractive in the case of smaller sample sizes where parametric specifications are difficult to justify. Hence, a Bayesian approach, enabling exact inference given the observed data, may be appealing. In this context there is little Bayesian work. We develop two fully Bayesian modeling approaches, employing mixture models, for the errors in a median regression model. The associated families of error distributions allow for increased variability, skewness, and flexible tail behavior. The first family is semiparametric with extra variability captured nonparametrically through mixing and skewness handled parametrically. The second family, a fully nonparametric one, includes all unimodal densities on the real line with...

Posted Content
TL;DR: In this article, the authors considered the problem of estimating a partially linear semiparametric fixed effects panel data model with possible endogeneity and established the root N normality result for the estimator of the parametric component.
Abstract: This paper considers the problem of estimating a partially linear semiparametric fixed effects panel data model with possible endogeneity. Using the series method, we establish the root N normality result for the estimator of the parametric component, and we show that the unknown function can be consistently estimated at the standard nonparametric rate.

Journal ArticleDOI
TL;DR: In this paper, a nonparametric estimation theory in a nonstationary environment, more precisely in the framework of null recurrent Markov chains, is developed, which makes it possible to decompose the time series under consideration into independent and identical parts.
Abstract: We develop a nonparametric estimation theory in a nonstationary environment, more precisely in the framework of null recurrent Markov chains. An essential tool is the split chain, which makes it possible to decompose the time series under consideration into independent and identical parts. A tail condition on the distribution of the recurrence time is introduced. This condition makes it possible to prove weak convergence results for sums of functions of the process depending on a smoothing parameter. These limit results are subsequently used to obtain consistency and asymptotic normality for local density estimators and for estimators of the conditional mean and the conditional variance. In contradistinction to the parametric case, the convergence rate is slower than in the stationary case, and it is directly linked to the tail behavior of the recurrence time. Applications to econometric, and in particular to cointegration models, are indicated.

Journal ArticleDOI
TL;DR: In this article, the adaptive Neyman test is used to check whether the bias vector of residuals from parametric fits is negligible against large nonparametric alternatives; the power of the proposed tests is comparable to the F-test statistic even in situations where the F test is known to be suitable, and can be far more powerful than the F-test statistic in other situations.
Abstract: Several new tests are proposed for examining the adequacy of a family of parametric models against large nonparametric alternatives. These tests formally check if the bias vector of residuals from parametric fits is negligible by using the adaptive Neyman test and other methods. The testing procedures formalize the traditional model diagnostic tools based on residual plots. We examine the rates of contiguous alternatives that can be detected consistently by the adaptive Neyman test. Applications of the procedures to the partially linear models are thoroughly discussed. Our simulation studies show that the new testing procedures are indeed powerful and omnibus. The power of the proposed tests is comparable to the F-test statistic even in the situations where the F test is known to be suitable and can be far more powerful than the F-test statistic in other situations. An application to testing linear models versus additive models is also discussed.

Journal ArticleDOI
TL;DR: In this article, the authors establish valid Edgeworth expansions for the distribution of smoothed nonparametric spectral estimates, and of studentized versions of linear statistics such as the sample mean.
Abstract: We establish valid Edgeworth expansions for the distribution of smoothed nonparametric spectral estimates, and of studentized versions of linear statistics such as the sample mean, where the studentization employs such a nonparametric spectral estimate. Particular attention is paid to the spectral estimate at zero frequency and, correspondingly, the studentized sample mean, to reflect econometric interest in autocorrelation-consistent or long-run variance estimation. Our main focus is on stationary Gaussian series, though we discuss relaxation of the Gaussianity assumption. Only smoothness conditions on the spectral density that are local to the frequency of interest are imposed. We deduce empirical expansions from our Edgeworth expansions designed to improve on the normal approximation in practice and also deduce a feasible rule of bandwidth choice.

Journal ArticleDOI
TL;DR: In this article, the problem of nonparametric estimation for the distribution function governing the time to occurrence of a recurrent event in the presence of censoring is considered, and the authors derive Nelson-Aalen and Kaplan-Meier-type estimators and establish their respective finite-sample and asymptotic properties.
Abstract: The problem of nonparametric estimation for the distribution function governing the time to occurrence of a recurrent event in the presence of censoring is considered. We derive Nelson–Aalen and Kaplan–Meier-type estimators for the distribution function, and establish their respective finite-sample and asymptotic properties. We allow for random observation periods for each subject under study and explicitly account for the informative sum-quota nature of the data accrual scheme. These allowances complicate technical matters considerably and, in particular, invalidate the direct use of martingale methods. Consistency and weak convergence of our estimators are obtained by extending an approach due to Sellke, who considered a single renewal process (i.e., recurrent events on a single subject) observed over an infinite time period. A useful feature of the present analysis is that strong parallels are drawn to the usual “single-event” setting, providing a natural route toward developing extensions that involve...
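For context, a minimal NumPy implementation of the classical single-event Kaplan-Meier and Nelson-Aalen estimators that the paper generalizes; the recurrent-event, sum-quota versions developed in the paper are substantially more involved, so this is only the baseline case. Names and data are made up.

```python
# Classical single-event Kaplan-Meier and Nelson-Aalen estimators (baseline case only).
import numpy as np

def km_na(times, events):
    """times: observed times; events: 1 = event, 0 = censored.
    Returns (time, KM survival, NA cumulative hazard) at each event time.
    Ties among event times are processed one observation at a time here."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    surv, cumhaz, out = 1.0, 0.0, []
    for i, (t, d) in enumerate(zip(times, events)):
        at_risk = n - i
        if d == 1:
            surv *= 1.0 - 1.0 / at_risk      # Kaplan-Meier product term
            cumhaz += 1.0 / at_risk          # Nelson-Aalen increment
            out.append((t, surv, cumhaz))
    return out

print(km_na([2, 3, 3, 5, 7, 8, 10], [1, 1, 0, 1, 0, 1, 1]))
```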

Journal ArticleDOI
TL;DR: In this paper, the authors present a default (nonsubjective) version of this analysis, in the sense that recommended choices are provided for the (many) features of the Polya tree process that need to be specified.
Abstract: Testing the fit of data to a parametric model can be done by embedding the parametric model in a nonparametric alternative and computing the Bayes factor of the parametric model to the nonparametric alternative. Doing so by specifying the nonparametric alternative via a Polya tree process is particularly attractive, from both theoretical and methodological perspectives. Among the benefits is a degree of computational simplicity that even allows for robustness analyses to be implemented. Default (nonsubjective) versions of this analysis are developed herein, in the sense that recommended choices are provided for the (many) features of the Polya tree process that need to be specified. Considerable discussion of these features is also provided to assist those who might be interested in subjective choices. A variety of examples involving location–scale models are studied. Finally, it is shown that the resulting procedure can be viewed as a conditional frequentist test, resulting in data-dependent reported err...