
Showing papers on "Nonparametric statistics published in 2006"


Proceedings ArticleDOI
25 Jun 2006
TL;DR: A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections, and dynamic topic models provide a qualitative window into the contents of a large document collection.
Abstract: A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative window into the contents of a large document collection. The models are demonstrated by analyzing the OCR'ed archives of the journal Science from 1880 through 2000.

2,410 citations


Journal ArticleDOI
TL;DR: In this article, the authors develop new methods for analyzing the large sample properties of matching estimators and establish a number of new results; for example, matching estimators with replacement with a fixed number of matches are not N^{1/2}-consistent in general.
Abstract: Matching estimators for average treatment effects are widely used in evaluation research despite the fact that their large sample properties have not been established in many cases. The absence of formal results in this area may be partly due to the fact that standard asymptotic expansions do not apply to matching estimators with a fixed number of matches because such estimators are highly nonsmooth functionals of the data. In this article we develop new methods for analyzing the large sample properties of matching estimators and establish a number of new results. We focus on matching with replacement with a fixed number of matches. First, we show that matching estimators are not N^{1/2}-consistent in general and describe conditions under which matching estimators do attain N^{1/2}-consistency. Second, we show that even in settings where matching estimators are N^{1/2}-consistent, simple matching estimators with a fixed number of matches do not attain the semiparametric efficiency bound. Third, we provide a consistent estimator for the large sample variance that does not require consistent nonparametric estimation of unknown functions. Software for implementing these methods is available in Matlab, Stata, and R.
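A minimal sketch of the estimator class the paper studies: nearest-neighbor matching with replacement and a fixed number of matches M, used to impute each unit's missing potential outcome. This is illustrative Python (all names are ours), and it omits the paper's bias correction and variance estimator.

```python
import numpy as np

def matching_ate(X, y, w, M=4):
    """Nearest-neighbor matching with replacement; M matches per unit."""
    X, y, w = np.asarray(X, float), np.asarray(y, float), np.asarray(w, int)
    treated, control = np.where(w == 1)[0], np.where(w == 0)[0]
    imputed = np.empty_like(y)
    for i in range(len(y)):
        # Pool of candidate matches: units in the opposite treatment group.
        pool = control if w[i] == 1 else treated
        dist = np.linalg.norm(X[pool] - X[i], axis=1)
        nearest = pool[np.argsort(dist)[:M]]
        imputed[i] = y[nearest].mean()   # imputed missing potential outcome
    y1 = np.where(w == 1, y, imputed)    # potential outcome under treatment
    y0 = np.where(w == 0, y, imputed)    # potential outcome under control
    return float(np.mean(y1 - y0))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
w = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # selection on X
y = X.sum(axis=1) + 2.0 * w + rng.normal(size=500)     # true ATE = 2
print(matching_ate(X, y, w))
```

Because each unit can serve as a match several times, the estimator is a highly nonsmooth functional of the data, which is exactly why the standard asymptotic expansions discussed in the abstract fail.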

2,207 citations


Book
01 Jan 2006
TL;DR: Nonparametric Econometrics covers all the material necessary to understand and apply nonparametric methods for real-world problems and is the ideal introduction for graduate students and an indispensable resource for researchers.
Abstract: Until now, students and researchers in nonparametric and semiparametric statistics and econometrics have had to turn to the latest journal articles to keep pace with these emerging methods of economic analysis.Nonparametric Econometrics fills a major gap by gathering together the most up-to-date theory and techniques and presenting them in a remarkably straightforward and accessible format. The empirical tests, data, and exercises included in this textbook help make it the ideal introduction for graduate students and an indispensable resource for researchers. Nonparametric and semiparametric methods have attracted a great deal of attention from statisticians in recent decades. While the majority of existing books on the subject operate from the presumption that the underlying data is strictly continuous in nature, more often than not social scientists deal with categorical data--nominal and ordinal--in applied settings. The conventional nonparametric approach to dealing with the presence of discrete variables is acknowledged to be unsatisfactory. This book is tailored to the needs of applied econometricians and social scientists. Qi Li and Jeffrey Racine emphasize nonparametric techniques suited to the rich array of data types--continuous, nominal, and ordinal--within one coherent framework. They also emphasize the properties of nonparametric estimators in the presence of potentially irrelevant variables. Nonparametric Econometrics covers all the material necessary to understand and apply nonparametric methods for real-world problems.

1,372 citations


Proceedings Article
04 Dec 2006
TL;DR: A nonparametric method which directly produces resampling weights without distribution estimation is presented, which works by matching distributions between training and testing sets in feature space.
Abstract: We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
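A sketch of the idea under stated assumptions: kernel mean matching (the usual name for this reweighting approach) chooses nonnegative training weights so that the weighted training mean matches the test mean in an RBF feature space. The generic SLSQP solver, bandwidth, and bound parameters below are illustrative choices, not the authors' exact setup.

```python
import numpy as np
from scipy.optimize import minimize

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kmm_weights(X_tr, X_te, sigma=1.0, B=10.0, eps=None):
    """Solve min_b 0.5 b'Kb - kappa'b s.t. 0 <= b <= B, sum(b) ~= n."""
    n, m = len(X_tr), len(X_te)
    eps = eps if eps is not None else B / np.sqrt(n)
    K = rbf(X_tr, X_tr, sigma)
    kappa = (n / m) * rbf(X_tr, X_te, sigma).sum(axis=1)
    obj = lambda b: 0.5 * b @ K @ b - kappa @ b
    grad = lambda b: K @ b - kappa
    cons = [{"type": "ineq", "fun": lambda b: n * (1 + eps) - b.sum()},
            {"type": "ineq", "fun": lambda b: b.sum() - n * (1 - eps)}]
    res = minimize(obj, np.ones(n), jac=grad, bounds=[(0.0, B)] * n,
                   constraints=cons, method="SLSQP")
    return res.x

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(200, 1))   # biased training sample
X_te = rng.normal(0.5, 1.0, size=(200, 1))   # shifted test sample
w = kmm_weights(X_tr, X_te)
# Weighted training mean moves toward the test mean of ~0.5.
print(np.average(X_tr[:, 0], weights=w))
```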

1,235 citations


Book
24 Mar 2006
TL;DR: A chapter-by-chapter guide to running statistical analyses in SPSS, from test selection and t-tests through ANOVA, regression, factor analysis, reliability, structural equation modeling, and nonparametric tests, with an appendix summarizing the SPSS syntax files.
Abstract: Inferential statistics and test selection -- Introduction to SPSS -- Multiple response -- T-test for independent groups -- Paired-samples t-test -- One-way analysis of variance, with post hoc comparisons -- Factorial analysis of variance -- General linear model (GLM): multivariate analysis -- General linear model: repeated measures analysis -- Correlation -- Linear regression -- Factor analysis -- Reliability -- Multiple regression -- Structural equation modeling -- Nonparametric tests -- Appendix: Summary of SPSS syntax files -- References

1,174 citations


Book
01 Jan 2006
TL;DR: In this book, a well-adapted space for functional data is defined, local weighting of functional variables is developed, and functional nonparametric prediction methodologies are presented along with selected asymptotic results.
Abstract: Introduction to functional nonparametric statistics.- Some functional datasets and associated statistical problematics.- What is a well adapted space for functional data?.- Local weighting of functional variables.- Functional nonparametric prediction methodologies.- Some selected asymptotics.- Computational issues.- Nonparametric supervised classification for functional data.- Nonparametric unsupervised classification for functional data.- Mixing, nonparametric and functional statistics.- Some selected asymptotics.- Application to continuous time processes prediction.- Small ball probabilities, semi-metric spaces and nonparametric statistics.- Conclusion and perspectives.

1,041 citations


Posted Content
TL;DR: Nonparametric Econometrics, as discussed by the authors, is an ideal introduction to nonparametric and semiparametric methods for economic analysis, with particular emphasis on handling the categorical (nominal and ordinal) data that applied social scientists routinely face.
Abstract: Until recently, students and researchers in nonparametric and semiparametric statistics and econometrics have had to turn to the latest journal articles to keep pace with these emerging methods of economic analysis. Nonparametric Econometrics fills a major gap by gathering together the most up-to-date theory and techniques and presenting them in a remarkably straightforward and accessible format. The empirical tests, data, and exercises included in this textbook help make it the ideal introduction for graduate students and an indispensable resource for researchers. Nonparametric and semiparametric methods have attracted a great deal of attention from statisticians in recent decades. While the majority of existing books on the subject operate from the presumption that the underlying data is strictly continuous in nature, more often than not social scientists deal with categorical data--nominal and ordinal--in applied settings. The conventional nonparametric approach to dealing with the presence of discrete variables is acknowledged to be unsatisfactory. This book is tailored to the needs of applied econometricians and social scientists. Qi Li and Jeffrey Racine emphasize nonparametric techniques suited to the rich array of data types--continuous, nominal, and ordinal--within one coherent framework. They also emphasize the properties of nonparametric estimators in the presence of potentially irrelevant variables. Nonparametric Econometrics covers all the material necessary to understand and apply nonparametric methods for real-world problems.

1,005 citations



Book
01 Jan 2006
TL;DR: An encyclopedic reference covering statistical and data mining methods, from elementary concepts and ANOVA/MANOVA through classification and regression trees, cluster analysis, nonparametric statistics, neural networks, and time series forecasting.
Abstract: Elementary concepts in statistics -- Basic statistics and tables -- ANOVA/MANOVA -- Association rules -- Boosting trees -- Canonical analysis -- CHAID analysis -- Classification and regression trees (CART) -- Classification trees -- Cluster analysis -- Correspondence analysis -- Data mining techniques -- Discriminant function analysis -- Distribution fitting -- Experimental design (Industrial DOE) -- Factor analysis and principal components -- General discrimination analysis (GDA) -- General linear models (GLM) -- General regression models (GRM) -- Generalized additive models (GAM) -- Generalized linear/nonlinear models (GLZ) -- Log linear analysis of frequency tables -- Machine learning -- Multivariate adaptive regression splines (MARSplines) -- Multidimensional scaling (MDS) -- Multiple linear regression -- Neural networks -- Nonlinear estimation -- Nonparametric statistics -- Partial least squares (PLS) -- Power analysis -- Process analysis -- Quality control charts -- Reliability/item analysis -- Structural equation modeling -- Survival/failure time analysis -- Text mining -- Time series/forecasting -- Variance components and mixed model ANOVA/ANCOVA.

586 citations


Book
06 Oct 2006
TL;DR: A practical guide to the analysis of competing risks data, explaining why naive use of the Kaplan-Meier method is inappropriate in the presence of competing risks and developing the cumulative incidence function, covariate tests, regression modelling, and power calculations, with worked examples in R and SAS.
Abstract: Preface. Acknowledgements. 1. Introduction. 1.1 Historical notes. 1.2 Defining competing risks. 1.3 Use of the Kaplan-Meier method in the presence of competing risks. 1.4 Testing in the competing risk framework. 1.5 Sample size calculation. 1.6 Examples. 1.6.1 Tamoxifen trial. 1.6.2 Hypoxia study. 1.6.3 Follicular cell lymphoma study. 1.6.4 Bone marrow transplant study. 1.6.5 Hodgkin's disease study. 2. Survival - basic concepts. 2.1 Introduction. 2.2 Definitions and background formulae. 2.2.1 Introduction. 2.2.2 Basic mathematical formulae. 2.2.3 Common parametric distributions. 2.2.4 Censoring and assumptions. 2.3 Estimation and hypothesis testing. 2.3.1 Estimating the hazard and survivor functions. 2.3.2 Nonparametric testing: log-rank and Wilcoxon tests. 2.3.3 Proportional hazards model. 2.4 Software for survival analysis. 2.5 Closing remarks. 3. Competing risks - definitions. 3.1 Recognizing competing risks. 3.1.1 Practical approaches. 3.1.2 Common endpoints in medical research. 3.2 Two mathematical definitions. 3.2.1 Competing risks as bivariate random variable. 3.2.2 Competing risks as latent failure times. 3.3 Fundamental concepts. 3.3.1 Competing risks as bivariate random variable. 3.3.2 Competing risks as latent failure times. 3.3.3 Discussion of the two approaches. 3.4 Closing remarks. 4. Descriptive methods for competing risks data. 4.1 Product-limit estimator and competing risks. 4.2 Cumulative incidence function. 4.2.1 Heuristic estimation of the CIF. 4.2.2 Nonparametric maximum likelihood estimation of the CIF. 4.2.3 Calculating the CIF estimator. 4.2.4 Variance and confidence interval for the CIF estimator. 4.3 Software and examples. 4.3.1 Using R. 4.3.2 Using SAS. 4.4 Closing remarks. 5. Testing a covariate. 5.1 Introduction. 5.2 Testing a covariate. 5.2.1 Gray's method. 5.2.2 Pepe and Mori's method. 5.3 Software and examples. 5.3.1 Using R. 5.3.2 Using SAS. 5.4 Closing remarks. 6. Modelling in the presence of competing risks. 6.1 Introduction. 6.2 Modelling the hazard of the cumulative incidence function. 6.2.1 Theoretical details. 6.2.2 Model-based estimation of the CIF. 6.2.3 Using R. 6.3 Cox model and competing risks. 6.4 Checking the model assumptions. 6.4.1 Proportionality of the cause-specific hazards. 6.4.2 Proportionality of the hazards of the CIF. 6.4.3 Linearity assumption. 6.5 Closing remarks. 7. Calculating the power in the presence of competing risks. 7.1 Introduction. 7.2 Sample size calculation when competing risks are not present. 7.3 Calculating power in the presence of competing risks. 7.3.1 General formulae. 7.3.2 Comparing cause-specific hazards. 7.3.3 Comparing hazards of the subdistributions. 7.3.4 Probability of event when the exponential distribution is not a valid assumption. 7.4 Examples. 7.4.1 Introduction. 7.4.2 Comparing the cause-specific hazard. 7.4.3 Comparing the hazard of the subdistribution. 7.5 Closing remarks. 8. Other issues in competing risks. 8.1 Conditional probability function. 8.1.1 Introduction. 8.1.2 Nonparametric estimation of the CP function. 8.1.3 Variance of the CP function estimator. 8.1.4 Testing a covariate. 8.1.5 Using R. 8.1.6 Using SAS. 8.2 Comparing two types of risk in the same population. 8.2.1 Theoretical background. 8.2.2 Using R. 8.2.3 Discussion. 8.3 Identifiability and testing independence. 8.4 Parametric modelling. 8.4.1 Introduction. 8.4.2 Modelling the marginal distribution. 8.4.3 Modelling the Weibull distribution. 9. Food for thought. 
Problem 1: Estimation of the probability of the event of interest. Problem 2: Testing a covariate. Problem 3: Comparing the event of interest between two groups when the competing risks are different for each group. Problem 4: Information needed for sample size calculations. Problem 5: The effect of the size of the incidence of competing risks on the coefficient obtained in the model. Problem 6: The KLY test and the non-proportionality of hazards. Problem 7: The KLY and Wilcoxon tests. A: Theoretical background. B: Analysing competing risks data using R and SAS. References. Index.
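As a concrete illustration of the book's central estimator (Chapter 4), here is a hedged sketch of the nonparametric cumulative incidence function: at each event time, the CIF for a given cause increases by the overall Kaplan-Meier survival just before that time multiplied by the cause-specific hazard. Function and variable names are ours, not the book's R or SAS code.

```python
import numpy as np

def cif(time, status, cause):
    """time: event/censoring times; status: 0 = censored, k = failure of
    cause k. Returns the event times and the estimated CIF for `cause`."""
    time, status = np.asarray(time, float), np.asarray(status, int)
    order = np.argsort(time)
    time, status = time[order], status[order]
    at_risk, surv, cum = len(time), 1.0, 0.0
    times, inc = [], []
    for t in np.unique(time):
        mask = time == t
        d_all = np.count_nonzero(status[mask] > 0)      # failures, any cause
        d_k = np.count_nonzero(status[mask] == cause)   # failures of this cause
        cum += surv * d_k / at_risk                     # CIF increment S(t-) * hazard
        surv *= 1.0 - d_all / at_risk                   # overall KM update
        at_risk -= np.count_nonzero(mask)               # drop failures and censored
        times.append(t); inc.append(cum)
    return np.array(times), np.array(inc)

t = [2, 3, 3, 5, 7, 8, 10, 12]
s = [1, 2, 0, 1, 1, 0, 2, 1]   # cause 1, cause 2, or censored (0)
print(cif(t, s, cause=1)[1])
```

Note the contrast with one-minus-Kaplan-Meier applied to a single cause, which the book shows overestimates the probability of the event of interest when competing risks are present.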

509 citations


Journal ArticleDOI
TL;DR: In this article, a new class of semiparametric copula-based multivariate dynamic (SCOMDY) models is introduced, which specify the conditional mean and the conditional variance of a multivariate time series parametrically, but specify the multivariate distribution of the standardized innovations semiparametrically as a parametric copula evaluated at nonparametric marginal distributions.

Journal ArticleDOI
TL;DR: This article allows the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation and derives the correlation between distributions at different covariate values.
Abstract: In this article we propose a new framework for Bayesian nonparametric modeling with continuous covariates. In particular, we allow the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation. We focus mostly on the class of random distributions that induces a Dirichlet process at each covariate value. We derive the correlation between distributions at different covariate values and use a point process to implement a practically useful type of ordering. Two main constructions with analytically known correlation structures are proposed. Practical and efficient computational methods are introduced. We apply our framework, through mixtures of these processes, to regression modeling, the modeling of stochastic volatility in time series data, and spatial geostatistical modeling.
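For readers unfamiliar with the stick-breaking representation the abstract refers to, the following toy sketch draws the weights of a plain, covariate-free truncated Dirichlet process; the paper's contribution is to induce covariate dependence by reordering the random variables that build these weights. All names and parameter values are illustrative.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking draw from a DP with concentration alpha."""
    v = rng.beta(1.0, alpha, size=n_atoms)   # stick-break proportions
    # w_k = v_k * prod_{j<k} (1 - v_j): the piece broken off the remaining stick.
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    atoms = rng.normal(size=n_atoms)         # atoms drawn from a N(0,1) base measure
    return w, atoms

rng = np.random.default_rng(0)
w, atoms = stick_breaking(alpha=2.0, n_atoms=50, rng=rng)
print(w.sum())                  # close to 1 for a long enough truncation
print(atoms[np.argmax(w)])      # atom carrying the most mass
```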

Journal ArticleDOI
TL;DR: A nonparametric version of a quantile estimator is presented, which can be obtained by solving a simple quadratic programming problem; uniform convergence statements and bounds on the quantile property of the estimator are provided.
Abstract: In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints.
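A hedged sketch of the quantile property at the heart of the abstract: quantile estimates minimize the pinball (check) loss, under which roughly a proportion τ of observations falls below the fitted curve. The paper solves a kernelized quadratic program; the toy below instead fits a linear model with a generic derivative-free optimizer, purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def pinball(u, tau):
    """Check loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return np.where(u >= 0, tau * u, (tau - 1) * u)

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=300)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + 0.5 * x)   # heteroscedastic noise

tau = 0.9
fit = minimize(lambda b: pinball(y - b[0] - b[1] * x, tau).sum(),
               x0=np.zeros(2), method="Nelder-Mead")
b0, b1 = fit.x
print(b0, b1)                         # 0.9-quantile line, steeper than the mean line
print(np.mean(y <= b0 + b1 * x))      # roughly tau of points lie below the fit
```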

Journal ArticleDOI
01 Jul 2006-Genetics
TL;DR: It is argued that standard parametric methods for quantitative genetic analysis cannot handle the multiplicity of potential interactions arising in models with, e.g., single-nucleotide polymorphisms, and that most of the assumptions required for an orthogonal decomposition of variance are violated in artificial and natural populations.
Abstract: Semiparametric procedures for prediction of total genetic value for quantitative traits, which make use of phenotypic and genomic data simultaneously, are presented. The methods focus on the treatment of massive information provided by, e.g., single-nucleotide polymorphisms. It is argued that standard parametric methods for quantitative genetic analysis cannot handle the multiplicity of potential interactions arising in models with, e.g., hundreds of thousands of markers, and that most of the assumptions required for an orthogonal decomposition of variance are violated in artificial and natural populations. This makes nonparametric procedures attractive. Kernel regression and reproducing kernel Hilbert spaces regression procedures are embedded into standard mixed-effects linear models, retaining additive genetic effects under multivariate normality for operational reasons. Inferential procedures are presented, and some extensions are suggested. An example is presented, illustrating the potential of the methodology. Implementations can be carried out after modification of standard software developed by animal breeders for likelihood-based or Bayesian analysis.
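As a sketch of the reproducing kernel Hilbert space regression component mentioned above, the following kernel ridge regression predicts a phenotype from SNP codes with a Gaussian kernel; the mixed-model embedding, additive genetic effects, and Bayesian machinery of the article are omitted, and all parameter values are illustrative assumptions.

```python
import numpy as np

def kernel_ridge_predict(X, y, X_new, sigma=10.0, lam=1.0):
    """RKHS regression with a Gaussian kernel: solve (K + lam I) a = y."""
    def K(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    alpha = np.linalg.solve(K(X, X) + lam * np.eye(len(X)), y)
    return K(X_new, X) @ alpha

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 100)).astype(float)   # SNP codes 0/1/2
effects = rng.normal(scale=0.1, size=100)
y = X @ effects + 0.5 * np.sin(X[:, 0]) + rng.normal(scale=0.5, size=200)
print(kernel_ridge_predict(X[:150], y[:150], X[150:])[:5])
```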

Journal ArticleDOI
TL;DR: In this paper, the authors examined the empirical relation between CO2 emissions per capita and GDP per capita during the period 1960-1996, using a panel of 100 countries and found evidence of structural stability of the relationship.

Journal ArticleDOI
TL;DR: This work compares estimated reference curves for height using the penalized likelihood approach of Cole and Green with quantile regression curves based on data used for modern Finnish reference charts, and introduces quantile-specific autoregressive models for unequally spaced measurements.
Abstract: Estimation of reference growth curves for children's height and weight has traditionally relied on normal theory to construct families of quantile curves based on samples from the reference population. Age-specific parametric transformation has been used to significantly broaden the applicability of these normal theory methods. Non-parametric quantile regression methods offer a complementary strategy for estimating conditional quantile functions. We compare estimated reference curves for height using the penalized likelihood approach of Cole and Green (Statistics in Medicine 1992; 11:1305–1319) with quantile regression curves based on data used for modern Finnish reference charts. An advantage of the quantile regression approach is that it is relatively easy to incorporate prior growth and other covariates into the analysis of longitudinal growth data. Quantile specific autoregressive models for unequally spaced measurements are introduced and their application to diagnostic screening is illustrated. Copyright © 2005 John Wiley & Sons, Ltd.


Journal ArticleDOI
TL;DR: Preliminary results did not indicate that digital models would cause an orthodontist to make a different diagnosis of malocclusion compared with plaster models; digital models are not a compromised choice for treatment planning or diagnosis.

BookDOI
20 Jun 2006
TL;DR: A book-length treatment of the statistical matching problem, covering the missing data mechanism underlying it, matching under the conditional independence assumption, the use of auxiliary information, and the measurement and estimation of the uncertainty inherent in matching two data sources.
Abstract: Preface. 1 The Statistical Matching Problem. 1.1 Introduction. 1.2 The Statistical Framework. 1.3 The Missing Data Mechanism in the Statistical Matching Problem. 1.4 Accuracy of a Statistical Matching Procedure. 1.4.1 Model assumptions. 1.4.2 Accuracy of the estimator. 1.4.3 Representativeness of the synthetic file. 1.4.4 Accuracy of estimators applied on the synthetic data set. 1.5 Outline of the Book. 2 The Conditional Independence Assumption. 2.1 The Macro Approach in a Parametric Setting. 2.1.1 Univariate normal distributions case. 2.1.2 The multinormal case. 2.1.3 The multinomial case. 2.2 The Micro (Predictive) Approach in the Parametric Framework. 2.2.1 Conditional mean matching. 2.2.2 Draws based on conditional predictive distributions. 2.2.3 Representativeness of the predicted files. 2.3 Nonparametric Macro Methods. 2.4 The Nonparametric Micro Approach. 2.4.1 Random hot deck. 2.4.2 Rank hot deck. 2.4.3 Distance hot deck. 2.4.4 The matching noise. 2.5 Mixed Methods. 2.5.1 Continuous variables. 2.5.2 Categorical variables. 2.6 Comparison of Some Statistical Matching Procedures under the CIA. 2.7 The Bayesian Approach. 2.8 Other IdentifiableModels. 2.8.1 The pairwise independence assumption. 2.8.2 Finite mixture models. 3 Auxiliary Information. 3.1 Different Kinds of Auxiliary Information. 3.2 Parametric Macro Methods. 3.2.1 The use of a complete third file. 3.2.2 The use of an incomplete third file. 3.2.3 The use of information on inestimable parameters. 3.2.4 The multinormal case. 3.2.5 Comparison of different regression parameter estimators through simulation. 3.2.6 The multinomial case. 3.3 Parametric Predictive Approaches. 3.4 Nonparametric Macro Methods. 3.5 The Nonparametric Micro Approach with Auxiliary Information. 3.6 Mixed Methods. 3.6.1 Continuous variables. 3.6.2 Comparison between some mixed methods. 3.6.3 Categorical variables. 3.7 Categorical Constrained Techniques. 3.7.1 Auxiliary micro information and categorical constraints. 3.7.2 Auxiliary information in the form of categorical constraints. 3.8 The Bayesian Approach. 4 Uncertainty in Statistical Matching. 4.1 Introduction. 4.2 A Formal Definition of Uncertainty. 4.3 Measures of Uncertainty. 4.3.1 Uncertainty in the normal case. 4.3.2 Uncertainty in the multinomial case. 4.4 Estimation of Uncertainty. 4.4.1 Maximum likelihood estimation of uncertainty in the multinormal case. 4.4.2 Maximum likelihood estimation of uncertainty in the multinomial case. 4.5 Reduction of Uncertainty: Use of Parameter Constraints. 4.5.1 The multinomial case. 4.6 Further Aspects of Maximum Likelihood Estimation of Uncertainty. 4.7 An Example with Real Data. 4.8 Other Approaches to the Assessment of Uncertainty. 4.8.1 The consistent approach. 4.8.2 The multiple imputation approach. 4.8.3 The de Finetti coherence approach. 5 Statistical Matching and Finite Populations. 5.1 Matching Two Archives. 5.1.1 Definition of the CIA. 5.2 Statistical Matching and Sampling from a Finite Population. 5.3 Parametric Methods under the CIA. 5.3.1 The macro approach when the CIA holds. 5.3.2 The predictive approach. 5.4 Parametric Methods when Auxiliary Information is Available. 5.4.1 The macro approach. 5.4.2 The predictive approach. 5.5 File Concatenation. 5.6 Nonparametric Methods. 6 Issues in Preparing for Statistical Matching. 6.1 Reconciliation of Concepts and Definitions of Two Sources. 6.1.1 Reconciliation of biased sources. 6.1.2 Reconciliation of inconsistent definitions. 6.2 How to Choose the Matching Variables. 7 Applications. 
7.1 Introduction. 7.2 Case Study: The Social Accounting Matrix. 7.2.1 Harmonization step. 7.2.2 Modelling the social accounting matrix. 7.2.3 Choosing the matching variables. 7.2.4 The SAM under the CIA. 7.2.5 The SAM and auxiliary information. 7.2.6 Assessment of uncertainty for the SAM. A Statistical Methods for Partially Observed Data. A.1 Maximum Likelihood Estimation with Missing Data. A.1.1 Missing data mechanisms. A.1.2 Maximum likelihood and ignorable nonresponse. A.2 Bayesian Inference withMissing Data. B Loglinear Models. B.1 Maximum Likelihood Estimation of the Parameters. C Distance Functions. D Finite Population Sampling. E R Code. E.1 The R Environment. E.2 R Code for Nonparametric Methods. E.3 R Code for Parametric and Mixed Methods. E.4 R Code for the Study of Uncertainty. E.5 Other R Functions. References. Index.
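To make one building block concrete, here is a hedged sketch of distance hot deck imputation (Section 2.4.3 in the outline above): each record of a recipient file observing (X, Y) borrows Z from its nearest donor in a file observing (X, Z), producing a synthetic (X, Y, Z) file whose validity rests on the conditional independence assumption. Names and data below are illustrative.

```python
import numpy as np

def distance_hot_deck(XA, XB, ZB):
    """For each row of XA, take Z from the nearest row of XB (Euclidean)."""
    d = np.linalg.norm(XA[:, None, :] - XB[None, :, :], axis=2)
    donors = d.argmin(axis=1)        # index of the closest donor record
    return ZB[donors]

rng = np.random.default_rng(0)
XA = rng.normal(size=(100, 3))       # matching variables in recipient file A
XB = rng.normal(size=(400, 3))       # matching variables in donor file B
ZB = XB @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.3, size=400)
Z_imputed = distance_hot_deck(XA, XB, ZB)
print(Z_imputed[:5])                 # transplanted Z values for file A
```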

10 Feb 2006
TL;DR: This paper proposes estimators of unconditional distribution functions in the presence of covariates, based on estimating the conditional distribution by (parametric or nonparametric) quantile regression.
Abstract: This paper proposes estimators of unconditional distribution functions in the presence of covariates. The methods are based on the estimation of the conditional distribution by (parametric or nonparametric) quantile regression. The conditional distribution is then integrated over the range of the covariates, allowing for the estimation of counterfactual distributions. In the parametric setting, we propose an extension of the Oaxaca (1973) / Blinder (1973) decomposition of means to the full distribution. In the nonparametric setting, we develop an efficient local-linear regression estimator for quantile treatment effects. We show root-n consistency and asymptotic normality of the estimators and present analytical estimators of their variance. Monte Carlo simulations show that the procedures perform well in finite samples. An application to the black-white wage gap illustrates the usefulness of the estimators.

Journal ArticleDOI
TL;DR: P(X > Y) is proposed as an alternative effect-size index; its correspondence with well-known nonparametric statistics is summarized, it is compared with the standardized mean difference index, and its use is illustrated with clinical data.
Abstract: Effect sizes (ES) tell the magnitude of the difference between treatments and, ideally, should tell clinicians how likely their patients will benefit from the treatment. Currently used ES are expressed in statistical rather than in clinically useful terms and may not give clinicians the appropriate information. We restrict our discussion to studies with two groups: one with n patients receiving a new treatment and the other with m patients receiving the usual or no treatment. The standardized mean difference (e.g. Cohen's d) is a well-known index for continuous outcomes. There is some intuitive value to d, but measuring improvement in standard deviations (SD) is a statistical concept that may not help a clinician. How much improvement is a half SD? A more intuitive and simple-to-calculate ES is the probability that the response of a patient given the new treatment (X) is better than the one for a randomly chosen patient given the old or no treatment (Y) (i.e. P(X > Y), larger values meaning better outcomes). This probability has an immediate identity with the area under the curve (AUC) measure in procedures for receiver operator characteristic (ROC) curve comparing responses to two treatments. It also can be easily calculated from the Mann–Whitney U, Wilcoxon, or Kendall τ statistics. We describe the characteristics of an ideal ES. We propose P(X > Y) as an alternative index, summarize its correspondence with well-known non-parametric statistics, compare it to the standardized mean difference index, and illustrate with clinical data. Copyright © 2005 John Wiley & Sons, Ltd.
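A quick sketch of the computation the abstract describes: P(X > Y) equals the ROC AUC and can be read off the Mann-Whitney U statistic as U/(n*m), with ties contributing half weight. The data below are simulated purely for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
new = rng.normal(loc=0.5, size=60)   # responses under the new treatment (X)
old = rng.normal(loc=0.0, size=50)   # responses under the old treatment (Y)

# U statistic for the first sample; P(X > Y) = U / (n * m).
u = mannwhitneyu(new, old, alternative="two-sided").statistic
print("P(X > Y) =", u / (len(new) * len(old)))

# Direct computation for comparison (ties counted with weight 1/2):
diff = new[:, None] - old[None, :]
print("direct   =", (diff > 0).mean() + 0.5 * (diff == 0).mean())
```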

Journal ArticleDOI
TL;DR: This article investigated the predictive performance of various classes of value-at-risk (VaR) models in several dimensions, including unfiltered versus filtered VaR models, parametric versus nonparametric distributions, conventional versus extreme value distributions, and quantile regression versus inverting the conditional distribution function.
Abstract: We investigate the predictive performance of various classes of value-at-risk (VaR) models in several dimensions—unfiltered versus filtered VaR models, parametric versus nonparametric distributions, conventional versus extreme value distributions, and quantile regression versus inverting the conditional distribution function. By using the reality check test of White (2000), we compare the predictive power of alternative VaR models in terms of the empirical coverage probability and the predictive quantile loss for the stock markets of five Asian economies that suffered from the 1997–1998 financial crisis. The results based on these two criteria are largely compatible and indicate some empirical regularities of risk forecasts. The Riskmetrics model behaves reasonably well in tranquil periods, while some extreme value theory (EVT)-based models do better in the crisis period. Filtering often appears to be useful for some models, particularly for the EVT models, though it could be harmful for some other models. The CaViaR quantile regression models of Engle and Manganelli (2004) have shown some success in predicting the VaR risk measure for various periods, generally more stable than those that invert a distribution function. Overall, the forecasting performance of the VaR models considered varies over the three periods before, during and after the crisis. Copyright © 2006 John Wiley & Sons, Ltd.
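As an illustration of the first evaluation criterion, the sketch below computes the empirical coverage probability of a deliberately naive historical-simulation VaR (a rolling empirical quantile); it stands in for, and is far simpler than, the filtered, EVT, and CaViaR models the paper compares.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=2000) * 0.01   # heavy-tailed daily returns

level, window = 0.01, 250
violations = []
for t in range(window, len(returns)):
    var_t = np.quantile(returns[t - window:t], level)  # 1% VaR forecast
    violations.append(returns[t] < var_t)              # exceedance indicator

# A well-calibrated model has empirical coverage close to the nominal level.
print(f"nominal {level:.2%}, empirical {np.mean(violations):.2%}")
```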

Journal ArticleDOI
TL;DR: In this article, the authors proposed a sieve maximum likelihood estimation procedure for a broad class of semiparametric multivariate distributions, characterized by a parametric copula function evaluated at nonparametric marginal distributions.
Abstract: We propose a sieve maximum likelihood estimation procedure for a broad class of semiparametric multivariate distributions. A joint distribution in this class is characterized by a parametric copula function evaluated at nonparametric marginal distributions. This class of distributions has gained popularity in diverse fields due to its flexibility in separately modeling the dependence structure and the marginal behaviors of a multivariate random variable, and its circumvention of the “curse of dimensionality” associated with purely nonparametric multivariate distributions. We show that the plug-in sieve maximum likelihood estimators (MLEs) of all smooth functionals, including the finite-dimensional copula parameters and the unknown marginal distributions, are semiparametrically efficient, and that their asymptotic variances can be estimated consistently. Moreover, prior restrictions on the marginal distributions can be easily incorporated into the sieve maximum likelihood estimation procedure to achieve full efficiency.
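To make the model class concrete, here is a sketch of its simplest estimation strategy: a two-step pseudo-likelihood for a Gaussian copula with rank-based empirical marginals. This is a simpler cousin of the paper's sieve MLE, shown only to illustrate "parametric copula evaluated at nonparametric marginals"; all choices below are ours.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)
# Simulate dependent data with non-normal marginals (latent correlation 0.6).
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=1000)
x = np.exp(z[:, 0])        # lognormal marginal
y = z[:, 1] ** 3           # skewed marginal

# Step 1: nonparametric marginals via rescaled ranks.
u = rankdata(x) / (len(x) + 1)
v = rankdata(y) / (len(y) + 1)
# Step 2: Gaussian copula parameter from the normal scores.
rho = np.corrcoef(norm.ppf(u), norm.ppf(v))[0, 1]
print(rho)   # recovers roughly 0.6 despite the nonlinear marginals
```

Because ranks are invariant to monotone transformations of the marginals, the copula parameter is recovered regardless of how badly behaved the marginal distributions are.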

Book
01 Jan 2006
TL;DR: A statistics textbook for quantitative communication research, moving from descriptive statistics and correlation through inferential and nonparametric tests to meta-analysis, multiple regression, factor analysis, and structural equation modeling, with hands-on appendices for Excel, SPSS, and AMOS.
Abstract: Preface Section One: Introduction to Statistical Analyses 1. Using Statistics to Conduct Quantitative Research A World of Statistics Why Do Quantitative Research? Typical Steps Involved in Quantitative Research 2. Collecting Data on Variables Variables and Hypotheses Measurement of Variables Sampling Section Two: Descriptive Statistics 3. Central Tendency Doing a Study and Reporting Descriptive Information Typical Measures of Central Tendency Relations among Mean Median and Mode 4. Looking at Variability and Dispersion Assessing Dispersion The Relationship Between Measures of Central Tendency and Variability Examining Distributions 5. Correlations The Notion of Correlation Elements of the Correlation Computing the Pearson Product-Moment Correlation Matters Affecting Correlations Methods of Correlations Alternative Forms of Association 6. Ensuring Reliability and Validity The Notion of Measurement Acceptability How to Do a study of Measurement Adequacy Reliability Validity The Relation of Validity to Reliability Section Three: Inferential Statistics 7. Statistical Significance Hypothesis Testing when Comparing Two Means Doing a Study that Tests a Hypothesis of Differences Between Means Assumptions in Parametric Hypothesis Testing Comparing Sample and Population Means Comparing the Means of Two Sample Groups: The Two-Sample t Test Comparing Means Differences of Paired Scores: The Paired Difference t Assessing Power 8. Comparing More than Two Means: One-Way Analysis of Variance Hypothesis Testing for More than Two Means The Analysis of Variance Hypothesis Test What after ANOVA? Multiple Comparison Tests Extensions of Analysis of Variance 9. Factorial Analysis of Variance Doing a Study that Involves More than One Independent Variable Types of Effect to Test Computing the Fixed-Effect ANOVA Random and Mixed-Effects Designs Section Four: Nonparametric Tests 10. Nonparametric Tests for Categorical Variables The Notion of "Distribution-Free" Statistics Conducting a Study that Requires Nonparametric Tests of Categorical Data The Chi-Square Test Alternatives to Chi-Square for Frequency Data 11. Nonparametric Tests for Rank Order Dependent Variables Doing a Study Involving Ordinal Dependent Variables Comparing Ranks of One Group to Presumed Populations Characteristics: Analogous Tests to One-Sample t Tests Comparing Ranks from Two Sample Groups Comparing Ranks from More than Two Sample Groups: Analogous Tests to One-Way ANOVA Section Five: Advanced Statistical Applications 12. Meta-Analysis Meta-Analysis: An Alternative to Artistic Literature Reviews Conducting the Meta-Analysis Study Using Computer Techniques to Perform Meta-Analysis 13. Multiple Regression Correlation Contrasting Bivariate Correlation and Multiple Regression Correlation Components of Multiple Correlations How to Do a Multiple Regression Correlation Study 14. Extensions of Multiple Regression Correlation Using Categorical Predictors Contrasting Full and Reduced Models: Hierarchical Analysis Interaction Effects Examining Nonlinear Effects 15. Exploratory Factor Analysis Forms of Factor Analysis The Notion of Multivariate Analyses Exploratory Factor Analysis 16. Confirmatory Factor Analysis Through the AMOS Program The Notion of Confirmatory Factor Analysis Using the AMOS Program for Confirmatory Factor Analysis 17. 
Modeling Communication Behavior The Goals of Modeling How to Do a Modeling Study Path Models Using the AMOS Program Appendix A: Using Excel XP to Analyze Data Getting Ready to Run Statistics With Excel Handling Data Using the Menu Bar Toolbars How to Run Statistics From the Analysis ToolPak Using Functions Appendix B: Using SPSS 12 for Windows How to enter and Screen Your Own Data in SPSS How to Enter Data From a Word Processor How to Create Indexes From Scales Commands in the SPSS System Dealing With Output Alternative Editing Environments Appendix C: Tables References Index About the Author

Journal ArticleDOI
TL;DR: In this article, a semi-functional partially linear model is proposed for predicting a real-valued response when some of the explanatory variables are functional, and asymptotic results with rates of convergence are given.

Posted Content
TL;DR: In this article, a new asymptotic theory for local time density estimation for a general class of functionals of integrated time series is developed and applied to nonparametric estimation of nonlinear cointegrating regressions involving nonstationary time series.
Abstract: We provide a new asymptotic theory for local time density estimation for a general class of functionals of integrated time series. This result provides a convenient basis for developing an asymptotic theory for nonparametric cointegrating regression and autoregression. Our treatment directly involves the density function of the processes under consideration and avoids Fourier integral representations and Markov process theory which have been used in earlier research on this type of problem. The approach provides results of wide applicability to important practical cases and involves rather simple derivations that should make the limit theory more accessible and useable in econometric applications. Our main result is applied to offer an alternative development of the asymptotic theory for non-parametric estimation of a non-linear cointegrating regression involving non-stationary time series. In place of the framework of null recurrent Markov chains as developed in recent work of Karlsen, Myklebust and Tjostheim (2007), the direct local time density argument used here more closely resembles conventional nonparametric arguments, making the conditions simpler and more easily verified.

Journal ArticleDOI
TL;DR: Simulations show that the proposed test has significant power advantages over conventional kernel tests which rely upon frequency-based nonparametric estimators that require sample splitting to handle the presence of discrete regressors.
Abstract: In this paper we propose a nonparametric kernel-based model specification test that can be used when the regression model contains both discrete and continuous regressors. We employ discrete variable kernel functions and we smooth both the discrete and continuous regressors using least squares cross-validation methods. The test statistic is shown to have an asymptotic normal null distribution. We also prove the validity of using the wild bootstrap method to approximate the null distribution of the test statistic, the bootstrap being our preferred method for obtaining the null distribution in practice. Simulations show that the proposed test has significant power advantages over conventional kernel tests which rely upon frequency-based nonparametric estimators that require sample splitting to handle the presence of discrete regressors.
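A hedged sketch of the wild bootstrap mechanism whose validity the paper proves: residuals from the parametric null fit are multiplied by two-point Mammen weights to regenerate data under the null, and the test statistic is recomputed on each resample. The kernel-weighted residual statistic below is a toy stand-in for the paper's statistic, and all tuning values are illustrative.

```python
import numpy as np

def toy_stat(x, resid, h=0.3):
    # Kernel-weighted average of cross products e_i * e_j; it is large when
    # the residuals still carry structure in x (i.e., the null model is wrong).
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * h * h))
    np.fill_diagonal(K, 0.0)
    return (resid[:, None] * resid[None, :] * K).sum() / len(x) ** 2

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2, 2, n)
y = 1 + x + 0.8 * x ** 2 + rng.normal(scale=0.5, size=n)  # truth is quadratic

# Fit the (misspecified) linear null model.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
T_obs = toy_stat(x, resid)

# Wild bootstrap under the null with two-point Mammen weights
# (mean 0, variance 1), preserving residual heteroskedasticity.
a, p = (1 - np.sqrt(5)) / 2, (np.sqrt(5) + 1) / (2 * np.sqrt(5))
T_boot = []
for _ in range(499):
    wgt = np.where(rng.random(n) < p, a, 1 - a)
    y_star = X @ beta + resid * wgt
    b_star = np.linalg.lstsq(X, y_star, rcond=None)[0]
    T_boot.append(toy_stat(x, y_star - X @ b_star))
print("bootstrap p-value:", np.mean(np.array(T_boot) >= T_obs))
```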

Journal ArticleDOI
TL;DR: A simple nonparametric classifier based on local mean vectors is proposed and compared with the 1-NN, k-NN, Euclidean distance, Parzen, and artificial neural network (ANN) classifiers in terms of the error rate on unknown patterns.

Journal ArticleDOI
TL;DR: In this paper, a test for the significance of categorical predictors in nonparametric regression models is proposed, which employs cross-validated smoothing parameter selection while the null distribution of the test is obtained via bootstrapping.
Abstract: In this paper we propose a test for the significance of categorical predictors in nonparametric regression models. The test is fully data-driven and employs cross-validated smoothing parameter selection while the null distribution of the test is obtained via bootstrapping. The proposed approach allows applied researchers to test hypotheses concerning categorical variables in a fully nonparametric and robust framework, thereby deflecting potential criticism that a particular finding is driven by an arbitrary parametric specification. Simulations reveal that the test performs well, having significantly better power than a conventional frequency-based nonparametric test. The test is applied to determine whether OECD and non-OECD countries follow the same growth rate model or not. Our test suggests that OECD and non-OECD countries follow different growth rate models, while the tests based on a popular parametric specification and the conventional frequency-based nonparametric estimation method fail to detect ...

Journal ArticleDOI
TL;DR: A unified and efficient nonparametric hypothesis testing procedure that can easily take into account correlation within subjects and deal directly with both continuous and discrete response longitudinal data under the framework of generalized linear models is proposed.
Abstract: Nonparametric smoothing methods are used to model longitudinal data, but the challenge remains to incorporate correlation into nonparametric estimation procedures. In this article, we propose an efficient estimation procedure for varying-coefficient models for longitudinal data. The proposed procedure can easily take into account correlation within subjects and deal directly with both continuous and discrete response longitudinal data under the framework of generalized linear models. The proposed approach yields a more efficient estimator than the generalized estimation equation approach when the working correlation is misspecified. For varying-coefficient models, it is often of interest to test whether coefficient functions are time varying or time invariant. We propose a unified and efficient nonparametric hypothesis testing procedure, and further demonstrate that the resulting test statistics have an asymptotic chi-squared distribution. In addition, the goodness-of-fit test is applied to test whether the model assumption is satisfied. The corresponding test is also useful for choosing basis functions and the number of knots for regression spline models in conjunction with the model selection criterion. We evaluate the finite sample performance of the proposed procedures with Monte Carlo simulation studies. The proposed methodology is illustrated by the analysis of an acquired immune deficiency syndrome (AIDS) data set.