
Showing papers on "Nonparametric statistics" published in 1983


Journal ArticleDOI
Naihua Duan
TL;DR: The smearing estimate as discussed by the authors is a nonparametric estimate of the expected response on the untransformed scale after fitting a linear regression model on a transformed scale, which is consistent under mild regularity conditions, and usually attains high efficiency relative to parametric estimates.
Abstract: The smearing estimate is proposed as a nonparametric estimate of the expected response on the untransformed scale after fitting a linear regression model on a transformed scale. The estimate is consistent under mild regularity conditions, and usually attains high efficiency relative to parametric estimates. It can be viewed as a low-premium insurance policy against departures from parametric distributional assumptions. A real-world example of predicting medical expenditures shows that the smearing estimate can outperform parametric estimates even when the parametric assumption is nearly satisfied.

2,093 citations
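
The retransformation step described in the abstract above can be sketched as follows. This is a minimal illustration, assuming a log transformation and a single predictor fitted by ordinary least squares; the function names `smearing_fit` and `smearing_predict` are hypothetical and do not come from the paper.

```python
import math


def smearing_fit(x, y):
    """Fit log(y) = a + b*x by least squares and compute the smearing factor.

    Assumes y > 0 and a log transformation (an illustrative choice).
    Returns (a, b, smear), where smear is the mean of exp(residual).
    """
    n = len(x)
    log_y = [math.log(v) for v in y]
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    b = sxy / sxx
    a = mean_ly - b * mean_x
    residuals = [li - (a + b * xi) for xi, li in zip(x, log_y)]
    # Smearing factor: average of the back-transformed residuals,
    # used instead of a parametric correction such as exp(sigma^2 / 2).
    smear = sum(math.exp(e) for e in residuals) / n
    return a, b, smear


def smearing_predict(fit, x_new):
    """Estimate E[y | x] on the untransformed scale for each value in x_new."""
    a, b, smear = fit
    return [math.exp(a + b * x0) * smear for x0 in x_new]
```

Because least-squares residuals with an intercept average to zero, the factor is at least 1 by Jensen's inequality, so the estimate never falls below the naive back-transformed prediction `exp(a + b*x)`.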


Book
04 Apr 1983
TL;DR: An introductory statistics text covering probability distributions, hypothesis testing, estimation, regression, analysis of variance and covariance, and multiple regression, with a nonparametric method introduced alongside most of the parametric topics.
Abstract: Preface to the Third Edition. Preface to the Second Edition. Preface to the First Edition. 1. The Role of Statistics. 1.1 The Basic Statistical Procedure. 1.2 The Scientific Method. 1.3 Experimental Data and Survey Data. 1.4 Computer Usage. Review Exercises. Selected Readings. 2. Populations, Samples, and Probability Distributions. 2.1 Populations and Samples. 2.2 Random Sampling. 2.3 Levels of Measurement. 2.4 Random Variables and Probability Distributions. 2.5 Expected Value and Variance of a Probability Distribution. Review Exercises. Selected Readings. 3. Binomial Distributions. 3.1 The Nature of Binomial Distributions. 3.2 Testing Hypotheses. 3.3 Estimation. 3.4 Nonparametric Statistics: Median Test. Review Exercises. Selected Readings. 4. Poisson Distributions. 4.1 The Nature of Poisson Distributions. 4.2 Testing Hypotheses. 4.3 Estimation. 4.4 Poisson Distributions and Binomial Distributions. Review Exercises. Selected Readings. 5. Chi-Square Distributions. 5.1 The Nature of Chi-Square Distributions. 5.2 Goodness-of-Fit Tests. 5.3 Contingency Table Analysis. 5.4 Relative Risks and Odds Ratios. 5.5 Nonparametric Statistics: Median Test for Several Samples. Review Exercises. Selected Readings. 6. Sampling Distribution of Averages. 6.1 Population Mean and Sample Average. 6.2 Population Variance and Sample Variance. 6.3 The Mean and Variance of the Sampling Distribution of Averages. 6.4 Sampling Without Replacement. Review Exercises. 7. Normal Distributions. 7.1 The Standard Normal Distribution. 7.2 Inference From a Single Observation. 7.3 The Central Limit Theorem. 7.4 Inferences About a Population Mean and Variance. 7.5 Using a Normal Distribution to Approximate Other Distributions. 7.6 Nonparametric Statistics: A Test Based on Ranks. Review Exercises. Selected Readings. 8. Student's t Distribution. 8.1 The Nature of t Distributions. 8.2 Inference About a Single Mean. 8.3 Inference About Two Means. 8.4 Inference About Two Variances. 
8.5 Nonparametric Statistics: Matched-Pair and Two-Sample Rank Tests. Review Exercises. Selected Readings. 9. Distributions of Two Variables. 9.1 Simple Linear Regression. 9.2 Model Testing. 9.3 Inferences Related to Regression. 9.4 Correlation. 9.5 Nonparametric Statistics: Rank Correlation. 9.6 Computer Usage. 9.7 Estimating Only One Linear Trend Parameter. Review Exercises. Selected Readings. 10. Techniques for One-Way Analysis of Variance. 10.1 The Additive Model. 10.2 One-Way Analysis-of-Variance Procedure. 10.3 Multiple-Comparison Procedures. 10.4 One-Degree-of-Freedom Comparisons. 10.5 Estimation. 10.6 Bonferroni Procedures. 10.7 Nonparametric Statistics: Kruskal-Wallis ANOVA for Ranks. Review Exercises. Selected Readings. 11. The Analysis-of-Variance Model. 11.1 Random Effects and Fixed Effects. 11.2 Testing the Assumptions for ANOVA. 11.3 Transformations. Review Exercises. Selected Readings. 12. Other Analysis-of-Variance Designs. 12.1 Nested Design. 12.2 Randomized Complete Block Design. 12.3 Latin Square Design. 12.4 a × b Factorial Design. 12.5 a × b × c Factorial Design. 12.6 Split-Plot Design. 12.7 Split Plot with Repeated Measures. Review Exercises. Selected Readings. 13. Analysis of Covariance. 13.1 Combining Regression with ANOVA. 13.2 One-Way Analysis of Covariance. 13.3 Testing the Assumptions for Analysis of Covariance. 13.4 Multiple-Comparison Procedures. Review Exercises. Selected Readings. 14. Multiple Regression and Correlation. 14.1 Matrix Procedures. 14.2 ANOVA Procedures for Multiple Regression and Correlation. 14.3 Inferences About Effects of Independent Variables. 14.4 Computer Usage. 14.5 Model Fitting. 14.6 Logarithmic Transformations. 14.7 Polynomial Regression. 14.8 Logistic Regression. Review Exercises. Selected Readings. Appendix of Useful Tables. Answers to Most Odd-Numbered Exercises and All Review Exercises. Index.

1,461 citations


Journal ArticleDOI
TL;DR: This article used a split-sample analysis and found that a model that more closely approximates distributional assumptions and uses a nonparametric retransformation factor performs better in terms of mean squared forecast error.
Abstract: We have tested alternative models of the demand for medical care using experimental data. The estimated response of demand to insurance plan is sensitive to the model used. We therefore use a split-sample analysis and find that a model that more closely approximates distributional assumptions and uses a nonparametric retransformation factor performs better in terms of mean squared forecast error. Simpler models are inferior either because they are not robust to outliers (e.g., ANOVA, ANOCOVA), or because they are inconsistent when strong distributional assumptions are violated (e.g., a two-parameter Box-Cox transformation).

1,303 citations


Journal ArticleDOI
TL;DR: In this paper, the indicator approach uses the data through their rank order, giving a nonparametric approach to the bivariate distribution of the data and a risk-qualified estimation of local and global spatial distributions.
Abstract: The indicator approach, whereby the data are used through their rank order, allows a nonparametric approach to the data bivariate distribution. Such rich structural information allows a nonparametric, risk-qualified estimation of local and global spatial distributions.

924 citations


Book
01 Jan 1983

800 citations


Journal ArticleDOI
TL;DR: In this article, lower bounds for estimation of the parameters of models with both parametric and nonparametric components are given in the form of representation theorems (for regular estimates) and asymptotic minimax bounds.
Abstract: Asymptotic lower bounds for estimation of the parameters of models with both parametric and nonparametric components are given in the form of representation theorems (for regular estimates) and asymptotic minimax bounds. The methods used involve: (i) the notion of a "Hellinger-differentiable (root-) density", where part of the differentiation is with respect to the nonparametric part of the model, to obtain appropriate scores; and (ii) calculation of the "effective score" for the real or vector (finite-dimensional) parameter of interest as that component of the score function orthogonal to all nuisance parameter "scores" (perhaps infinite-dimensional). The resulting asymptotic information for estimation of the parametric component of the model is just (4 times) the squared $L^2$-norm of the "effective score". A corollary of these results is a simple necessary condition for "adaptive estimation": adaptation is possible only if the scores for the parameter of interest are orthogonal to the scores for the nuisance function or nonparametric part of the model. Examples considered include the one-sample location model with and without symmetry, mixture models, the two-sample shift model, and Cox's proportional hazards model.

406 citations


Journal ArticleDOI
TL;DR: In this paper, splines are presented as a nonparametric function estimating technique, and the method of cross-validation for choosing the smoothing parameter is discussed and the general multivariate regression/surface estimation problem is addressed.
Abstract: This is a survey article that attempts to synthesize a broad variety of work on splines in statistics. Splines are presented as a nonparametric function estimating technique. After a general introduction to the theory of interpolating and smoothing splines, splines are treated in the nonparametric regression setting. The method of cross-validation for choosing the smoothing parameter is discussed and the general multivariate regression/surface estimation problem is addressed. An extensive discussion of splines as nonparametric density estimators is followed by a discussion of their role in time series analysis. A comparison of the spline and isotonic regression methodologies leads to a formulation of a hybrid estimator. The closing section provides a brief overall summary and formulates a number of open/unsolved problems relating to splines in statistics.

350 citations


Journal ArticleDOI
TL;DR: A bibliographic notice for Practical Nonparametric Statistics, 2nd edition, by W. J. Conover, published in the Wiley Series in Probability and Mathematical Statistics.
Abstract: Practical Nonparametric Statistics, 2nd edition. By W. J. Conover. New York and Chichester, Wiley, 1980. xiv, 493 p. 23.5 cm. £26.60 (hardbound), £13.90 (clothbound). (Wiley Series in Probability and Mathematical Statistics.)

289 citations


Journal ArticleDOI
TL;DR: In this article, a nonparametric method of discriminant analysis is proposed, based on nonparametric extensions of commonly used scatter matrices, that works well even for non-Gaussian data sets; the same framework yields a procedure to test the structural similarity of two distributions.
Abstract: A nonparametric method of discriminant analysis is proposed. It is based on nonparametric extensions of commonly used scatter matrices. Two advantages result from the use of the proposed nonparametric scatter matrices. First, they are generally of full rank. This provides the ability to specify the number of extracted features desired. This is in contrast to parametric discriminant analysis, which for an L class problem typically can determine at most L - 1 features. Second, the nonparametric nature of the scatter matrices allows the procedure to work well even for non-Gaussian data sets. Using the same basic framework, a procedure is proposed to test the structural similarity of two distributions. The procedure works in high-dimensional space. It specifies a linear decomposition of the original data space in which a relative indication of dissimilarity along each new basis vector is provided. The nonparametric scatter matrices are also used to derive a clustering procedure, which is recognized as a k-nearest neighbor version of the nonparametric valley seeking algorithm. The form which results provides a unified view of the parametric nearest mean reclassification algorithm and the nonparametric valley seeking algorithm.

232 citations


Book
01 Jan 1983
TL;DR: Part One: Fundamental Concepts of Probability and Statistics; Part Two: Statistical Analysis I: Basic Concepts; Part Three: Statistical Analysis II: Evaluations with Several Variables; Part Four: Specialized Topics in Statistics.
Abstract: Part One: Fundamental Concepts of Probability and Statistics. Part Two: Statistical Analysis I: Basic Concepts. Part Three: Statistical Analysis II: Evaluations with Several Variables. Part Four: Specialized Topics in Statistics.

192 citations


Journal ArticleDOI
TL;DR: Two nonparametric methods and their adaptations to bioavailability ratios are reviewed, one based on Wilcoxon's signed rank test (Tukey), and the other on Pitman's permutation test.
Abstract: For a two-way cross-over design, which appears to be the most common experimental design in bioavailability studies, 95%-confidence limits for expected bioavailability can be obtained by classical analysis of variance (ANOVA). If symmetry of the confidence interval is desired about zero (differences) or unity (ratios) rather than about the corresponding point estimator, Westlake's modification can be used. Two nonparametric methods and their adaptations to bioavailability ratios are reviewed, one based on Wilcoxon's signed rank test (Tukey), and the other on Pitman's permutation test. The necessary assumptions and the merits of these procedures are discussed. The methods are illustrated by an example of a comparative bioavailability study. A FORTRAN program facilitating the procedures is available from the authors upon request.

Journal ArticleDOI
TL;DR: Polynomial time algorithms are presented for finding the permutation distribution of any statistic that is a linear combination of some function of either the original observations or the ranks, including the original Fisher two-sample location statistic.
Abstract: Polynomial time algorithms are presented for finding the permutation distribution of any statistic that is a linear combination of some function of either the original observations or the ranks. This class of statistics includes the original Fisher two-sample location statistic and such common nonparametric statistics as the Wilcoxon, Ansari-Bradley, Savage, and many others. The algorithms are presented for the two-sample problem and it is shown how to extend them to the multisample problem—for example, to find the distribution of the Kruskal-Wallis and other extensions of the Wilcoxon—and to the single-sample situation. Stratification, ties, and censored observations are also easily handled by the algorithms. The algorithms require polynomial time as opposed to complete enumeration algorithms, which require exponential time. This savings is effected by first calculating and then inverting the characteristic function of the statistic.
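
To make the contrast with complete enumeration concrete, here is a small sketch that computes the exact null distribution of the Wilcoxon rank-sum statistic in polynomial time. It is not the characteristic-function algorithm of the paper above: it swaps in a simpler dynamic-programming count and assumes no ties, no censoring, and no stratification. The function names are hypothetical.

```python
def rank_sum_counts(m, n):
    """Count the size-m subsets of the ranks 1..m+n by their rank sum.

    Returns a list `counts` where counts[s] is the number of ways the
    m observations of the first sample can receive ranks summing to s.
    Assumes no ties. Work grows polynomially in m and n, whereas listing
    every subset grows like the binomial coefficient C(m+n, m).
    """
    total = m + n
    max_sum = sum(range(n + 1, total + 1))  # the m largest ranks
    # table[k][s] = number of k-subsets of the ranks seen so far with sum s
    table = [[0] * (max_sum + 1) for _ in range(m + 1)]
    table[0][0] = 1
    for rank in range(1, total + 1):
        # Go through subset sizes downward so each rank is used at most once.
        for k in range(min(rank, m), 0, -1):
            for s in range(rank, max_sum + 1):
                table[k][s] += table[k - 1][s - rank]
    return table[m]


def upper_tail_p(m, n, observed_sum):
    """Exact one-sided p-value P(W >= observed_sum) under the null."""
    counts = rank_sum_counts(m, n)
    return sum(counts[observed_sum:]) / sum(counts)
```

For two samples of size 2, the six possible rank sums are 3, 4, 5, 5, 6, and 7, so `upper_tail_p(2, 2, 7)` is 1/6.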

Journal ArticleDOI
TL;DR: It is shown how a multivariate empirical survivor function must be constructed in order to be considered a (nonparametric) maximum likelihood estimate of the underlying survivor function.
Abstract: This paper presents examples of situations in which one wishes to estimate a multivariate distribution from data that may be right-censored. A distinction is made between what we term 'homogeneous' and 'heterogeneous' censoring. It is shown how a multivariate empirical survivor function must be constructed in order to be considered a (nonparametric) maximum likelihood estimate of the underlying survivor function. A closed-form solution, similar to the product-limit estimate of Kaplan and Meier, is possible with homogeneous censoring, but an iterative method, such as the EM algorithm, is required with heterogeneous censoring. An example is given in which an anomaly is produced if censored multivariate data are analyzed as a series of univariate variables; this anomaly is shown to disappear if the methods of this paper are used.

Journal ArticleDOI
TL;DR: In this article, point and interval estimators of the ratio of two scale parameters are given for arbitrarily right-censored data based on the idea of Hodges and Lehmann (1963).
Abstract: Nonparametric point and interval estimators of the ratio of two scale parameters are given for arbitrarily right-censored data based on the idea of Hodges and Lehmann (1963). These estimators are defined in terms of rank test statistics for testing the equality of two survival distributions. The asymptotic properties and efficiencies of estimators corresponding to the tests studied by Gill (1980) are investigated.

Journal ArticleDOI
TL;DR: In this paper, the authors present a range of graphical methods available for use with nonparametric procedures and provide examples of the use of many of the methods under the broad groupings: two-sample procedures, one-sample procedure, association and regression procedures, and miscellaneous procedures.
Abstract: This paper reviews the range of graphical methods available for use with nonparametric procedures, and provides examples of the use of many of the methods under the broad groupings: two-sample procedures, one-sample procedures, association and regression procedures, and miscellaneous procedures. An annotated bibliography is also provided.


Journal ArticleDOI
TL;DR: An empirical approach to studying the nature of the distances between scale points was developed, using responses to a clinical performance evaluation instrument that uses a four-point behaviorally-anchored scale.
Abstract: The analysis of data collected from behavioral assessment instruments is typically conducted using parametric statistics, with little or no reference given to the underlying nature of the scale being used. If the nature of the distances between the scale points is not understood, the concept of normality of the distribution becomes clouded. An empirical approach to studying this problem was developed, using responses to a clinical performance evaluation instrument that uses a four-point behaviorally-anchored scale. Various combinations of nonlinear transformations were applied to the evaluation responses. The factorial structure of the fifteen items constituting the evaluation form was minimally affected by the transformations, suggesting that parametric statistics can be applied to behaviorally-anchored rating scales.

Journal ArticleDOI
01 Mar 1983
TL;DR: One transformation that gets around many assumptions about distributions is the rank distribution, in which one or more variables are replaced by their ranks.
Abstract: One transformation that gets us around many assumptions about distributions is the rank distribution. In this, one or more variables are replaced by their ranks. The rank transformation simply assigns the value 1 to the smallest observed value, 2 to the next smallest, etc. Of course, in practice there are often ties, and some sort of intelligent tie-breaker needs to be employed. Usually this is to assign both values their average rank.
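
The rank transformation with average ranks for ties, as described in the abstract above, can be sketched in a few lines. This is an illustrative sketch only; the function name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Replace each value by its rank, giving tied values their average rank.

    The smallest value gets rank 1. A run of equal values that would occupy
    ranks r..r+t-1 all receive the mean of those ranks.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover the whole run of values tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end are 0-based, ranks are 1-based.
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks
```

For example, `average_ranks([10, 20, 20, 5])` gives `[2.0, 3.5, 3.5, 1.0]`: the two tied values would have taken ranks 3 and 4, so each receives 3.5.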


Journal ArticleDOI
TL;DR: In this paper, nonparametric estimates of a nonstationary regression function $E\{Y_n \mid X_n = x\} = t_n(x) R(x)$, where $t_n$ and $R$ are unknown functions, are proposed and their asymptotic properties are investigated.
Abstract: Let $(X_1, Y_1), (X_2, Y_2), \ldots$ be independent pairs of random variables according to the model $Y_n = t_n(X_n) R(X_n) + Z_n$, $n = 1, 2, \ldots$, where $t_n$ and $R$ are unknown functions. The $Z_n$ are i.i.d. random variables with zero mean and finite variance. The marginal density of $X_n$ is independent of $n$. In the paper, nonparametric estimates of a nonstationary regression function $E\{Y_n \mid X_n = x\} = t_n(x) R(x)$ are proposed and their asymptotic properties are investigated.


Book ChapterDOI
01 Jan 1983
TL;DR: This chapter discusses the optimal uniform rate of convergence for nonparametric estimators of a density function or its derivatives, where a rate is called optimal if it is both a lower and an achievable rate of convergence.
Abstract: This chapter discusses the optimal uniform rate of convergence for nonparametric estimators of a density function or its derivatives. It describes the optimal uniform rate of convergence of an arbitrary estimator of T(f) for various choices of F. A rate is called the optimal rate of convergence if it is both a lower and an achievable rate of convergence.

Journal ArticleDOI
TL;DR: In this paper, sufficient conditions for the variable kernel estimator to be strongly consistent are presented, and the estimator which allows for variable bandwidth was found to have a superior performance.
Abstract: In a recent paper (Tanner and Wong, 1983b), a family of data-based nonparametric hazard estimators was introduced. Several of these estimators were studied in an extensive simulation experiment. The estimator which allows for variable bandwidth was found to have a superior performance. In this note, sufficient conditions for the variable kernel estimator to be strongly consistent are presented.

Journal ArticleDOI
TL;DR: The apparent error rates of the kernel method are found to be consistently less than those of the classical method, and when the true error rates are estimated either by applying the classifiers to independent test sets, or by the leaving-one-out method from the design sets, no significant difference is discernible between the two types of classifier.
Abstract: The results of applying classical linear discriminant analysis and kernel discriminant analysis to several real sets of multivariate binary data are presented. Classical discriminant analysis is intrinsically parametric and is usually presented as being well-suited to continuous variables; it is also well-known to be optimal when the (two) classes have normal distributions with identical covariance matrices. The kernel method, on the other hand, is nonparametric and, in the form used here, is ideally suited to binary data. The apparent error rates of the kernel method are found to be consistently less than those of the classical method. However, when the true error rates are estimated either by applying the classifiers to independent test sets, or by the leaving-one-out method from the design sets, no significant difference is discernible between the two types of classifier.

Journal ArticleDOI
TL;DR: In this paper, the authors considered the multiclassification (discrimination) problem with known prior probabilities and a multi-dimensional vector of observations and showed that a certain rate is optimal in the sense that no rule can do better (uniformly over the class of smooth densities) and a rule is exhibited which does that well.
Abstract: Consider the multiclassification (discrimination) problem with known prior probabilities and a multi-dimensional vector of observations. Assume the underlying densities corresponding to the various classes are unknown but a training sample of size $N$ is available from each class. Rates of convergence to Bayes risk are investigated under smoothness conditions on the underlying densities of the type often seen in nonparametric density estimation. These rates can be drastically affected by a small change in the prior probabilities, so the error criterion used here is Bayes risk averaged (uniformly) over all prior probabilities. Then it is shown that a certain rate, $N^{-r}$, is optimal in the sense that no rule can do better (uniformly over the class of smooth densities) and a rule is exhibited which does that well. The optimal value of $r$ depends on the smoothness of the distributions and the dimensionality of the observations in the same way as for nonparametric density estimation with integrated square error loss.

Journal ArticleDOI
TL;DR: In this article, a class of nonparametric estimators of relative risk in the two-sample case of the proportional hazards model for censored data is investigated, and the asymptotic distribution of these estimators is derived using influence functions.
Abstract: We investigate a class of nonparametric estimators of relative risk in the two-sample case of the proportional hazards model for censored data. The asymptotic distribution of these estimators is derived using influence functions. The optimal estimator in this class has the same influence functions and the same asymptotic distribution as the maximum partial likelihood estimator of Cox (1972). The behavior of the influence functions is discussed briefly, and the last section presents two examples from the literature.


Journal ArticleDOI
TL;DR: In this paper, an algorithm is presented for computing the finite population parameters and approximate probability values associated with a recently developed class of statistical inference techniques termed multi-response permutation procedures (MRPP).
Abstract: An algorithm is presented for computing the finite population parameters and the approximate probability values associated with a recently developed class of statistical inference techniques termed multi-response permutation procedures (MRPP).

Journal ArticleDOI
TL;DR: In this paper, the log rank test is extended to cover ordered categories of response, e.g. response, no change, progression, and to incorporate information on duration of response.
Abstract: Some current nonparametric methods related to the log rank test are extended to cover ordered categories of response, e.g. response, no change, progression, and to incorporate information on duration of response. As an example, data from a protocol for the treatment of advanced breast cancer are analysed. The powers of several tests for this special case of multistate survival data are compared by Monte Carlo simulation for both small and moderate samples from exponential and Weibull distributions.

Journal ArticleDOI
TL;DR: In this article, improved dynamic risk and reliability models based on conditional probability distributions are developed, and the new dynamic risk model is shown to correspond closely to nonparametric methods for evaluating the exceedance probability of a hydrologic event.
Abstract: Improved dynamic risk and reliability models based upon conditional probability distributions are developed. The new dynamic risk model is shown to reflect more accurately the overall risk (probability of failure considering both hydrologic and hydraulic uncertainties) of a hydraulic structure. This dynamic risk model is also shown to have a close correspondence to nonparametric methods for evaluating the exceedance probability of a hydrologic event.