
Showing papers in "Journal of the American Statistical Association in 1979"


Journal ArticleDOI
TL;DR: Representations for the limit distributions of the regression estimator of ρ and of the regression t test are derived for the model Y_t = ρY_{t−1} + e_t under the assumption that ρ = ±1, where Y_0 is a fixed constant and {e_t} is a sequence of independent normal random variables.
Abstract: Let n observations Y_1, Y_2, …, Y_n be generated by the model Y_t = ρY_{t−1} + e_t, where Y_0 is a fixed constant and {e_t}_{t=1}^n is a sequence of independent normal random variables with mean 0 and variance σ². Properties of the regression estimator of ρ are obtained under the assumption that ρ = ±1. Representations for the limit distributions of the estimator of ρ and of the regression t test are derived. The estimator of ρ and the regression t test furnish methods of testing the hypothesis that ρ = 1.

23,509 citations
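To see why special tables are needed, one can simulate the null model ρ = 1 (a random walk) and look at the regression t statistic: its null percentiles sit well to the left of the standard normal ones. A minimal Python sketch, illustrative only and not the authors' derivation:

```python
import numpy as np

rng = np.random.default_rng(0)

def rho_hat_and_t(y):
    """OLS estimate of rho in Y_t = rho*Y_{t-1} + e_t and its t statistic for rho = 1."""
    y_lag, y_cur = y[:-1], y[1:]
    rho = (y_lag @ y_cur) / (y_lag @ y_lag)
    resid = y_cur - rho * y_lag
    s2 = resid @ resid / (len(y_cur) - 1)
    se = np.sqrt(s2 / (y_lag @ y_lag))
    return rho, (rho - 1.0) / se

# Under rho = 1 the t statistic is not asymptotically N(0, 1) -- the point of the paper.
t_stats = [rho_hat_and_t(np.concatenate(([0.0], np.cumsum(rng.standard_normal(100)))))[1]
           for _ in range(5000)]
print(np.percentile(t_stats, [1, 5, 10]))   # noticeably left of N(0,1) percentiles
```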


Journal ArticleDOI
William S. Cleveland
TL;DR: Robust locally weighted regression, as discussed by the authors, is a method for smoothing a scatterplot in which the fitted value at x_k is the value of a polynomial fit to the data using weighted least squares, where the weight for (x_i, y_i) is large if x_i is close to x_k and small if it is not.
Abstract: The visual information on a scatterplot can be greatly enhanced, with little additional cost, by computing and plotting smoothed points. Robust locally weighted regression is a method for smoothing a scatterplot, (x_i, y_i), i = 1, …, n, in which the fitted value at x_k is the value of a polynomial fit to the data using weighted least squares, where the weight for (x_i, y_i) is large if x_i is close to x_k and small if it is not. A robust fitting procedure is used that guards against deviant points distorting the smoothed points. Visual, computational, and statistical issues of robust locally weighted regression are discussed. Several examples, including data on lead intoxication, are used to illustrate the methodology.

10,225 citations
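A scatterplot smoother of exactly this kind ships with statsmodels as lowess, where the frac argument sets the span of the local weights and it runs the robustness iterations. A short sketch on synthetic data with a few deviant points (parameter values here are arbitrary choices):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.3 * rng.standard_normal(200)
y[::25] += 4.0                              # deviant points the robust fit should resist

smoothed = lowess(y, x, frac=0.3, it=3)     # columns: sorted x, fitted value at x
print(smoothed[:5])
```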




Journal ArticleDOI
TL;DR: In this article, an adaptation of the James-Stein estimator is applied to sample estimates of income for small places (i.e., population less than 1,000) from the 1970 Census of Population and Housing.
Abstract: An adaptation of the James-Stein estimator is applied to sample estimates of income for small places (i.e., population less than 1,000) from the 1970 Census of Population and Housing. The adaptation incorporates linear regression in the context of unequal variances. Evidence is presented that the resulting estimates have smaller average error than either the sample estimates or an alternate procedure of using county averages. The new estimates for these small places now form the basis for the Census Bureau's updated estimates of per capita income for the General Revenue Sharing Program.

1,173 citations
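In the spirit of the article's adaptation (a sketch, not the Census Bureau's production method): direct estimates y_i with unequal sampling variances D_i are shrunk toward a regression prediction, with more shrinkage where D_i is large. The between-area variance A is treated as known here for brevity, whereas in practice it must be estimated:

```python
import numpy as np

def shrink_toward_regression(y, X, D, A):
    """Composite estimate: w*y + (1-w)*X@beta with w = A/(A+D), beta fitted by GLS."""
    Vinv = np.diag(1.0 / (D + A))
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    w = A / (A + D)                       # little weight on noisy direct estimates
    return w * y + (1 - w) * (X @ beta)

rng = np.random.default_rng(2)
m = 40
X = np.column_stack([np.ones(m), rng.uniform(size=m)])
truth = X @ np.array([2.0, 3.0]) + 0.5 * rng.standard_normal(m)   # A = 0.25
D = rng.uniform(0.2, 2.0, m)                                      # unequal variances
y = truth + np.sqrt(D) * rng.standard_normal(m)
est = shrink_toward_regression(y, X, D, A=0.25)
print(np.mean((y - truth) ** 2), np.mean((est - truth) ** 2))     # composite wins
```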


Journal ArticleDOI
TL;DR: In this article, a synthesis of Bayesian and sample-reuse approaches to the problem of high structure model selection geared to prediction is presented; similar methods apply to low structure models.
Abstract: This article offers a synthesis of Bayesian and sample-reuse approaches to the problem of high structure model selection geared to prediction. Similar methods are used for low structure models. Nested and nonnested paradigms are discussed and examples given.

940 citations


Journal ArticleDOI
TL;DR: In this paper, Monte Carlo methods are used to study the efficacy of multivariate matched sampling and regression adjustment for controlling bias due to specific matching variables X when dependent variables are moderately nonlinear in X.
Abstract: Monte Carlo methods are used to study the efficacy of multivariate matched sampling and regression adjustment for controlling bias due to specific matching variables X when dependent variables are moderately nonlinear in X. The general conclusion is that nearest available Mahalanobis metric matching in combination with regression adjustment on matched pair differences is a highly effective plan for controlling bias due to X.

886 citations
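A compact sketch of that plan on simulated data (not the paper's Monte Carlo design): nearest available Mahalanobis matching of treated units to controls, then a regression of the matched-pair outcome differences on the matched-pair covariate differences, whose intercept is the adjusted effect estimate:

```python
import numpy as np

def nearest_available_match(Xt, Xc):
    """Match each treated unit to the nearest not-yet-used control in the
    Mahalanobis metric of the pooled covariate covariance."""
    Sinv = np.linalg.inv(np.cov(np.vstack([Xt, Xc]).T))
    available = list(range(len(Xc)))
    pairs = []
    for i, x in enumerate(Xt):
        d = [(x - Xc[j]) @ Sinv @ (x - Xc[j]) for j in available]
        pairs.append((i, available.pop(int(np.argmin(d)))))
    return pairs

rng = np.random.default_rng(3)
Xt = rng.normal(0.5, 1, (30, 2)); Xc = rng.normal(0.0, 1, (120, 2))
yt = 1.0 + Xt.sum(1) + 0.2 * (Xt ** 2).sum(1) + rng.standard_normal(30)   # effect = 1
yc = Xc.sum(1) + 0.2 * (Xc ** 2).sum(1) + rng.standard_normal(120)        # nonlinear in X

pairs = nearest_available_match(Xt, Xc)
dy = np.array([yt[i] - yc[j] for i, j in pairs])
dX = np.array([Xt[i] - Xc[j] for i, j in pairs])
Z = np.column_stack([np.ones(len(dy)), dX])          # regression on pair differences
beta, *_ = np.linalg.lstsq(Z, dy, rcond=None)
print("adjusted effect estimate:", beta[0])
```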


Journal ArticleDOI
TL;DR: In this paper, a class of models for the analysis of association in a contingency table with ordered rows and ordered columns is proposed, including the null association model, the uniform association model and models that describe the possible effects of the rows and columns on the association.
Abstract: A class of models is proposed for the analysis of association in a contingency table with ordered rows and ordered columns. Association is measured in terms of the odds-ratios in 2 × 2 subtables formed from adjacent rows and adjacent columns. This class includes the null association model, the uniform association model, and models that describe the possible effects of the rows and/or columns on the association. With these models, the association in the table can be analyzed in a manner analogous to the usual two-way analysis of variance, and parsimonious descriptions of this association can often be obtained. Applications are discussed, some well-known sets of data are reanalyzed, and new insights into these data are obtained.

830 citations
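With unit-spaced row and column scores, the uniform association model is a Poisson log-linear model with one linear-by-linear term, and the fitted coefficient is the common log odds-ratio of every adjacent 2 × 2 subtable. A sketch on a made-up 4 × 4 table using statsmodels:

```python
import numpy as np
import statsmodels.api as sm

table = np.array([[20, 10,  5,  2],      # hypothetical ordered 4x4 table
                  [10, 15, 10,  5],
                  [ 5, 10, 15, 10],
                  [ 2,  5, 10, 20]])
r, c = np.indices(table.shape)
y = table.ravel()
row_d = np.eye(4)[r.ravel()]             # row main effects (span the intercept)
col_d = np.eye(4)[c.ravel()][:, 1:]      # column main effects (first dropped)
uv = r.ravel() * c.ravel()               # linear-by-linear association term

# Uniform association: log m_ij = a_i + b_j + phi * u_i * v_j
X = np.column_stack([row_d, col_d, uv])
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print("common local log odds-ratio:", fit.params[-1])
```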


Journal ArticleDOI
TL;DR: In this article, the authors investigated the characteristics of observations which cause them to be influential in a least squares analysis and related them to residual variances, residual correlations, and the convex hull of the observed values of the independent variables.
Abstract: Characteristics of observations which cause them to be influential in a least squares analysis are investigated and related to residual variances, residual correlations, and the convex hull of the observed values of the independent variables. It is shown how deleting an observation can substantially alter an analysis by changing the partial F-tests, the studentized residuals, the residual variances, the convex hull of the independent variables, and the estimated parameter vector. Outliers are discussed briefly, and an example is presented.

745 citations
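All of the deletion quantities mentioned can be obtained from a single least squares fit via the hat matrix, with no refitting. A numpy sketch using the standard closed forms (textbook formulas, not necessarily the article's notation):

```python
import numpy as np

def deletion_diagnostics(X, y):
    """Leverages h_i, externally studentized residuals t_i, and the shift in
    the coefficient vector caused by deleting each observation."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)                     # hat matrix
    h = np.diag(H)
    e = y - H @ y
    s2 = e @ e / (n - p)
    s2_del = ((n - p) * s2 - e ** 2 / (1 - h)) / (n - p - 1)  # variance without obs i
    t = e / np.sqrt(s2_del * (1 - h))
    dbeta = (np.linalg.inv(X.T @ X) @ X.T).T * (e / (1 - h))[:, None]
    return h, t, dbeta

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(30), rng.standard_normal(30)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(30)
X[0, 1], y[0] = 6.0, -5.0                 # one remote, deviant point
h, t, dbeta = deletion_diagnostics(X, y)
print(h[0], t[0], dbeta[0])               # high leverage, big residual, big coefficient shift
```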


BookDOI
TL;DR: An introductory statistics textbook covering averages, the standard deviation, the normal probability curve, statistical inference, percentiles, correlation and regression, chi square, nonparametric methods, and simple analysis of variance, with computational supplements (e.g., how to square numbers, extract square roots, and compute the standard deviation from grouped data).
Abstract: Contents:
1. The Nature of Statistical Methods: General Interest in Numbers; The Purposes of Statistical Methods; Preview of This Text; Rounding, Significant Figures, and Decimals. Supplement 1: Rounding.
2. Averages: Raw Data; The Mean Computed from Raw Data; Grouping of Data; The Mean Computed from Grouped Data; The Median; Choice of Mean or Median; The Histogram and the Frequency Polygon; Summary. Supplement 2: Other Averages; Proof that Error Due to Grouping Is Small; Cumulative Graphs; Bar Diagrams.
3. The Standard Deviation: Need for a Measure of Variability; Formula for the Standard Deviation; Computing the Standard Deviation with Grouped Data; Standard Deviation of a Finite Population; Standard Scores; Other Measures of Dispersion; Summary. Supplement 3: How to Square Numbers; Methods of Extracting Square Roots; How to Check Square Roots; How to Compute and Check the Standard Deviation; Sheppard's Correction for Coarseness of Grouping; Some Other Measures of Dispersion; Types of Standard Scores.
4. Normal Probability Curve: The Nature of the Normal Probability Curve; The Ordinates of the Normal Probability Curve; Binomial Coefficients and the Normal Probability Curve; Applications of the Binomial Coefficients; The Area under the Normal Probability Curve; Summary. Supplement 4: Simplifying the Equation for the Normal Curve; Fitting a Normal Probability Curve to Any Frequency Distribution.
5. Statistical Inference: Dependability of Figures; Speculation; Samples and Populations; Sampling Distributions and Standard Error of the Mean; The t Test for Means; Levels of Significance; The z Test for Means; Point and Interval Estimates; Statistical Inference; Sampling Distribution and Standard Error of the Median; Sampling Distribution and Standard Error of the Standard Deviation; Hypothesis Testing; Summary. Supplement 5: Stability of the Median; Standard Error of a Proportion.
6. Percentiles and Percentile Ranks: Percentiles; Percentile Ranks; Computation of Percentiles; Percentiles and Percentile Ranks Compared; Deciles; Quartiles; Standard Error of a Percentile; Summary. Supplement 6: Method of Obtaining Percentile Ranks for Grouped Data; Measures of Variability Based upon Percentiles; The Range; Relative Value of Measures of Variability.
7. Skewness and Transformed Scores: Skewness; Kurtosis; Transformed Scores; The Description of Frequency Distributions; Summary. Supplement 7: Additional Measures of Skewness and Kurtosis.
8. Pearson Product Moment Coefficient of Correlation: Definition of Pearson r; Plotting a Scatter Diagram; Illustrations of Pearson r's of Various Sizes; Published Correlation Coefficients; Some Characteristics of Pearson r; Computing r Without Plotting the Data; Plotting the Data and Computing r from the Scatter Diagram; The z' Transformation and Its Standard Error; Assumptions upon Which Pearson r Is Based; Interpretation of Pearson r; Summary. Supplement 8: Other Formulas for Pearson r; Alternate Ways to Test the Significance of an Obtained Pearson r; Reliability and Validity.
9. Regression Equations: The Purpose of a Regression Equation; Formulas for Regression Equations; The Use of Regression Equations; The Graphic Representation of Prediction; A Second Illustration of Regression Equations; Further Interpretations of r; Summary. Supplement 9: Making a Large Number of Predictions.
10. More Measures of Correlation: Why Other Correlations; Biserial r; Multiserial Correlation; Point Biserial r; Classification of Dichotomous Variables; Tetrachoric r; Phi; Interrelations among r_bis, r_pb, r_t, and φ; Summary. Supplement 10: Cosine Pi Correlation Coefficient; Rank Correlation Coefficient.
11. Chi Square: Nature of Chi Square; Illustration of Correct Use of Chi Square; Sources of Error in Chi Square; Chi Square in the General Contingency Table; The Exact Test of Significance in 2 × 2 Tables; Use of Chi Square in Curve Fitting; Advantages and Disadvantages of Chi Square; Summary. Supplement 11: Unique Characteristics of Chi Square.
12. Nonparametric Statistics Other than Chi Square: The Purposes of Nonparametric Statistics; The Sign Test; The Runs Test; The Median Test; The Mann-Whitney U Test; Which Nonparametric Statistic Should Be Used?; Other Nonparametric Statistics; Summary.
13. Simple Analysis of Variance: Why Not the t Test; The Basis of Analysis of Variance; A First Example; Assumptions; Checking; Technicalities; A Second Example; Summary. Supplement 13.
14. Standard Errors of Differences: Standard Error of Any Difference; Standard Error of the Difference between Means; Standard Error of the Difference between Standard Deviations; The Standard Error of Other Differences; Summary. Supplement 14.
15. Reorientation: Moments; Correlation; Popular Concepts; Future Courses; Using Statistics.

738 citations


Journal ArticleDOI
TL;DR: An approach to statistical data analysis which is simultaneously parametric and nonparametric is described, and density-quantile functions, autoregressive density estimation, estimation of location and scale parameters by regression analysis of the sample quantile function, and quantile-box plots are introduced.
Abstract: This article attempts to describe an approach to statistical data analysis which is simultaneously parametric and nonparametric. Given a random sample X_1, …, X_n of a random variable X, one would like (1) to test the parametric goodness-of-fit hypothesis H_0 that the true distribution function F is of the form F(x) = F_0((x − μ)/σ), where F_0 is specified, and (2) when H_0 is not accepted, to estimate nonparametrically the true density-quantile function fQ(u) and score function J(u) = −(fQ)′(u). The article also introduces density-quantile functions, autoregressive density estimation, estimation of location and scale parameters by regression analysis of the sample quantile function, and quantile-box plots.
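As a toy illustration of the objects involved (a crude finite-difference stand-in, not Parzen's autoregressive estimator): the sample quantile function Q(u) can be differenced to get the quantile-density q(u) = Q′(u), and then fQ(u) = 1/q(u):

```python
import numpy as np

def density_quantile(x, grid=np.linspace(0.05, 0.95, 19)):
    """Sample quantile function Q(u) and a rough fQ(u) = 1/Q'(u) by differencing."""
    xs = np.sort(x)
    u = (np.arange(1, len(xs) + 1) - 0.5) / len(xs)
    Q = np.interp(grid, u, xs)
    return grid, Q, 1.0 / np.gradient(Q, grid)

grid, Q, fQ = density_quantile(np.random.default_rng(5).standard_normal(2000))
print(Q[9], fQ[9])   # for N(0,1): Q(0.5) = 0 and fQ(0.5) = 1/sqrt(2*pi) ~ 0.399
```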

Journal ArticleDOI
TL;DR: It is suggested that the procedure may be used to convert observations from one bureaucratic partitioning of a geographical area to another, using finite difference methods with classical boundary conditions.
Abstract: Census enumerations are usually packaged in irregularly shaped geographical regions. Interior values can be interpolated for such regions, without specification of “control points,” by using an analogy to elliptical partial differential equations. A solution procedure is suggested, using finite difference methods with classical boundary conditions. In order to estimate densities, an additional nonnegativity condition is required. Smooth contour maps, which satisfy the volume preserving and nonnegativity constraints, illustrate the method using actual geographical data. It is suggested that the procedure may be used to convert observations from one bureaucratic partitioning of a geographical area to another.
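A minimal sketch of the idea (iterative neighbor smoothing with the volume-preserving and nonnegativity constraints re-imposed on each pass), rather than the article's finite-difference solver with classical boundary conditions:

```python
import numpy as np

def pycnophylactic(region_id, totals, iters=300):
    """Smooth per-cell densities while preserving each region's total."""
    dens = np.zeros(region_id.shape)
    for r, tot in totals.items():                   # start from uniform densities
        dens[region_id == r] = tot / (region_id == r).sum()
    for _ in range(iters):
        pad = np.pad(dens, 1, mode="edge")
        dens = (pad[:-2, 1:-1] + pad[2:, 1:-1] +
                pad[1:-1, :-2] + pad[1:-1, 2:]) / 4.0   # neighbor average
        dens = np.clip(dens, 0.0, None)                 # nonnegativity
        for r, tot in totals.items():                   # restore region volumes
            m = region_id == r
            dens[m] *= tot / dens[m].sum()
    return dens

region_id = np.zeros((20, 20), dtype=int)
region_id[:, 10:] = 1                                   # two adjacent regions
out = pycnophylactic(region_id, {0: 100.0, 1: 900.0})
print(out[region_id == 0].sum(), out[region_id == 1].sum())   # totals preserved
print(out[10].round(2))                                       # smooth across the border
```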

Journal ArticleDOI
TL;DR: In this paper, a method is given for comparing principal component analyses conducted on the same variables in two different groups of individuals, and an extension to the case of more than two groups is outlined.
Abstract: A method is given for comparing principal component analyses conducted on the same variables in two different groups of individuals, and an extension to the case of more than two groups is outlined. The technique leads to a latent root and vector problem, which has also arisen in the comparison of factor patterns in separate factor analyses. Emphasis in the present article is on the underlying geometry and interpretation of the results. An illustrative example is provided.
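One way to realize the latent root and vector problem: take the first k component loadings in each group as orthonormal p × k matrices L1 and L2; the singular values of L1ᵀL2 are the cosines of the angles between the two component subspaces. A sketch, assuming both groups are measured on the same p variables:

```python
import numpy as np

def pc_subspace_cosines(X1, X2, k=2):
    """Cosines of angles between the first-k principal component subspaces."""
    def loadings(X):
        _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
        return Vt[:k].T                          # p x k orthonormal loadings
    return np.linalg.svd(loadings(X1).T @ loadings(X2), compute_uv=False)

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))                  # shared covariance structure
X1 = rng.standard_normal((200, 5)) @ A
X2 = rng.standard_normal((200, 5)) @ A
print(pc_subspace_cosines(X1, X2))               # all near 1: similar subspaces
```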

Journal ArticleDOI
TL;DR: In this paper, correlations among price changes in common stocks of companies in one industry are found to decrease with the length of the interval over which the price changes are measured, a phenomenon attributed to nonstationarity and to correlations between price changes in successive periods.
Abstract: Correlations among price changes in common stocks of companies in one industry are found to decrease with the length of the interval for which the price changes are measured. This phenomenon seems to be caused by nonstationarity of security price changes and by the existence of correlations between price changes in the same stock—and in different stocks—in successive periods. Although such correlations are not necessarily inconsistent with market efficiency, the data do reveal the presence of lags of an hour or more in the adjustment of stock prices to information relevant to the industry.


Journal ArticleDOI
TL;DR: In this article, a sequential procedure based on Akaike's final prediction error criterion and Granger's concept of causality to fit multiple auto-regressions is suggested, which allows each variable to enter the equation with a different time lag and provides a reasonably powerful test of exogeneity or causality.
Abstract: A sequential procedure based on Akaike's final prediction-error criterion and Granger's concept of causality to fit multiple auto-regressions is suggested. The method not only allows each variable to enter the equation with a different time lag but also provides a reasonably powerful test of exogeneity or causality. The idea is applied to Canadian postwar money and income data. It is found that a bivariate feedback model for M1 and GNP and a one-way causal relation from GNP to M2 fit the data best. Diagnostic checks applied to our model seem to indicate the adequacy of our approach.
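A sketch of the two ingredients (with the own-lag order fixed at 2 for brevity, whereas the article's sequential procedure selects it by FPE as well): Akaike's FPE is computed as the x-lag order varies, and x is judged helpful for predicting y if some model with x-lags beats the own-lags-only model:

```python
import numpy as np

def fpe(y, X):
    """Akaike's final prediction error for a least-squares autoregression."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return (r @ r / n) * (n + p) / (n - p)

def lag_matrix(z, lags, n, start):
    return np.column_stack([z[start - l:start - l + n] for l in range(1, lags + 1)])

rng = np.random.default_rng(7)
T = 400
x, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.4 * y[t - 1] + 0.3 * x[t - 1] + rng.standard_normal()   # x "causes" y

start, n = 6, T - 6
Y = y[start:start + n]
own = lag_matrix(y, 2, n, start)                      # own lags (order assumed here)
fpes = [fpe(Y, np.column_stack([own, lag_matrix(x, m, n, start)])) for m in range(1, 6)]
print("FPE by x-lag order:", np.round(fpes, 4))
print("x helps predict y:", min(fpes) < fpe(Y, own))  # FPE version of a causality check
```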



Journal ArticleDOI
TL;DR: A class of generalized M-estimates is proposed which has attractive mean-squared-error robustness properties towards both IO and AO type deviations from the Gaussian model.
Abstract: Outliers in time series can adversely affect both the least squares estimates and ordinary M-estimates of autoregressive parameters. Attention is focused here on obtaining robust estimates of the parameter for a first-order autoregressive time series x_k. The observations are y_k = x_k + v_k, and two models are considered: Model IO, with v_k ≡ 0 and x_k possibly non-Gaussian, and Model AO, with v_k nonzero, and possibly quite large, a small fraction of the time, and x_k Gaussian. A class of generalized M-estimates is proposed which has attractive mean-squared-error robustness properties towards both IO and AO type deviations from the Gaussian model.
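A Mallows-type sketch of the GM idea (illustrative, not the authors' exact estimator): iteratively reweighted least squares with a Huber weight on the scaled residual and a second Huber weight on the lagged carrier, so an isolated additive outlier is downweighted both as a response and as a predictor:

```python
import numpy as np

def huber_w(u, c=1.345):
    """Huber weight psi(u)/u: 1 for |u| <= c, c/|u| beyond."""
    au = np.maximum(np.abs(u), 1e-12)
    return np.where(au <= c, 1.0, c / au)

def gm_ar1(y, iters=50):
    x, z = y[:-1], y[1:]
    phi = (x @ z) / (x @ x)                         # least squares start
    sx = np.median(np.abs(x - np.median(x))) / 0.6745
    for _ in range(iters):
        r = z - phi * x
        s = np.median(np.abs(r)) / 0.6745           # robust residual scale
        w = huber_w(r / s) * huber_w(x / sx)        # weight residual AND carrier
        phi = np.sum(w * x * z) / np.sum(w * x * x)
    return phi

rng = np.random.default_rng(8)
n = 300
xk = np.zeros(n)
for k in range(1, n):
    xk[k] = 0.6 * xk[k - 1] + rng.standard_normal()
v = np.where(rng.uniform(size=n) < 0.05, rng.normal(0, 10, n), 0.0)   # Model AO
y = xk + v
print("LS:", (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1]), "GM:", gm_ar1(y))  # GM nearer 0.6
```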

Journal ArticleDOI
TL;DR: In this article, a general family of weighted-rankings test statistics for comparing two or more treatments is presented, which are simple to compute, are strictly distribution free, and have asymptotic chi-squared distributions.
Abstract: The standard nonparametric procedures for testing the hypothesis of no treatment effects in a complete blocks experiment depend entirely on the within-block rankings. If block effects are assumed additive, however, then between-block information may be recovered by weighting these rankings according to their credibility with respect to treatment ordering. (For the special case of only two treatments, the sign test exemplifies the use of unweighted rankings and the signed-rank test the use of weighted rankings.) A general family of weighted-rankings test statistics for comparing two or more treatments is presented. They are simple to compute, are strictly distribution free, and have asymptotic chi-squared distributions.
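The best-known member of this family weights each block by the rank of its sample range. A direct implementation of that member, as a sketch of how the weighted rankings enter:

```python
import numpy as np
from scipy import stats

def weighted_rankings_test(X):
    """X is blocks x treatments. Blocks whose observations spread more get
    larger weights, recovering between-block information."""
    b, k = X.shape
    R = np.apply_along_axis(stats.rankdata, 1, X)      # within-block ranks
    Q = stats.rankdata(X.max(1) - X.min(1))            # block weights: ranked ranges
    S = Q[:, None] * (R - (k + 1) / 2)
    A, B = (S ** 2).sum(), (S.sum(0) ** 2).sum() / b
    F = (b - 1) * B / (A - B)
    return F, stats.f.sf(F, k - 1, (b - 1) * (k - 1))

rng = np.random.default_rng(9)
X = rng.standard_normal((12, 4)) + np.array([0.0, 0.0, 0.5, 1.0])  # treatment shifts
print(weighted_rankings_test(X))
```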

Journal ArticleDOI
Abstract: [Not captured by the scrape; the entry concerns the book "Dynamic Programming and Stochastic Control".]

Journal ArticleDOI
TL;DR: In this paper, the authors developed approximate procedures to estimate stationary mixed autoregressive moving average (ARMA) models, and properties of the estimates and an example are given, where Gaussian errors are assumed.
Abstract: Procedures to estimate parameters in multivariate autoregressive moving average (ARMA) models are developed. Gaussian errors are assumed. Exact maximum likelihood estimation procedures are developed for pure moving average models. Approximate procedures are obtained to estimate stationary mixed ARMA models. Properties of the estimates and an example are given.
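For a univariate taste of the same Gaussian-likelihood machinery (the article treats the multivariate case), statsmodels can simulate an ARMA process and fit it by maximum likelihood:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(10)
# (1 - 0.7B) y_t = (1 + 0.4B) e_t, Gaussian errors
proc = ArmaProcess(ar=[1, -0.7], ma=[1, 0.4])
y = proc.generate_sample(500, distrvs=rng.standard_normal)

fit = ARIMA(y, order=(1, 0, 1), trend="n").fit()   # Gaussian maximum likelihood
print(fit.params)                                  # ar1 near 0.7, ma1 near 0.4
```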

Journal ArticleDOI
TL;DR: In this article, nine procedures for multiple comparisons of means with unequal variances are reviewed and compared in a Monte Carlo sampling study of pairwise differences as well as a few selected contrasts.
Abstract: Nine procedures for multiple comparisons of means with unequal variances are reviewed. Modifications in some procedures are proposed either for improvement in their performance or easier implementation. A Monte Carlo sampling study is carried out for pair-wise differences as well as a few selected contrasts and the procedures are compared based on the results of this study. Recommendations for the choice of the procedures are given. Robustness of two procedures designed for homogeneous variances under violation of that assumption is also examined in the Monte Carlo study.
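One classical member of this family pairs Welch standard errors and degrees of freedom with a studentized-range critical value (the Games-Howell construction). A sketch using scipy, illustrative only; the paper compares nine such procedures:

```python
import numpy as np
from scipy import stats

def pairwise_unequal_var(samples, alpha=0.05):
    """Simultaneous pairwise mean comparisons without assuming equal variances."""
    k = len(samples)
    m = [s.mean() for s in samples]
    v = [s.var(ddof=1) / len(s) for s in samples]
    for i in range(k):
        for j in range(i + 1, k):
            df = (v[i] + v[j]) ** 2 / (v[i] ** 2 / (len(samples[i]) - 1)
                                       + v[j] ** 2 / (len(samples[j]) - 1))
            q = stats.studentized_range.ppf(1 - alpha, k, df)   # Welch df
            hw = q * np.sqrt((v[i] + v[j]) / 2)                 # interval half-width
            print(f"mu{i}-mu{j}: {m[i] - m[j]:+.2f} +/- {hw:.2f}")

rng = np.random.default_rng(11)
pairwise_unequal_var([rng.normal(0, 1, 20), rng.normal(0, 3, 12), rng.normal(1.5, 2, 16)])
```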

Journal ArticleDOI
TL;DR: The null distributions of the likelihood ratio test statistics for a shift in location at an unknown instant were given by Hawkins (1977) for known and unknown σ²; the unknown-σ² distribution given there is incorrect, and this article derives the correct one and tabulates standard percentage points for n = 3(1)10 by numerical integration.
Abstract: An alternative to the hypothesis that the sequence X_1, …, X_n are independent and identically distributed normal random variables, with mean μ and variance σ², is that the location parameter μ shifts at some unknown instant. The null distributions of likelihood ratio test statistics are given by Hawkins (1977) for the two cases of known and unknown σ². Unfortunately, the null distribution for unknown σ² obtained in that article is incorrect. In this article the correct null distribution is found and a numerical integration technique is used to obtain standard percentage points for n = 3(1)10. A Monte Carlo method is used to obtain additional standard percentage points for n = 15(5)50.
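For the unknown-σ² case the statistic is equivalent to the maximum over all split points of the absolute two-sample t statistic, and its null percentage points can be approximated by Monte Carlo much as in the article's larger-n tables. A sketch (small simulation size; the published tables are the authority):

```python
import numpy as np

def max_t_stat(x):
    """Max over split points k of the absolute two-sample t statistic."""
    n, best = len(x), 0.0
    for k in range(1, n):
        a, b = x[:k], x[k:]
        sp2 = (((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()) / (n - 2)
        t = abs(a.mean() - b.mean()) / np.sqrt(sp2 * (1 / k + 1 / (n - k)))
        best = max(best, t)
    return best

rng = np.random.default_rng(12)
null = [max_t_stat(rng.standard_normal(20)) for _ in range(5000)]   # n = 20, no shift
print(np.percentile(null, [90, 95, 99]))
```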

Journal ArticleDOI
TL;DR: Linearly combining Levene's z² variable with the jackknife pseudo-values of s² yields a family of variables that permits ANOVA tests of additive models for variances in fixed effects designs; some distributional theory is developed and a new robust homogeneity of variance test is advocated.
Abstract: Linearly combining Levene's z² variable with the jackknife pseudo-values of s² produces a family of variables that allows for analysis of variance (ANOVA) tests of additive models for the variances in fixed effects designs. Some distributional theory is developed, and a new robust homogeneity of variance test is advocated.
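The simplest member of this family is an ordinary one-way ANOVA applied to Levene's z² variables themselves; the article's construction adds the jackknife pseudo-values of s² to the mix. A sketch of the simple member:

```python
import numpy as np
from scipy import stats

def levene_z2_anova(groups):
    """One-way ANOVA on z_ij = (x_ij - xbar_i)^2 as a variance homogeneity test."""
    return stats.f_oneway(*[(g - g.mean()) ** 2 for g in groups])

rng = np.random.default_rng(13)
groups = [rng.normal(0, 1, 25), rng.normal(0, 1, 25), rng.normal(0, 2, 25)]
print(levene_z2_anova(groups))      # should flag the inflated third variance
```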

Journal ArticleDOI
Bengt Muthén
TL;DR: In this paper, a model with dichotomous indicators of latent variables is developed, where latent variables are related to each other and to a set of exogenous variables in a system of structural relations.
Abstract: A model with dichotomous indicators of latent variables is developed. The latent variables are related to each other and to a set of exogenous variables in a system of structural relations. Identification and maximum likelihood estimation of the model are treated. A sociological application is presented in which a theoretical construct (an attitude) is related to a set of background variables. The construct is not measured directly, but is indicated by the answers to a pair of questionnaire statements.

Journal ArticleDOI
TL;DR: In this paper, conditional and estimated unconditional probabilities of correct classification are employed to compare alternative stopping rules that can be used with the forward stepwise selection method in the two-group multivariate normal classification problem.
Abstract: Criteria based on conditional and estimated unconditional probabilities of correct classification are employed to compare alternative stopping rules that can be used with the forward stepwise selection method in the two-group multivariate normal classification problem. Based on Monte Carlo studies of 48 sampling situations, it is found that F to enter (.10 ≤ α ≤ .25) and a rule based on the maximum estimated unconditional probability often perform better than the strict use of all variables. Although the relative gains in classification are not large, the reductions in the numbers of variables to be used may be substantial.
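For the two-group problem, forward selection with an F-to-enter rule can be written through the regression formulation of the linear discriminant: the partial F for a candidate variable comes from the indicator regression given the variables already chosen. A sketch (the Monte Carlo comparison of stopping rules is the paper's contribution, not this loop):

```python
import numpy as np
from scipy import stats

def rss(Z, y):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ b
    return r @ r

def forward_select(X, g, alpha=0.15):
    """Forward stepwise selection with an F-to-enter stopping rule."""
    n, p = X.shape
    y = g.astype(float)
    selected = []
    while len(selected) < p:
        Z0 = np.column_stack([np.ones(n)] + [X[:, s] for s in selected])
        r0, df2 = rss(Z0, y), n - Z0.shape[1] - 1
        F = [-np.inf if j in selected else
             (r0 - rss(np.column_stack([Z0, X[:, j]]), y)) /
             (rss(np.column_stack([Z0, X[:, j]]), y) / df2) for j in range(p)]
        j = int(np.argmax(F))
        if stats.f.sf(F[j], 1, df2) > alpha:     # F-to-enter fails: stop
            break
        selected.append(j)
    return selected

rng = np.random.default_rng(14)
n = 100
g = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, 6))
X[:, 0] += 1.5 * g; X[:, 1] += 1.0 * g          # only two variables discriminate
print(forward_select(X, g))                      # expect [0, 1]
```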

Journal ArticleDOI
TL;DR: In this paper, a randomization analysis of a completely randomized growth (or response) curve model is derived from the basic assumptions of the design in the manner of Kempthorne (1955).
Abstract: A randomization analysis of a completely randomized growth (or response) curve model is derived from the basic assumptions of the design in the manner of Kempthorne (1955). Best linear unbiased estimators for mean growth and contrast curves are obtained. Kempthorne's discussion of a univariate randomization test is generalized to the case of unequal group sizes and adapted to the problem of testing for treatment effects at a point in time. A similar randomization test for treatment effects over a specified interval of time is motivated and approximated by a standard F test. The proposed analysis is applied to experimental data.
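The univariate building block, a randomization test for a treatment effect at one time point with unequal group sizes, is easy to sketch: re-randomize the group labels and compare the observed between-group F with its randomization distribution:

```python
import numpy as np

def between_F(y, g):
    labels = np.unique(g)
    grand = y.mean()
    ssb = sum(len(y[g == l]) * (y[g == l].mean() - grand) ** 2 for l in labels)
    ssw = sum(((y[g == l] - y[g == l].mean()) ** 2).sum() for l in labels)
    return (ssb / (len(labels) - 1)) / (ssw / (len(y) - len(labels)))

def randomization_test(y, g, n_perm=5000, seed=0):
    rng = np.random.default_rng(seed)
    obs = between_F(y, g)
    hits = sum(between_F(y, rng.permutation(g)) >= obs for _ in range(n_perm))
    return obs, (hits + 1) / (n_perm + 1)        # randomization p-value

rng = np.random.default_rng(15)
y = np.concatenate([rng.normal(0, 1, 8), rng.normal(1.2, 1, 12)])   # unequal sizes
g = np.array([0] * 8 + [1] * 12)
print(randomization_test(y, g))
```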

Journal ArticleDOI
TL;DR: It is concluded that better solutions to these problems, better data, more sophisticated use of economic theory, application of more rigorous diagnostic checks, and use of well-designed simulation experiments probably will produce improved macroeconometric models.
Abstract: In this article, some research bearing on the statistical analysis of econometric models is reviewed. Many estimation, testing, and prediction techniques used in econometrics have only large-sample justifications. Selected Bayesian inference results relating to econometric models are reviewed. On the problem of constructing econometric models, an approach that is a blend of traditional econometric and modern time series analysis techniques is described. Many statistical problems requiring further analysis are noted. It is concluded that better solutions to these problems, better data, more sophisticated use of economic theory, application of more rigorous diagnostic checks, including forecasting checks, and use of well-designed simulation experiments probably will produce improved macroeconometric models.

Journal ArticleDOI
TL;DR: A technique for detecting a shift in the scale parameter of a sequence of independent gamma random variables is discussed in this paper, and the distribution theory and related properties of the test statistic are investigated.
Abstract: In this article a technique for detecting shift of scale parameter in a sequence of independent gamma random variables is discussed. Distribution theories and related properties of the test statistic are investigated. Numerical critical points and test powers are tabulated for two specific variables. Other useful techniques are also summarized. The methods are then applied to the analysis of stock-market returns and air traffic flows. These two examples are studied in detail to illustrate the use of the proposed method compared to other available techniques. The empirical examples also illuminate the importance of the treatment of stochastic instability in statistical applications.
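A cusum-style sketch of the detection idea (not the article's exact likelihood-ratio statistic): under no change and a common shape, the partial-sum fraction S_k/S_n has expectation k/n, so a standardized maximum deviation flags a scale shift:

```python
import numpy as np

def scale_shift_stat(x):
    """Max standardized deviation of S_k/S_n from k/n over all split points."""
    n = len(x)
    k = np.arange(1, n)
    u = np.cumsum(x)[:-1] / x.sum()
    return np.max(np.abs(u - k / n) / np.sqrt((k / n) * (1 - k / n) / n))

rng = np.random.default_rng(16)
no_shift = rng.gamma(2.0, 1.0, 100)
shifted = np.concatenate([rng.gamma(2.0, 1.0, 50), rng.gamma(2.0, 3.0, 50)])
print(scale_shift_stat(no_shift), scale_shift_stat(shifted))   # second is much larger
```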