
Showing papers in "Technometrics in 1977"


Journal ArticleDOI
Ronald D. Snee
TL;DR: It is concluded that data splitting is an effective method of model validation when it is not practical to collect new data to test the model.
Abstract: Methods to determine the validity of regression models include comparison of model predictions and coefficients with theory, collection of new data to check model predictions, comparison of results with theoretical model calculations, and data splitting or cross-validation, in which a portion of the data is used to estimate the model coefficients and the remainder of the data is used to measure the prediction accuracy of the model. An expository review of these methods is presented. It is concluded that data splitting is an effective method of model validation when it is not practical to collect new data to test the model. The DUPLEX algorithm, developed by R. W. Kennard, is recommended for dividing the data into the estimation set and prediction set when there is no obvious variable, such as time, to use as a basis to split the data. Several examples are included to illustrate the various methods of model validation.
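The DUPLEX algorithm itself is not part of standard libraries, but the basic data-splitting idea is easy to sketch. A minimal illustration (simulated data and a plain random split standing in for DUPLEX; all names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated data standing in for a real regression problem.
X = rng.normal(size=(100, 3))
y = 2.0 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.5, size=100)

# Split into an estimation set and a prediction set (DUPLEX would instead
# choose the two halves so that each covers the factor space evenly).
X_est, X_pred, y_est, y_pred = train_test_split(X, y, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_est, y_est)

# Prediction accuracy on the held-out half is the validation measure.
resid = y_pred - model.predict(X_pred)
print("RMS prediction error on the prediction set:", np.sqrt(np.mean(resid**2)))
print("R^2 on the prediction set:", model.score(X_pred, y_pred))
```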

1,165 citations


Journal ArticleDOI
TL;DR: A class of density estimates using a superposition of kernels where the kernel parameter can depend on the nearest neighbor distances is studied by the use of simulated data and their performance is superior to that of the usual Parzen estimators.
Abstract: A class of density estimates using a superposition of kernels where the kernel parameter can depend on the nearest neighbor distances is studied by the use of simulated data. Their performance using several measures of error is superior to that of the usual Parzen estimators. A tentative solution is given to the problem of calibrating the kernel peakedness when faced with a finite sample set.
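A minimal sketch of such a variable-kernel estimate, in which the bandwidth attached to each observation is proportional to its k-th nearest-neighbor distance (the choice of k, the proportionality constant, and the Gaussian kernel are assumptions made for illustration, not the paper's calibration):

```python
import numpy as np

def variable_kernel_density(x_grid, sample, k=5, alpha=1.0):
    """Gaussian-kernel density estimate whose bandwidth at each sample
    point scales with that point's k-th nearest-neighbor distance."""
    sample = np.asarray(sample)
    # k-th nearest-neighbor distance for each sample point (1-d case).
    dists = np.abs(sample[:, None] - sample[None, :])
    dists.sort(axis=1)
    h = alpha * dists[:, k]                      # column 0 is the point itself
    # Superpose one kernel per observation, each with its own bandwidth.
    z = (x_grid[:, None] - sample[None, :]) / h[None, :]
    kernels = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * h[None, :])
    return kernels.mean(axis=1)

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 0.5, 50)])
grid = np.linspace(-4, 7, 400)
fhat = variable_kernel_density(grid, data)
print("estimate integrates to about", np.sum(fhat) * (grid[1] - grid[0]))
```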

490 citations


Journal ArticleDOI
TL;DR: In this article, the authors suggest the use of the inverse Gaussian distribution for a model of such lifetime behavior and discuss different reliability features of the distribution, and show that its failure rate is nonmonotonic, initially increasing and then decreasing.
Abstract: Early occurrence of certain events such as failures or repairs is a common phenomenon in the lifetime of industrial products. The lognormal distribution has often been found to be a useful model whenever such early occurrences dominate a lifetime distribution. In this paper we suggest the inverse Gaussian distribution as a model for such lifetime behavior and discuss various reliability features of the distribution. It is shown that its failure rate is nonmonotonic, initially increasing and then decreasing. Advantages of the inverse Gaussian over the lognormal are given. Certain numerical results are presented for illustration.
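The nonmonotone failure rate is easy to check numerically with scipy's inverse Gaussian (the parameter value below is illustrative only; this is not an analysis from the paper):

```python
import numpy as np
from scipy.stats import invgauss

mu = 0.5                                   # illustrative shape parameter
t = np.linspace(0.01, 3.0, 300)

# Failure (hazard) rate h(t) = f(t) / (1 - F(t)).
hazard = invgauss.pdf(t, mu) / invgauss.sf(t, mu)

t_peak = t[np.argmax(hazard)]
print(f"hazard rises to a maximum near t = {t_peak:.2f} and then declines")
print("hazard at t = 0.01, at the peak, and at t = 3:",
      hazard[0], hazard.max(), hazard[-1])
```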

320 citations


Journal ArticleDOI
TL;DR: In this article, both the standard jackknife and a weighted jackknife are investigated in the general linear model situation; properties of bias reduction and standard error estimation are derived, and a preliminary discussion of robust regression fitting using jackknife pseudo-values is presented.
Abstract: Both the standard jackknife and a weighted jackknife are investigated in the general linear model situation. Properties of bias reduction and standard error estimation are derived, and the weighted jackknife is shown to be superior for unbalanced data. There is a preliminary discussion of robust regression fitting using jackknife pseudo-values.
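A sketch of the standard (unweighted) delete-one jackknife for regression coefficients; the weighted variant studied in the paper adjusts the pseudo-values for unbalanced designs, which is not shown here:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.standard_t(df=4, size=n)   # heavy-tailed errors

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_full = ols(X, y)

# Delete-one pseudo-values: n * beta_full - (n - 1) * beta_(-i).
pseudo = np.empty((n, X.shape[1]))
for i in range(n):
    keep = np.arange(n) != i
    pseudo[i] = n * beta_full - (n - 1) * ols(X[keep], y[keep])

beta_jack = pseudo.mean(axis=0)                    # bias-reduced estimate
se_jack = pseudo.std(axis=0, ddof=1) / np.sqrt(n)  # jackknife standard errors
print("OLS estimate:       ", beta_full)
print("jackknife estimate: ", beta_jack)
print("jackknife std. err.:", se_jack)
```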

303 citations


Journal ArticleDOI
TL;DR: A textbook covering probability, conditional probability, random variables and distributions, expectation, special distributions, large random samples, estimation, sampling distributions of estimators, hypothesis testing, categorical data and nonparametric methods, linear statistical models, and simulation.
Abstract: 1. Introduction to Probability 2. Conditional Probability 3. Random Variables and Distributions 4. Expectation 5. Special Distributions 6. Large Random Samples 7. Estimation 8. Sampling Distributions of Estimators 9. Testing Hypotheses 10. Categorical Data and Nonparametric Methods 11. Linear Statistical Models 12. Simulation

207 citations


Journal ArticleDOI
TL;DR: In this article, a study is made about a shift in the mean of a set of independent normal random variables with unknown common variance, and the marginal and joint posterior distributions of the unknown time point and the amount of shift are derived.
Abstract: In this article, a study is made about a shift in the mean of a set of independent normal random variables with unknown common variance. The marginal and joint posterior distributions of the unknown time point and the amount of shift are derived. Small and large sample results are presented.
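Under flat priors on the two means and on log σ, a standard calculation gives the marginal posterior of the change point r as proportional to [r(n−r)]^(−1/2) S_r^(−(n−2)/2), where S_r is the pooled within-segment sum of squares. The sketch below uses that form (my assumption about the priors, which may differ in detail from the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)
n, true_r = 60, 25
x = np.concatenate([rng.normal(0.0, 1.0, true_r),          # mean shifts by 1.2
                    rng.normal(1.2, 1.0, n - true_r)])      # after observation 25

log_post = np.full(n, -np.inf)
for r in range(2, n - 1):                  # keep at least two points per segment
    s1 = np.sum((x[:r] - x[:r].mean()) ** 2)
    s2 = np.sum((x[r:] - x[r:].mean()) ** 2)
    log_post[r] = -0.5 * np.log(r * (n - r)) - 0.5 * (n - 2) * np.log(s1 + s2)

post = np.exp(log_post - log_post.max())
post /= post.sum()                         # marginal posterior of the change point
print("posterior mode for the change point:", int(np.argmax(post)))
print("posterior mass within 3 of the true value:", post[true_r - 3:true_r + 4].sum())
```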

204 citations


Journal ArticleDOI
TL;DR: A review of the book Micro-Organisms in Foods 2: Sampling for Microbiological Analysis; Principles and Specific Applications.
Abstract: (1977). Micro-Organisms in Foods 2: Sampling for Microbiological Analysis; Principles and Specific Applications. Technometrics: Vol. 19, No. 2, pp. 221-221.

184 citations



Journal ArticleDOI
TL;DR: In this article, the Kolmogorov-Smirnov two-sided goodness-of-fit statistic was applied to discrete or grouped data, and the exact distribution of the statistic was tabulated for a given case and approximations discussed.
Abstract: This paper considers the Kolmogorov-Smirnov two-sided goodness-of-fit statistic when applied to discrete or grouped data. The exact distribution of the statistic is tabulated for a given case and approximations are discussed. The power of the test is compared with the power of the χ² test, and the test is shown to have greater power for particular trend alternatives.
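For grouped data, the two-sided statistic is just the largest gap between the empirical and hypothesized CDFs at the cell boundaries; a minimal sketch (the counts and hypothesized probabilities are made up, and the exact null distribution tabulated in the paper is not reproduced here):

```python
import numpy as np

# Observed counts in k ordered cells and hypothesized cell probabilities.
counts = np.array([18, 25, 30, 15, 12])          # illustrative data
p0 = np.array([0.15, 0.25, 0.30, 0.20, 0.10])    # hypothesized probabilities
n = counts.sum()

emp_cdf = np.cumsum(counts) / n
hyp_cdf = np.cumsum(p0)

# Two-sided Kolmogorov-Smirnov statistic evaluated at the cell boundaries.
D = np.max(np.abs(emp_cdf - hyp_cdf))
print(f"n = {n}, D = {D:.4f}")
# The usual continuous-case critical values are conservative here;
# the paper tabulates the exact discrete-case distribution instead.
```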

152 citations


Journal ArticleDOI
TL;DR: In this article, exponential, Weibull, gamma and log-normal models are used in the analysis of failure time data, and likelihood methods are described for discrimination among the special cases and for assessment of each within the more comprehensive framework.
Abstract: Exponential, Weibull, gamma and log-normal models are frequently used in the analysis of failure time data. By extending the generalized gamma model of Stacy [15], Prentice [13] showed that the models listed are all embraced by a single parametric family. Likelihood methods were described for discrimination among the special cases and for assessment of each within the more comprehensive framework. Here, such methods are applied to several recent data sets from the industrial and medical literature in order to study distributional shape. New results are given for the accommodation of censoring and regression variables. The appropriateness of Weibull and lognormal models is emphasized.
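For uncensored data, the basic comparison can be sketched by fitting each candidate family by maximum likelihood and comparing maximized log-likelihoods; the paper's method embeds all of these in the generalized gamma family and handles censoring and regression variables, which this sketch does not:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Illustrative uncensored failure times (generated from a Weibull here).
t = stats.weibull_min.rvs(c=1.8, scale=100.0, size=80, random_state=rng)

candidates = {
    "exponential": stats.expon,
    "Weibull": stats.weibull_min,
    "gamma": stats.gamma,
    "lognormal": stats.lognorm,
}

for name, dist in candidates.items():
    params = dist.fit(t, floc=0)                 # fix the threshold at zero
    loglik = np.sum(dist.logpdf(t, *params))
    print(f"{name:12s} maximized log-likelihood = {loglik:9.2f}")
```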

133 citations


Journal ArticleDOI
TL;DR: Five programs for selection of variables in discriminant analysis are compared: the program DISCRIM of McCabe, the BMD07M program of Dixon, the program ALLOC-I of Habbema, Hermans and van den Broek, and two more recent programs, SPSS and BMDP7M.
Abstract: Five programs for selection of variables in discriminant analysis are compared: the program DISCRIM of McCabe [8], the BMD07M program, see Dixon [1], the program ALLOC-I of Habbema, Hermans and van den Broek [3], and two more recent programs, SPSS and BMDP7M, see Nie et al. [10] and Dixon [2]. Emphasis is on the criteria for selection and on the distributional assumptions involved. The programs are compared experimentally using two examples: one with real data, also used by McCabe [8], and one with simulated data.
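None of these 1977 programs is in current use, but the underlying stepwise selection idea is simple to sketch. Here is a forward-selection loop for linear discriminant analysis using cross-validated accuracy as the entry criterion (the criterion is my choice for illustration; the compared programs use F-to-enter, Wilks' lambda, or error-rate criteria):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
remaining = list(range(X.shape[1]))
selected = []
best_so_far = 0.0

while remaining:
    # Score each candidate variable when added to the current subset.
    scores = {j: cross_val_score(LinearDiscriminantAnalysis(),
                                 X[:, selected + [j]], y, cv=5).mean()
              for j in remaining}
    best_j = max(scores, key=scores.get)
    if scores[best_j] <= best_so_far:      # stop when nothing improves the CV accuracy
        break
    best_so_far = scores[best_j]
    selected.append(best_j)
    remaining.remove(best_j)
    print(f"entered variable {best_j}, cross-validated accuracy = {best_so_far:.3f}")

print("selected variables:", selected)
```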


Journal ArticleDOI
TL;DR: In this paper, a review of the published work on the performance of Fisher's linear discriminant function when underlying assumptions are violated is given, and new results are presented for the case of classification using both binary and continuous variables.
Abstract: A review is given of the published work on the performance of Fisher's linear discriminant function when underlying assumptions are violated. Some new results are presented for the case of classification using both binary and continuous variables, and conditions for success or failure of the linear discriminant function are investigated.

Journal ArticleDOI
TL;DR: In this article, maximum likelihood estimators and estimators which utilize the first order statistic are derived for the three-parameter gamma distribution (Pearson's Type III Distribution) when samples are progressively censored.
Abstract: This paper is a continuation of previous work concerning progressively censored sampling in the normal and the exponential distribution [1], in the Weibull distribution [4], and in the log-normal distribution [5]. Here maximum likelihood estimators and estimators which utilize the first order statistic are derived for the three-parameter gamma distribution (Pearson's Type III Distribution) when samples are progressively censored. Various special cases are also considered. Illustrative examples involving life test data are included. Some of the sampling properties of the proposed estimators are investigated.
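A rough sketch of the progressively censored likelihood for the three-parameter gamma: observed failures contribute log density terms and units withdrawn at each stage contribute log survival terms. The withdrawal pattern, starting values, and use of a general-purpose optimizer are all illustrative assumptions; the paper's estimators based on the first order statistic are not shown:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

rng = np.random.default_rng(11)
# Illustrative progressively censored sample: at the i-th observed failure,
# r[i] of the surviving units are withdrawn from the life test.
x = np.sort(gamma.rvs(a=2.0, loc=10.0, scale=5.0, size=12, random_state=rng))
r = np.array([0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 2, 3])

def neg_log_lik(theta):
    a, loc, scale = theta
    if a <= 0 or scale <= 0 or loc >= x.min():
        return np.inf
    # Failures contribute log f(x_i); withdrawn units contribute log S(x_i).
    return -(np.sum(gamma.logpdf(x, a, loc, scale))
             + np.sum(r * gamma.logsf(x, a, loc, scale)))

start = np.array([1.5, 0.9 * x.min(), x.std()])
fit = minimize(neg_log_lik, start, method="Nelder-Mead")
print("approximate MLE (shape, threshold, scale):", np.round(fit.x, 3))
```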

Journal ArticleDOI
TL;DR: In this paper, the authors propose the minimization of the sum of relative errors (MSRE) as an alternative criterion to the minimization of the sum of squared errors (MSSE) and the minimization of the sum of absolute errors (MSAE).
Abstract: When linear regression is used for prediction purposes, the minimization of the sum of relative errors (MSRE) is proposed as an alternative criterion to the minimization of the sum of squared errors (MSSE) and the minimization of the sum of absolute errors (MSAE). The problem is formulated as a linear programming problem and a solution procedure is given. The problem of subset selection with the MSRE criterion is also considered and results illustrated with an example.
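The LP formulation can be sketched directly: write each residual as a difference of nonnegative variables u_i − v_i and minimize Σ (u_i + v_i)/y_i, which is valid when all responses y_i are positive. A minimal version using scipy's linprog (variable names are mine):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.uniform(1, 10, size=n)])
y = (X @ np.array([5.0, 2.0])) * rng.lognormal(sigma=0.2, size=n)   # positive responses

# Variables: [beta (p), u (n), v (n)], with residual_i = u_i - v_i.
c = np.concatenate([np.zeros(p), 1.0 / y, 1.0 / y])       # objective: sum |e_i| / y_i
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])              # X beta + u - v = y
bounds = [(None, None)] * p + [(0, None)] * (2 * n)

res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
beta_msre = res.x[:p]
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("MSRE coefficients:", np.round(beta_msre, 3))
print("OLS  coefficients:", np.round(beta_ols, 3))
```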

Journal ArticleDOI
D. B. Preston

Journal ArticleDOI
TL;DR: In this article, a unified approach based on the ranks of the residuals is developed for testing and estimation in the linear model; the methods are robust and efficient relative to least squares methods.
Abstract: A unified approach based on the ranks of the residuals is developed for testing and estimation in the linear model. The methods are robust and efficient relative to least squares methods. For ease of application, the strong analogy to least squares strategy is emphasized. The procedures are illustrated on examples from regression and analysis of covariance.
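One common way to implement such rank-based fitting is to minimize Jaeckel's rank dispersion of the residuals with Wilcoxon scores; the sketch below takes that specific route (the score function and the median step for the intercept are assumptions for illustration, since the dispersion itself is invariant to an intercept):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

rng = np.random.default_rng(6)
n = 50
x = rng.normal(size=n)
y = 1.0 + 3.0 * x + 0.5 * rng.standard_cauchy(size=n)     # errors with gross outliers

def jaeckel_dispersion(beta, x, y):
    """Wilcoxon-score rank dispersion of the residuals; it is invariant to
    an intercept, so only the slope is estimated by minimizing it."""
    e = y - beta[0] * x
    a = np.sqrt(12.0) * (rankdata(e) / (len(e) + 1.0) - 0.5)
    return np.sum(a * e)

fit = minimize(jaeckel_dispersion, x0=[0.0], args=(x, y), method="Nelder-Mead")
slope = fit.x[0]
intercept = np.median(y - slope * x)          # location step: median of residuals

slope_ls, intercept_ls = np.polyfit(x, y, 1)
print(f"rank-based fit:    intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"least squares fit: intercept = {intercept_ls:.2f}, slope = {slope_ls:.2f}")
```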

Journal ArticleDOI
TL;DR: In this article, a method of finding confidence bounds on Weibull reliability, or tolerance limits for the Weibull or extreme-value distribution, is presented, along with new simplified estimators of the parameters for complete samples.
Abstract: A method of finding confidence bounds on Weibull reliability, or tolerance limits for the Weibull or extreme-value distribution, is presented. Inference procedures for the parameters are also discussed. Comparisons are made with some other available methods. New simplified estimators of the parameters for complete samples are presented.

Journal ArticleDOI
TL;DR: In this article, it was shown that the usual non-randomized, conditional test for comparing proportions using independent binomial samples, is very conservative in the sense that the actual significance level attributable to an outcome is often one-fourth to one-half of the anticipated value.
Abstract: It is shown that the “usual” nonrandomized, conditional test for comparing proportions using independent binomial samples, is very conservative in the sense that the actual significance level attributable to an outcome is often one-fourth to one-half of the anticipated value. A nonrandomized unconditional test is proposed, and for sample sizes up to 15, tables are given in an appendix which specify one-sided critical regions of size less than or equal to the nominal values 0.05, and 0.01 (two-sided critical regions are also given). Numerical examples illustrating the use of the tables and a brief description of the algorithm used to generate the tables are included.
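The unconditional idea can be sketched by enumerating the two binomial samples and maximizing the tail probability over the common nuisance proportion. Ordering outcomes by the difference in sample proportions is one of several reasonable choices and not necessarily the one behind the published tables:

```python
import numpy as np
from scipy.stats import binom

def unconditional_p_value(x1, n1, x2, n2, grid=200):
    """One-sided unconditional p-value for H0: p1 = p2 against p1 > p2,
    ordering outcomes by the difference in sample proportions."""
    t_obs = x1 / n1 - x2 / n2
    a = np.arange(n1 + 1)
    b = np.arange(n2 + 1)
    # Outcomes at least as extreme as the one observed.
    extreme = (a[:, None] / n1 - b[None, :] / n2) >= t_obs - 1e-12
    p_value = 0.0
    for p in np.linspace(1e-6, 1 - 1e-6, grid):       # maximize over the nuisance p
        probs = binom.pmf(a, n1, p)[:, None] * binom.pmf(b, n2, p)[None, :]
        p_value = max(p_value, probs[extreme].sum())
    return p_value

# Example: 7 successes out of 8 versus 2 out of 9.
print("unconditional one-sided p-value:", round(unconditional_p_value(7, 8, 2, 9), 4))
```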

Journal ArticleDOI
Robert L. Obenchain
TL;DR: The ASSOCIATED PROBABILITY of a ridge estimate is defined using the usual, hyperellipsoidal confidence region centered at the least squares estimator, and it is argued that ridge estimates are of relatively little interest when they are so "extreme" that they lie outside of the least squares region of, say, 90 percent confidence.
Abstract: For testing general linear hypotheses in multiple regression models it is shown that non-stochastically shrunken ridge estimators yield the same central F-ratios and t-statistics as does the least squares estimator. Thus, although ridge regression does produce biased point estimates which deviate from the least squares solution, ridge techniques do not generally yield “new” normal theory statistical inferences: in particular, ridging does not necessarily produce “shifted” confidence regions. A concept, the ASSOCIATED PROBABILITY of a ridge estimate, is defined using the usual, hyperellipsoidal confidence region centered at the least squares estimator, and it is argued that ridge estimates are of relatively little interest when they are so “extreme” that they lie outside of the least squares region of, say, 90 percent confidence.
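The associated probability is straightforward to compute: find the confidence coefficient of the smallest least-squares ellipsoid that contains the ridge estimate. A minimal sketch with simulated near-collinear data and an arbitrary ridge constant:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(7)
n, p = 50, 3
X = rng.normal(size=(n, p))
X[:, 2] = X[:, 1] + 0.05 * rng.normal(size=n)        # near-collinear columns
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX = X.T @ X
b_ols = np.linalg.solve(XtX, X.T @ y)
s2 = np.sum((y - X @ b_ols) ** 2) / (n - p)

k = 1.0                                              # arbitrary ridge constant
b_ridge = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)

# Confidence coefficient of the smallest least-squares ellipsoid
# {beta : (beta - b_ols)' X'X (beta - b_ols) <= p s^2 F} containing the ridge point.
d = b_ridge - b_ols
assoc_prob = f_dist.cdf((d @ XtX @ d) / (p * s2), p, n - p)
print(f"the ridge estimate sits on the {100 * assoc_prob:.1f}% least-squares ellipsoid")
```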

Journal ArticleDOI
Z. Galil, J. Kiefer
TL;DR: In this paper, the authors considered quadratic regression with mixtures of nonnegative components and compared the designs that are optimum with respect to the D-, A-, and E-optimality criteria in their performance relative to these and other criteria.
Abstract: Designs for quadratic regression are considered when the possible values of the controllable variable are mixtures x = (x_1, x_2, …, x_{q+1}) of nonnegative components x_i with x_1 + x_2 + … + x_{q+1} = 1. The designs that are optimum with respect to the D-, A-, and E-optimality criteria are compared in their performance relative to these and other criteria. Computational routines for obtaining these designs are developed, and the geometry of optimum structures is discussed. Except when q = 2, the A-optimum design is supported by the vertices and midpoints of edges of the simplex, as is the case for the previously known D-optimum design. Although the E-optimum design requires more observation points, it is more robust in its efficiency under variation of criterion; but all three designs perform reasonably well in this sense.
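A quick check of the D-criterion for the vertices-plus-edge-midpoints support in the three-component case (q = 2); the weights and the comparison design are illustrative, not the paper's computational routines:

```python
import numpy as np
from itertools import combinations

def scheffe_quadratic(X):
    """Scheffe quadratic mixture terms x_i and x_i * x_j for design points X."""
    cross = np.column_stack([X[:, i] * X[:, j]
                             for i, j in combinations(range(X.shape[1]), 2)])
    return np.hstack([X, cross])

def log_det_info(weights, points):
    F = scheffe_quadratic(points)
    M = F.T @ (weights[:, None] * F)       # normalized information matrix
    return np.linalg.slogdet(M)[1]

# Support: the 3 vertices and 3 edge midpoints of the simplex (3 components).
verts = np.eye(3)
mids = np.array([[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]])
support = np.vstack([verts, mids])

design_a = np.full(6, 1 / 6)                           # equal weights on the support
with_centroid = np.vstack([support, [[1 / 3, 1 / 3, 1 / 3]]])
design_b = np.full(7, 1 / 7)                           # add the overall centroid
print("log det M, vertices + edge midpoints:", round(log_det_info(design_a, support), 3))
print("log det M, with centroid added:      ", round(log_det_info(design_b, with_centroid), 3))
```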

Journal ArticleDOI
TL;DR: In this article, a logistic model is used to allocate test units to overstress conditions when it is desired to estimate the survival probability at a design condition with a low expected failure probability.
Abstract: This paper is concerned with the optimum allocation of test units to overstress conditions when it is desired to estimate the survival probability at a design condition with a low expected failure probability. The criterion is that of minimizing the large sample variance: a logistic model is assumed. Expressions and charts for allocating test units to the accelerated stresses are provided and procedures for determining the stresses when these are not all specified are given. The gain in efficiency from using these plans versus testing exclusively at the design condition is analyzed. The requirement for some testing at the design stress and at an intermediate stress is also considered.
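The large-sample-variance criterion can be sketched for a two-stress plan: with logit P(failure) = β0 + β1·x, the asymptotic variance of the estimated logit at the design stress is x0' I(β)^{-1} x0, and the allocation fraction is chosen to minimize it. The planning values and stress levels below are illustrative, not the paper's charts:

```python
import numpy as np

def logit_var_at_design(frac, n_total, stresses, x0, beta):
    """Asymptotic variance of the estimated logit at the design stress x0
    when a fraction `frac` of the units is tested at stresses[0]."""
    n_alloc = n_total * np.array([frac, 1.0 - frac])
    info = np.zeros((2, 2))
    for n_i, s in zip(n_alloc, stresses):
        z = np.array([1.0, s])
        prob = 1.0 / (1.0 + np.exp(-beta @ z))           # failure probability at s
        info += n_i * prob * (1.0 - prob) * np.outer(z, z)
    v = np.array([1.0, x0])
    return v @ np.linalg.solve(info, v)

beta_guess = np.array([-6.0, 4.0])       # planning values for the logistic model
stresses = (1.2, 2.0)                    # two accelerated (overstress) levels
x0 = 1.0                                 # design stress with low failure probability

fracs = np.linspace(0.05, 0.95, 91)
variances = [logit_var_at_design(f, 100, stresses, x0, beta_guess) for f in fracs]
best = fracs[int(np.argmin(variances))]
print(f"allocate roughly {best:.2f} of the units to the lower accelerated stress")
```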


Journal ArticleDOI
TL;DR: This volume presents a "systematic treatment of large contingency tables with troublesome irregularities" and results are organized by "presenting parametric models, sampling schemes, basic theory, practical examples, advice on computation."
Abstract: In view of the wide interest in applications of statistical methods for discrete multivariate data and the limited number of texts available, this book is a welcome addition to the literature. With the exception of two-way contingency tables, the literature on qualitative data is too widely scattered for the casual user to find easily. This is due partly to the recent dramatic increase in activity in the field, and partly to some special characteristics of qualitative data. Complex contingency tables admit a variety of models which may be difficult to describe simply. Distribution theory is usually approximate, and techniques based on the approximate asymptotic theory lead to a variety of reasonable solutions to the same problem. Estimation is difficult in practice because algorithms with closed solutions do not always exist. As a corollary, analyses can be done only with the aid of a computer. This volume presents a "systematic treatment of large contingency tables with troublesome irregularities." Results, heretofore widely scattered in the literature, are organized by "presenting parametric models, sampling schemes, basic theory, practical examples, advice on computation." The intended audience is both the theoretical and the applied statistician. The book is massive in scope and bulk. The encyclopedic quality of the work is suggested by an outline of chapters, sections, and sub-sections: it requires five hundred lines!

Journal ArticleDOI
TL;DR: In this paper, the published mixture models are briefly recapitulated and a type of model which combines Scheffé polynomials and inverse terms is suggested, applied to an example of published data and found to be a potentially useful one.
Abstract: The published mixture models are briefly recapitulated and a type of model which combines Scheffé polynomials and inverse terms is suggested. This model is applied to an example of published data and found to be a potentially useful one.
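A sketch of what such a combined model matrix looks like for a three-component mixture: Scheffé quadratic blending terms plus inverse terms 1/x_i, fitted by ordinary least squares. The particular design points and simulated response are assumptions for illustration:

```python
import numpy as np
from itertools import combinations

def scheffe_plus_inverse(X, eps=1e-9):
    """Model matrix: Scheffe quadratic terms x_i and x_i*x_j plus inverse
    terms 1/x_i; rows of X are mixtures whose components sum to one."""
    cross = np.column_stack([X[:, i] * X[:, j]
                             for i, j in combinations(range(X.shape[1]), 2)])
    return np.hstack([X, cross, 1.0 / (X + eps)])

# Interior design points only, since the inverse terms blow up on the boundary.
X = np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6],
              [0.4, 0.4, 0.2], [0.4, 0.2, 0.4], [0.2, 0.4, 0.4],
              [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.3, 0.5, 0.2],
              [1/3, 1/3, 1/3]])
rng = np.random.default_rng(8)
y = 5 + 2 * X[:, 0] - 1.0 / X[:, 2] + rng.normal(scale=0.1, size=len(X))

M = scheffe_plus_inverse(X)
beta, *_ = np.linalg.lstsq(M, y, rcond=None)
print("model matrix shape:", M.shape)          # 10 runs by 9 terms
print("fitted coefficients:", np.round(beta, 2))
```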

Journal ArticleDOI
TL;DR: In this article, the problem of predicting the smallest observation in a future sample of n observations for the same distribution and the mean Y of the future sample is discussed, based on a Type II censored sample from the distribution.
Abstract: We discuss the problem of predicting, on the basis of a sample from a two parameter exponential distribution, the s'th smallest observation Ys in a future sample of n observations for the same distribution, and the mean Y of the future sample. It is shown how to obtain prediction intervals for Ys and Y, based on a Type II censored sample from the distribution.

Journal ArticleDOI
TL;DR: In this paper, a solution is given to the problem of using a variable correlated with a performance variable to screen a product when the correlation is unknown or when the proportion of presently acceptable product is unknown.
Abstract: A solution is given to the problem of using a variable correlated with a performance variable to screen a product when the correlation is unknown or when the proportion of presently acceptable product is unknown. The procedure involves setting lower confidence limits on these unknown parameters and then relating the confidence coefficient to the probability of the procedure attaining a required proportion of acceptable product. The performance variable and screening variable are assumed to be jointly normally distributed.

Journal ArticleDOI
TL;DR: In this article, three related test procedures are developed to test the composite hypothesis of normality for complete samples, which have their origins in an attempt to formalize the appearance of nonlinearity in probability plots.
Abstract: The problem of testing the composite hypothesis of normality is considered for complete samples. Three related test procedures are developed. The testing procedures have their origins in an attempt to formalize the appearance of nonlinearity in probability plots. The fitting of the ordered observations is accomplished by general linear least squares using the expected values of the standard normal order statistics (snos) as plotting positions. The moments of the snos have been approximated where necessary. The test statistics are ratios involving the squares of linear combinations of order statistics and the usual quadratic estimate of the variance. The percentage points of the test statistics are generally intractable by analytical methods; however, percentage points are estimated using simulation techniques. The test procedures are compared to nine other tests of the composite hypothesis of normality in an empirical power study.
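The flavor of these statistics can be conveyed by a Shapiro–Francia-type ratio: regress the ordered sample on approximate expected normal order statistics and compare the squared fitted linear combination with the usual sum of squares. This is in the same spirit as, but not identical to, the three statistics proposed in the paper, and its null percentage points would likewise have to be simulated:

```python
import numpy as np
from scipy.stats import norm

def probability_plot_statistic(x):
    """Squared correlation between the ordered sample and approximate expected
    normal order statistics (a Shapiro-Francia-type ratio, close to 1 under normality)."""
    x = np.sort(np.asarray(x))
    n = len(x)
    m = norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))   # Blom's approximation
    num = (m @ (x - x.mean())) ** 2
    den = (m @ m) * np.sum((x - x.mean()) ** 2)
    return num / den

rng = np.random.default_rng(9)
print("normal sample:     ", round(probability_plot_statistic(rng.normal(size=100)), 4))
print("exponential sample:", round(probability_plot_statistic(rng.exponential(size=100)), 4))
```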

Journal ArticleDOI
TL;DR: In this article, the authors considered a multivariate gamma distribution with the assumption that T is a (K × N) incidence matrix, and the implied restrictions on the parameter space of the desired multi-dimensional gamma distribution were discussed.
Abstract: To generate the gamma distributed random vector η (of dimension K), the scheme η = ξ(1) + Tξ(2) is considered, where ξ(1) (of dimension K) and ξ(2) (of dimension N) consist of independently gamma distributed variables and T is a (K × N) incidence matrix. For certain “patterns” of T, the implied restrictions on the parameter space of the desired multivariate gamma distribution are discussed; in particular, the scheme does not allow negative covariances.
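A direct sketch of the scheme: draw independent gamma variables (with a common unit scale) for ξ(1) and ξ(2), combine them through a 0–1 incidence matrix T, and the components of η = ξ(1) + Tξ(2) have gamma marginals with nonnegative covariances induced by the shared ξ(2) terms. The shapes and T below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(10)
K, N, n_draws = 3, 2, 100_000

# Incidence matrix T: which shared xi2 components feed which eta components.
T = np.array([[1, 0],
              [1, 1],
              [0, 1]])

shape1 = np.array([0.5, 1.0, 1.5])     # shapes of the private xi1 components
shape2 = np.array([0.8, 0.4])          # shapes of the shared xi2 components

xi1 = rng.gamma(shape1, size=(n_draws, K))      # independent gammas, unit scale
xi2 = rng.gamma(shape2, size=(n_draws, N))
eta = xi1 + xi2 @ T.T                           # eta = xi1 + T xi2, row by row

# Marginal shapes add (common unit scale); covariances come out nonnegative.
print("sample means:  ", np.round(eta.mean(axis=0), 3))
print("implied shapes:", shape1 + T @ shape2)
print("sample covariance matrix:\n", np.round(np.cov(eta, rowvar=False), 3))
```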

Journal ArticleDOI
TL;DR: In this article, the authors consider conditional tests on the scale parameter of the gamma distribution with an unknown nuisance shape parameter; such tests, based on the conditional distribution of the sample arithmetic mean given the geometric mean, are uniformly most powerful unbiased tests.
Abstract: Conditional tests on the scale parameter of the gamma distribution with an unknown nuisance shape parameter are considered. Such tests, based upon the conditional distribution of the sample arithmetic mean x̄ (or, equivalently, of W = x̄/x̃) given the geometric mean x̃, are uniformly most powerful unbiased tests. Percentage points of the conditional distribution are tabulated for small sample sizes and an asymptotic normal approximation is also obtained.