
Showing papers in "Technometrics in 1974"


Journal ArticleDOI
TL;DR: It is shown that data augmentation provides a rather general formulation for the study of biased prediction techniques using multiple linear regression and a way to obtain predictors given a credible criterion of good prediction is proposed.
Abstract: We show that data augmentation provides a rather general formulation for the study of biased prediction techniques using multiple linear regression. Variable selection is a limiting case, and Ridge regression is a special case of data augmentation. We propose a way to obtain predictors given a credible criterion of good prediction.
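Since the abstract is brief, here is a minimal numpy sketch of the special case it mentions: ridge regression obtained as ordinary least squares on data augmented with sqrt(λ)·I pseudo-rows and zero pseudo-responses. The function name, the simulated data, and the value of λ are illustrative, not from the paper.

```python
import numpy as np

def ridge_by_augmentation(X, y, lam):
    """Ridge estimate via ordinary least squares on augmented data.

    Appending sqrt(lam) * I as extra rows of X (with zero responses)
    makes the OLS solution equal to (X'X + lam*I)^{-1} X'y.
    """
    n, p = X.shape
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

# quick check against the direct ridge formula (illustrative data)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 0.5, 0.0, -2.0]) + rng.normal(size=50)
direct = np.linalg.solve(X.T @ X + 3.0 * np.eye(4), X.T @ y)
assert np.allclose(ridge_by_augmentation(X, y, 3.0), direct)
```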

1,338 citations


Journal ArticleDOI
TL;DR: The prototype of fitting polynomials to equally-spaced data, in which the equal spacing is theoretically precise and the data are accurate to many decimal places, arises in the analysis of band spectra; a hard look at such examples forces a reexamination of how to formulate such problems, the use of robust/resistant techniques in polynomial regression, which coordinates to use and why, choices in stopping a fit, and improved ways to describe the answers.
Abstract: The prototype of fitting polynomials to equally-spaced data—in which the equal spacing is theoretically precise and the data are accurate to many decimal places—arises in the analysis of band spectra. A hard look at such examples forces us to reexamine our thinking on such diverse issues as: how to formulate such problems, the use of robust/resistant techniques in polynomial regression, which coordinates to use and why, the basic properties of linear least squares, choices in stopping a fit, and improved ways to describe our answers. Our results and attitudes apply rather directly to other situations where we are fitting a sum of functions of a single variable. When two or more different variables, subject to error, blunder, or omission, underlie the carriers to be considered, regression/fitting problems are likely to need not only the considerations presented here, but others as well. To a varying extent, the same will be true of nonlinear fitting/regression problems.

915 citations


Journal ArticleDOI

717 citations


Journal ArticleDOI
TL;DR: In this article, the authors compared four statistics which may be used to test the equality of population means with respect to their robustness under heteroscedasticity, their power, and the overlap of their critical regions.
Abstract: Four statistics which may be used to test the equality of population means are compared with respect to their robustness under heteroscedasticity, their power, and the overlap of their critical regions. The four are: the ANOVA F-statistic; a modified F which has the same numerator as the ANOVA F but an altered denominator; and two similar statistics proposed by Welch and James which differ primarily in their approximations for their critical values. The critical values proposed by Welch are a better approximation for small sample sizes than those proposed by James. Both Welch's statistic and the modified F are robust under inequality of variances. The choice between them depends upon the magnitude of the means and their standard errors. When the population variances are equal, the critical region of the modified F more closely approximates that of the ANOVA F than does Welch's.
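For concreteness, a small sketch of Welch's statistic as it is usually presented (precision weights n_i/s_i² and an approximate F reference distribution); the paper's modified F and James's approximation are not reproduced here, and the function name and simulated groups are ours.

```python
import numpy as np
from scipy import stats

def welch_anova(groups):
    """Welch's test of equal means under unequal variances (textbook form)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                    # precision weights
    grand = np.sum(w * m) / np.sum(w)
    num = np.sum(w * (m - grand) ** 2) / (k - 1)
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    den = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    f = num / den
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * tmp)
    return f, df1, df2, stats.f.sf(f, df1, df2)

# compare with the ordinary ANOVA F on the same simulated heteroscedastic groups
rng = np.random.default_rng(1)
groups = [rng.normal(0, s, size=15) for s in (1.0, 2.0, 4.0)]
print(welch_anova(groups))
print(stats.f_oneway(*groups))
```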

600 citations


Journal ArticleDOI
David F. Andrews1
TL;DR: In this paper, the authors show that techniques of fitting are robust of efficiency when their statistical efficiency remains high for conditions more realistic than the utopian cases of Gaussian distributions with errors of equal variance.
Abstract: Techniques of fitting are said to be resistant when the result is not greatly altered when a small fraction of the data is altered; techniques of fitting are said to be robust of efficiency when their statistical efficiency remains high for conditions more realistic than the utopian case of Gaussian distributions with errors of equal variance. These properties are particularly important in the formative stages of model building when the form of the response is not known exactly. Techniques with these properties are proposed and discussed.

525 citations


Journal ArticleDOI
Svante Wold1
TL;DR: The use of spline functions in the analysis of empirical two-dimensional data (y_i, x_i) is described; the authors define spline functions as piecewise polynomials with continuity conditions, which give them unique properties as empirical functions.
Abstract: The use of spline functions in the analysis of empirical two-dimensional data (y_i, x_i) is described. The definition of spline functions as piecewise polynomials with continuity conditions gives them unique properties as empirical functions. They can represent any variation of y with x arbitrarily well over wide intervals of x. Furthermore, due to the local properties of the spline functions, they are excellent tools for differentiation and integration of empirical data. Hence, spline functions are excellent empirical functions which can be used with advantage instead of other empirical functions, such as polynomials or exponentials. Examples of application show spline analyses of response curves in pharmacokinetics and of the local behavior of almost first order kinetic data.
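A minimal illustration of the abstract's point that splines make differentiation and integration of noisy empirical (x, y) data straightforward, using scipy's UnivariateSpline; the simulated response curve and the smoothing parameter are illustrative assumptions, not the paper's pharmacokinetic data.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# simulated noisy response curve (illustrative, not the paper's data)
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)
y = np.exp(-0.3 * x) + rng.normal(scale=0.02, size=x.size)

spl = UnivariateSpline(x, y, k=3, s=x.size * 0.02 ** 2)  # cubic smoothing spline
dy_dx = spl.derivative()(x)        # differentiate the fitted piecewise polynomial
area = spl.integral(0.0, 10.0)     # integrate it over [0, 10]
```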

352 citations


Journal ArticleDOI
TL;DR: In this article, the authors employ the theory of weak convergence of cumulative sums to the Wiener Process to obtain large sample theory for cusum tests and study the effect of serial correlation on the performance of the one-sided cusUM test.
Abstract: We employ the theory of weak convergence of cumulative sums to the Wiener Process to obtain large sample theory for cusum tests. These results provide at theoretical basis for studying the effects of serial correlation on the performance of the one-sided cusum test proposed by Page (1955). Particular attention is placed on the first, order auto-regressive and first order moving average models. In order to treat the sequential version of the test, we employ the same Wiener process approximation. This enables us to study the effect of correlation not only on the average run length but, more importantly, on the run length distribution itself. These theoretical distributions are shown to compare quite favorably with the true distribution on the basis of a Monte Carlo study using normal observations. The results on the changes in the shape of the run length distributions show that more than average run length should be considered. Our primary conclusion is that the cusum test is not robust with respect, to dep...
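To make the object of study concrete, here is a sketch of Page's one-sided cusum applied to simulated AR(1) observations, so the effect of serial correlation on run lengths can be examined by Monte Carlo; the reference value k, decision limit h, and AR parameter are arbitrary illustrative choices, not the paper's settings.

```python
import numpy as np

def one_sided_cusum_run_length(x, k, h):
    """Page's one-sided cusum: S_i = max(0, S_{i-1} + x_i - k); signal when S_i > h.
    Returns the index of the first signal, or None if no signal occurs."""
    s = 0.0
    for i, xi in enumerate(x, start=1):
        s = max(0.0, s + xi - k)
        if s > h:
            return i
    return None

def ar1(n, phi, rng):
    """First-order autoregressive series with standard normal innovations."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

rng = np.random.default_rng(3)
runs = [one_sided_cusum_run_length(ar1(5000, 0.5, rng), k=0.5, h=4.0)
        for _ in range(200)]
# the empirical run-length distribution, not just its mean, is what matters here
```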

283 citations


Journal ArticleDOI
TL;DR: This entry lists the contents of a text on nonparametric methods, covering one- and two-sample location and dispersion problems, one- and two-way layouts, independence, regression, comparison of two success probabilities, and survival analysis.
Abstract: The Dichotomous Data Problem. The One-Sample Location Problem. The Two-Sample Location Problem. The Two-Sample Dispersion Problem and Other Two-Sample Problems. The One-Way Layout. The Two-Way Layout. The Independence Problem. Regression Problems. Comparing Two Success Probabilities. Life Distributions and Survival Analysis. Appendix. Bibliography. Answers to Selected Problems. Indexes.

210 citations


Journal ArticleDOI
TL;DR: In this article, the concept of increasing "conditional mean exceedance" provides a reasonable way of describing the heavy-tail phenomenon, and a family of Pareto distributions is shown to represent distributions for which this parameter is linearly increasing.
Abstract: Distributions with heavier-than-exponential tails are studied for describing empirical phenomena. It is argued that the concept of increasing “conditional mean exceedance” provides a reasonable way of describing the heavy-tail phenomenon, and a family of Pareto distributions is shown to represent distributions for which this parameter is linearly increasing. A test is developed and modified so as to be suitable for testing heavy-tailedness, and some graphical procedures are also suggested.
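A small numerical sketch of the "conditional mean exceedance" (mean excess) idea: for a Pareto tail it increases roughly linearly in the threshold, while for the exponential it is flat. The simulated samples and thresholds are illustrative only, not the paper's test procedure.

```python
import numpy as np

def mean_exceedance(x, thresholds):
    """Empirical conditional mean exceedance e(u) = E[X - u | X > u]."""
    x = np.asarray(x)
    return np.array([(x[x > u] - u).mean() if np.any(x > u) else np.nan
                     for u in thresholds])

rng = np.random.default_rng(4)
pareto_sample = rng.pareto(2.0, size=5000) + 1.0   # Pareto with tail index 2
expo_sample = rng.exponential(scale=1.0, size=5000)
u = np.linspace(1.0, 5.0, 9)
print(mean_exceedance(pareto_sample, u))   # roughly linearly increasing in u
print(mean_exceedance(expo_sample, u))     # roughly constant (memoryless)
```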

198 citations


Journal ArticleDOI
TL;DR: In this paper, a method is presented for using power transformation weights in least squares analysis to account for inhomogeneity of variance; the need for this form of weighting frequently arises when, as is illustrated with an example, a linearized kinetic rate expression is analyzed.
Abstract: A method is presented for using power transformation weights in least squares analysis to account for inhomogeneity of variance. The need for this form of weighting is common. In particular, it frequently arises when, as is illustrated with an example, a linearized kinetic rate expression is analyzed.
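A generic weighted least squares sketch of the kind of weighting the abstract describes, with the variance taken proportional to a power of a scale variable; the exponent theta and the choice of scale variable are illustrative assumptions, and this is not the paper's procedure for choosing the power.

```python
import numpy as np

def power_weighted_least_squares(X, y, scale, theta):
    """Weighted least squares with power-transformation weights.

    Assumes Var(y_i) is proportional to scale_i**(2*theta), so each observation
    gets weight w_i = scale_i**(-2*theta).  theta = 0 recovers ordinary LS.
    """
    w = scale ** (-2.0 * theta)
    Xw = X * w[:, None]                         # rows of X scaled by their weights
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # solves X'WX beta = X'Wy
    return beta
```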

179 citations


Journal ArticleDOI
TL;DR: In this article, a modified least squares estimation procedure is introduced to determine whether the near singularity has predictive value and to examine alternate prediction equations in which the effect of the near singularity has been removed from the estimates of the regression coefficients.
Abstract: Least squares estimates of parameters of a multiple linear regression model are known to be highly variable when the matrix of independent variables is near singular. Using the latent roots and latent vectors of the “correlation matrix” of the dependent and independent variables, a modified least squares estimation procedure is introduced. This technique enables one to determine whether the near singularity has predictive value and to examine alternate prediction equations in which the effect of the near singularity has been removed from the estimates of the regression coefficients. In addition, a method for performing backward elimination of variables using standard least squares or the modified procedure is presented.

Journal ArticleDOI
TL;DR: In this article, the maximum likelihood estimators of the parameters and reliability are studied for both complete and censored sampling, and the asymptotic variance covariance matrix is derived.
Abstract: Some general comments are made concerning life-testing distributions with polynomial hazard functions, and some least squares type estimators are suggested as a possible method of parameter estimation. The linear hazard function case (h(t) = α + bt) is considered in some detail. The maximum likelihood estimators of the parameters and reliability are studied for both complete and censored sampling, and the asymptotic variance-covariance matrix is derived. In the linear case the simple least squares type estimators were compared to the maximum likelihood estimators by Monte Carlo simulation, and they were found to be fairly comparable to the maximum likelihood estimators, being somewhat better for small b/α² and poorer for large b/α². Percentage points were also determined by Monte Carlo simulation to make possible tests of hypotheses for the parameters.
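For the complete-sample linear hazard case, the log-likelihood follows from h(t) = α + bt and the cumulative hazard H(t) = αt + bt²/2, giving log L = Σ log(α + bt_i) − Σ H(t_i). Below is a minimal numerical maximum likelihood sketch; the simulated data, starting values, and optimizer choice are illustrative assumptions, and censoring is not handled.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, t):
    """Negative log-likelihood for the linear hazard h(t) = a + b*t, complete sample."""
    a, b = params
    if a <= 0 or b < 0:
        return np.inf
    return -(np.sum(np.log(a + b * t)) - np.sum(a * t + 0.5 * b * t ** 2))

# simulate lifetimes by inverting S(t) = exp(-(a*t + b*t**2/2)) = u
rng = np.random.default_rng(5)
a_true, b_true = 0.5, 0.2
u = rng.uniform(size=500)
t = (-a_true + np.sqrt(a_true ** 2 - 2.0 * b_true * np.log(u))) / b_true

fit = minimize(neg_loglik, x0=[1.0, 0.1], args=(t,), method="Nelder-Mead")
a_hat, b_hat = fit.x
```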

Journal ArticleDOI
TL;DR: In this paper, a method for resolving additive mixtures of overlapping curves by combining nonlinear regression and principal component analysis is presented, which makes use of the postulated chemical reaction, and allows one to check the reaction and estimate chemical rate and equilibrium constants.
Abstract: The paper presents a method for resolving additive mixtures of overlapping curves by combining nonlinear regression and principal component analysis. The method can be applied to spectroscopy, chromatography, etc. The method makes use of the postulated chemical reaction, and allows one to check the reaction and estimate chemical rate and equilibrium constants.

Journal ArticleDOI
TL;DR: In this paper, the identification, estimation and diagnostic checking of closed-loop systems is discussed and illustrated on two real sets of data, i.e., data generated by a process industry.
Abstract: In the process industries data must often be obtained under conditions of closed-loop operation; that is, under conditions where feedback control is being applied. In the analysis of such data care is needed to properly take account of the manner of its generation. In particular, if standard open-loop procedures of model identification, estimation and diagnostic checking are applied to closed-loop data, incorrect models may result and lack of fit may not be detected. This paper discusses the identification, estimation and diagnostic checking of closed-loop systems and illustrates the ideas on two real sets of data.

Journal ArticleDOI
TL;DR: In this paper, the minimum variance unbiased estimation of Pr (Y < X) under the assumption that X and Y are independently negative exponentially distributed is derived in closed form, and results assume simpler forms than those in the normal case obtained by Church and Harris (1970) and Downton (1973).
Abstract: The minimum variance unbiased estimation of Pr (Y < X) under the assumption that X and Y are independently negative exponentially distributed is derived in closed form. Results assume simpler forms than those in the normal case obtained by Church and Harris (1970) and Downton (1973).

Journal ArticleDOI
Toby J. Mitchell1
TL;DR: The results of a study in which the computer algorithm DETMAX was used for the purpose of constructing n-run “D-optimal” designs over a cubic region of interest for the first-order model E(y) = β0 + β1x1 + … + βpxp are presented.
Abstract: This paper presents the results of a study in which the computer algorithm DETMAX was used for the purpose of constructing n-run “D-optimal” designs over a cubic region of interest for the first-order model E(y) = β0 + β1x1 + … + βpxp. These results suggest some general “rules” (actually conjectures) for the construction of such designs. For p ≤ 9, all but 12 combinations of n and p are covered by these “rules”; the 12 exceptions are discussed separately. The resolution IV designs obtained by folding over these “D-optimal” first-order designs are also discussed and are shown to compare favorably with designs previously published.
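DETMAX itself is not reproduced here, but the following toy point-exchange sketch conveys the idea of searching for an n-run first-order design that maximizes |X′X| over a candidate set; restricting candidates to the vertices of the cube is our simplifying assumption, and the function and settings are illustrative only.

```python
import itertools
import numpy as np

def exchange_search(n, p, iters=500, seed=0):
    """Crude random point-exchange search for an n-run design maximizing |X'X|
    for the first-order model E(y) = b0 + b1*x1 + ... + bp*xp on the cube."""
    rng = np.random.default_rng(seed)
    cand = np.array(list(itertools.product([-1.0, 1.0], repeat=p)))
    model = lambda D: np.hstack([np.ones((len(D), 1)), D])   # add intercept column
    design = cand[rng.integers(len(cand), size=n)]
    best = np.linalg.det(model(design).T @ model(design))
    for _ in range(iters):
        trial = design.copy()
        trial[rng.integers(n)] = cand[rng.integers(len(cand))]
        d = np.linalg.det(model(trial).T @ model(trial))
        if d > best:
            design, best = trial, d
    return design, best

design, det_value = exchange_search(n=6, p=4)
```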

Journal ArticleDOI
TL;DR: In this article, an algorithm was proposed for selecting a subset of extreme vertices when the number of candidate vertices is large, and the algorithm was found to produce designs which generally have small trace of (X′X)⁻¹, indicating the average variance of the estimated coefficients in the linear model will be small.
Abstract: Extreme vertices designs are useful in experimentation with mixtures, particularly when the response can be described by a linear model. An algorithm is proposed for selecting a subset of extreme vertices when the number of candidate vertices is large. This algorithm has been found to produce designs which generally have small trace of (X′X)⁻¹, indicating the average variance of the estimated coefficients in the linear model will be small.

Journal ArticleDOI
TL;DR: In this article, a Bayesian approach to life testing situations is given using the concept of the Bayesian predictive distribution; both the exponential and the two-parameter exponential distributions are considered.
Abstract: Prediction intervals for future observations in life testing situations have been derived by Hewitt (1968), Nelson (1968), Lawless (1971, 1972). Expected-cover tolerance regions have been obtained. Here we give a Bayesian approach to such situations and use the concept of the Bayesian predictive distribution. Both the exponential and the two-parameter exponential distributions are considered.
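As a sketch of the one-parameter case only: with exponential lifetimes, a noninformative prior p(λ) ∝ 1/λ, n observed failures and total time on test T, the Bayesian predictive survival of a future lifetime is Pr(Y > y | data) = (T/(T + y))^n, so an upper 100(1 − α)% prediction limit is T(α^(−1/n) − 1). The function name and example numbers are illustrative; the two-parameter case treated in the paper is not covered here.

```python
def exponential_predictive_upper_limit(n_failures, total_time, alpha=0.05):
    """Upper 100*(1-alpha)% Bayesian prediction limit for a future exponential
    lifetime, using the noninformative prior p(lambda) proportional to 1/lambda,
    under which Pr(Y > y | data) = (total_time / (total_time + y))**n_failures."""
    return total_time * (alpha ** (-1.0 / n_failures) - 1.0)

# e.g. 10 failures observed with a total time on test of 1200 hours (illustrative)
print(exponential_predictive_upper_limit(10, 1200.0))
```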

Journal ArticleDOI
TL;DR: In this paper, four types of composite designs are optimized using the |X′X| criterion: a “symmetric” composite design, a “symmetric” smallest (saturated) composite design, an asymmetric composite design, and an asymmetric smallest composite design.
Abstract: Four types of composite designs are optimized using the |X′X| criterion. These designs are: 1. a “symmetric” composite design (a composite design with “star point” distance equal to ±α); 2. a “symmetric” smallest composite design (a saturated composite design); 3. an asymmetric composite design (an asymmetric composite design has star point distance equal to (+1, −α)); 4. an asymmetric smallest composite design. The comparison of these designs with other optimum designs is discussed. It is shown that symmetric composite designs are nearly optimum for experiments on a hypercube.

Journal ArticleDOI
TL;DR: In this paper, simple methods for controlling the family error rate for simultaneous tests for the choice of subsets of predictor variables in multiple regression are described for fixed and random predictors, and in the latter case a class of adequate regression equations is obtained, characterized by a lower bound on the sample multiple correlation coefficient.
Abstract: Simple methods are described for controlling the family error rate for simultaneous tests for the choice of subsets of predictor variables in multiple regression. The cases of fixed and random predictors are considered, and in the latter case a class of “adequate” regression equations is obtained, characterized by a lower bound on the sample multiple correlation coefficient.

Journal ArticleDOI
TL;DR: A parametric family of models is introduced, and estimation of the parameters is discussed; the theory of time series analysis provides useful tools for discussing such a model.
Abstract: The problem of obtaining the derivative of a set of data arises naturally in many fields. The usual methods for obtaining derivatives are based on abstract formulations of the problem, which do not take errors of observation explicitly into account. For this reason, their performance when applied to observational data is unpredictable. By introducing random errors into the model, one may derive methods whose performance may be stated in statistical terms. The theory of time series analysis provides useful tools for discussing such a model. A parametric family of models is introduced, and estimation of the parameters is discussed.
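The paper's approach is a time-series formulation; purely as a point of reference, here is one common smoothing-based way to estimate derivatives from noisy, equally spaced data (local polynomial / Savitzky-Golay filtering), which also accounts for observational error. The window length, polynomial order, and simulated data are illustrative, and this is not the authors' method.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(6)
x = np.linspace(0.0, 2 * np.pi, 200)
dx = x[1] - x[0]
y = np.sin(x) + rng.normal(scale=0.05, size=x.size)

# derivative from local quadratic fits over a 21-point moving window
dy_smooth = savgol_filter(y, window_length=21, polyorder=2, deriv=1, delta=dx)
# the naive finite difference, by contrast, amplifies the observational noise
dy_naive = np.gradient(y, dx)
```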

Journal ArticleDOI
TL;DR: In this article, the authors presented some new economical, nonorthogonal, second-order designs based on irregular fractions of partially balanced type of the 3 n factorial for any number of factors n ≥ 3.
Abstract: There are presented some new economical, nonorthogonal, second-order designs based on irregular fractions of partially balanced type of the 3 n factorial for any number of factors n ≥ 3. The designs generalize those of Rechtschaffner [10]. The complete estimation, variance-covariance, trace, and determinant structures have been derived as explicit functions of n and these quantities compared for the best saturated versus the best augmented design. The efficiencies of the parameter estimates relative to those in other designs of comparable size have been computed.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss some properties of a bivariate Weibull distribution and estimate, by the method of maximum likelihood, the unknown parameters of life distributions belonging to two particular parametric families when the causes of failure are dependent.
Abstract: In a life testing situation the failure of an individual, either a living organism or an inanimate object, may be classified into one of k (> 1) mutually exclusive classes, usually causes of failure. One often has dependent causes of failure in actual physical situations, i.e., the theoretical lifetime of an individual failing from one cause may be correlated with the theoretical lifetime of the same individual failing from a different cause. This paper i) discusses some properties of a bivariate Weibull distribution and ii) is concerned with estimating, by the method of maximum likelihood, the unknown parameters of life distributions belonging to two particular parametric families, viz., bivariate normal and bivariate Weibull, when the causes of failure are dependent. An example involving the failure of small electrical appliances is analyzed and compared with an analysis which assumes the causes of failure to be independent.

Journal ArticleDOI
TL;DR: In this article, the authors show that the common practice produces misleading results for mixtures, and that the correct mixture statistics correspond to a physically consistent null hypothesis and are also consistent with the expression of the mixture model in the older “slack-variable” form.
Abstract: Regression models of the forms proposed by Scheffe and by Becker have been widely and usefully applied to describe the response surfaces of mixture systems. These models do not contain a constant term. It has been common practice to test the statistical significance of these mixture models by the same statistical procedures used for other regression models whose constant term is absent (e.g., because the regression must pass through the origin). In this paper we show that the common practice produces misleading results for mixtures. The mixture models require a different set of F, R², and R_A² statistics. The correct mixture statistics correspond to a physically consistent null hypothesis and are also consistent with the expression of the mixture model in the older “slack-variable” form. An illustrative example is included.
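A short sketch of the corrected statistics the abstract argues for: because the mixture components sum to one, the natural null model is a constant response rather than a zero response, so the sums of squares are taken about the mean of y even though the fitted model has no intercept. The function is an illustration of that idea under our own notation, not the paper's exact derivation.

```python
import numpy as np

def mixture_model_stats(X, y):
    """F and R^2 for a Scheffe-type mixture model fitted without an intercept,
    computed about the mean of y (constant-response null) rather than about zero."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)          # corrected total sum of squares
    ssr = sst - sse
    r2 = ssr / sst
    f = (ssr / (p - 1)) / (sse / (n - p))      # p - 1 because the null still fits a constant
    return f, r2
```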

Journal ArticleDOI
TL;DR: In this paper, the average expected cost per unit time over an infinite time span is obtained in the case where the cost structure involves a term which takes into account adjustment costs, depreciation costs or interest charges which are suffered at fixed intervals of time of equal length.
Abstract: The age replacement policy which minimizes the average expected cost per unit time over an infinite time span is obtained in the case where the cost structure involves a term which takes into account adjustment costs, depreciation costs or interest charges which are suffered at fixed intervals of time of equal length. The optimal policy is shown to be nonrandom and sufficient conditions are given for it to be finite. Stopping rules in the search of this finite solution are also given. Finally some examples illustrate the procedure of finding these optimal policies.
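For orientation, the classical age-replacement objective (without the fixed-interval adjustment, depreciation, or interest charges that are this paper's extension) is the long-run cost rate C(T) = [c_f F(T) + c_p (1 − F(T))] / ∫₀ᵀ (1 − F(t)) dt, minimized over the replacement age T. The Weibull lifetime distribution and cost figures below are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import weibull_min

def cost_rate(T, c_fail, c_plan, dist):
    """Classical long-run expected cost per unit time for age replacement at age T."""
    F = dist.cdf
    expected_cycle_length, _ = quad(lambda t: 1.0 - F(t), 0.0, T)
    return (c_fail * F(T) + c_plan * (1.0 - F(T))) / expected_cycle_length

lifetime = weibull_min(c=2.0, scale=100.0)       # increasing failure rate (illustrative)
res = minimize_scalar(cost_rate, bounds=(1.0, 300.0),
                      args=(10.0, 1.0, lifetime), method="bounded")
print(res.x, res.fun)    # approximate optimal replacement age and its cost rate
```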

Journal ArticleDOI
TL;DR: In this paper, the authors discuss k-factor, second order designs with the minimum number of points ½(k + 1)(k + 2), in particular, those which are extensions of designs that give minimum generalized variance for k = 2 and 3, and discuss some difficulties of using, in practice, designs that are D-optimal.
Abstract: In this note, we discuss k-factor, second order designs with the minimum number of points ½(k + 1)(k + 2), in particular, those which are extensions of designs that give minimum generalized variance for k = 2 and 3. The experimental region is the unit cuboid. Minimum point designs of this type are unknown for k ≥ 4, and these designs are the best found to date except for k = 4, where a better design is known. Kiefer has shown that these designs cannot be the best for k ≥ 7, via an existence result but, even here, specific better designs are not known and appear difficult to obtain. We also discuss some difficulties of using, in practice, designs that are D-optimal (that is, give minimum generalized variance when the number of points is not restricted).

Journal ArticleDOI
TL;DR: Sampling studies show that the actual error rates of the rules from samples with initial misclassification are only slightly affected; the apparent error rates, obtained by resubstituting the observations into the calculated discriminant function, are drastically affected, and cannot be used.
Abstract: Two models of non-random initial misclassifications are studied. In these models, observations which are closer to the mean of the “wrong” population have a greater chance of being misclassified than others. Sampling studies show that (a) the actual error rates of the rules from samples with initial misclassification are only slightly affected; (b) the apparent error rates, obtained by resubstituting the observations into the calculated discriminant function, are drastically affected, and cannot be used; and (c) the Mahalanobis D² is greatly inflated.

Journal ArticleDOI
TL;DR: The analysis-of-covariance model and design techniques were applied to an example taken from the chemical industry and produced highly efficient allocations.
Abstract: In many of the experimental situations to which the analysis-of-covariance model applies, the values of the covariates are known prior to the actual experiment so that they can be used in allocating the experimental units to the treatments. Due to the large number of possible allocations, the computation of an allocation that is D-optimal for inferences on the treatment means will generally not be practical. Good allocations can be constructed by a multistage procedure that allocates one or more units at each stage. These allocations can be made even better by applying an iterative algorithm that induces small changes in the design at each iteration. The techniques were applied to an example taken from the chemical industry and produced highly efficient allocations. Similar design techniques can be used in experimental situations where the experimental units become available and must be allocated in stages.

Journal ArticleDOI
TL;DR: In this paper, the familiar hyperbola model is used to describe data which appear to follow two different straight line relationships on opposite sides of an undetermined join point, and the two models are fitted to two sets of experimental data for purposes of comparison.
Abstract: In a recent paper [1] a general form of transition model was suggested to describe data which appear to follow two different straight line relationships on opposite sides of an undetermined join point. An alternative model is now considered, the familiar hyperbola, parameterized in a geometrically meaningful form. The two models are fitted to two sets of experimental data for purposes of comparison. In one of the examples account is taken of autocorrelated errors using a procedure suggested by Sredni [13].

Journal ArticleDOI
A. W. Dickinson1
TL;DR: This paper describes a study which utilized a computer program written to examine run orders requiring a minimum number of factor level changes and select those in which the correlations between the main effects and a linear time trend are small.
Abstract: It is often desirable to execute the observations in an experimental design in such an order that the number of factor level changes is kept small. This paper describes a study which utilized a computer program written to examine run orders requiring a minimum number of factor level changes and select those in which the correlations between the main effects and a linear time trend are small. Some results are given for the 2^4 and 2^5 designs.
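A small sketch of the two quantities the study trades off for a given run order: the total number of factor level changes and the correlation of each main-effect column with a linear time trend. The 2^4 design and the standard-order permutation below are illustrative; the paper's search program is not reproduced.

```python
import numpy as np
from itertools import product

def level_changes(design, order):
    """Total number of factor level changes when runs are executed in 'order'."""
    D = design[order]
    return int(np.sum(D[1:] != D[:-1]))

def trend_correlations(design, order):
    """Correlation of each main-effect column with a linear time trend."""
    D = design[order].astype(float)
    t = np.arange(len(order), dtype=float)
    return np.array([np.corrcoef(D[:, j], t)[0, 1] for j in range(D.shape[1])])

design = np.array(list(product([-1, 1], repeat=4)))   # full 2^4 factorial, coded -1/+1
order = np.arange(16)                                  # standard order, as an example
print(level_changes(design, order), trend_correlations(design, order))
```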