scispace - formally typeset
Author

M. B. Wilk

Bio: M. B. Wilk is an academic researcher from Bell Labs. The author has contributed to research topics including the generalized integer gamma distribution and the gamma distribution. The author has an h-index of 5, has co-authored 6 publications, and has received 16,004 citations.

Papers
Journal ArticleDOI
S. S. Shapiro, M. B. Wilk
TL;DR: In this article, a new statistical procedure for testing a complete sample for normality is introduced, which is obtained by dividing the square of an appropriate linear combination of the sample order statistics by the usual symmetric estimate of variance.
Abstract: The main intent of this paper is to introduce a new statistical procedure for testing a complete sample for normality. The test statistic is obtained by dividing the square of an appropriate linear combination of the sample order statistics by the usual symmetric estimate of variance. This ratio is both scale and origin invariant, and hence the statistic is appropriate for a test of the composite hypothesis of normality. Testing for distributional assumptions in general, and for normality in particular, has been a major area of continuing statistical research, both theoretically and practically. A possible cause of such sustained interest is that many statistical procedures have been derived based on particular distributional assumptions, especially that of normality. Although in many cases the techniques are more robust than the assumptions underlying them, a knowledge that the underlying assumption is incorrect may still temper the use and application of the methods. Moreover, the study of a body of data with the stimulus of a distributional test may encourage consideration of, for example, normalizing transformations and the use of alternative methods such as distribution-free techniques, as well as detection of gross peculiarities such as outliers or errors. The test procedure developed in this paper is defined and some of its analytical properties described in §2. Operational information and tables useful in employing the test are detailed in §3 (which may be read independently of the rest of the paper). Some examples are given in §4. Section 5 consists of an extract from an empirical sampling study of the comparison of the effectiveness of various alternative tests. Discussion and concluding remarks are given in §6.
2. THE W TEST FOR NORMALITY (COMPLETE SAMPLES)
2.1. Motivation and early work
This study was initiated, in part, in an attempt to summarize formally certain indications of probability plots. In particular, could one condense departures from statistical linearity of probability plots into one or a few 'degrees of freedom' in the manner of the application of analysis of variance in regression analysis? In a probability plot, one can consider the regression of the ordered observations on the expected values of the order statistics from a standardized version of the hypothesized distribution, the plot tending to be linear if the hypothesis is true. Hence a possible method of testing the distributional assumption is by means of an analysis-of-variance type procedure. Using generalized least squares (the ordered variates are correlated), linear and higher-order
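The regression-on-order-statistics idea behind the W test can be illustrated with a small sketch. This is not the exact Shapiro-Wilk W (which uses generalized-least-squares coefficients from tables); it is a simpler correlation-type statistic measuring how linear the normal probability plot is, with Blom's plotting positions assumed as an approximation to the expected normal order statistics.

```python
# Illustrative sketch, not the exact Shapiro-Wilk W: a correlation-type
# statistic for the linearity of a normal probability plot. Blom's
# plotting-position approximation is an assumption of this sketch.
from statistics import NormalDist, mean

def probability_plot_r2(sample):
    """Squared correlation between sorted data and approximate normal scores."""
    x = sorted(sample)
    n = len(x)
    # Blom's positions: m_i ~ Phi^-1((i - 0.375) / (n + 0.25))
    m = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mm = mean(x), mean(m)
    sxy = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    return sxy * sxy / (sxx * smm)
```

Values near 1 indicate a nearly linear plot, consistent with normality; markedly smaller values flag departures, which is the behaviour the W statistic formalizes.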

16,906 citations

Journal ArticleDOI
M. B. Wilk1, R. Gnanadesikan1
TL;DR: This paper describes and discusses graphical techniques, based on the primitive empirical cumulative distribution function and on quantile (Q-Q) plots, percent (P-P) plots and hybrids of these, which are useful in assessing a one-dimensional sample, either from original data or resulting from analysis.
Abstract: SUMMARY This paper describes and discusses graphical techniques, based on the primitive empirical cumulative distribution function and on quantile (Q-Q) plots, percent (P-P) plots and hybrids of these, which are useful in assessing a one-dimensional sample, either from original data or resulting from analysis. Areas of application include: the comparison of samples; the comparison of distributions; the presentation of results on sensitivities of statistical methods; the analysis of collections of contrasts and of collections of sample variances; the assessment of multivariate contrasts; and the structuring of analysis of variance mean squares. Many of the objectives and techniques are illustrated by examples. This paper reviews a variety of old and new statistical techniques based on the cumulative distribution function and its ramifications. Included in the coverage are applications, for various situations and purposes, of quantile probability plots (Q-Q plots), percentage probability plots (P-P plots) and extensions and hybrids of these. The general viewpoint is that of analysis of data by statistical methods that are suggestive and constructive rather than formal procedures to be applied in the light of a tightly specified mathematical model. The technological background is taken to be current capacities in data collection and high-speed computing systems, including graphical display facilities. It is very often useful in statistical data analysis to examine and to present a body of data as though it may have originated as a one-dimensional sample, i.e. data which one wishes to treat, for purposes of analysis, as an unstructured array. Sometimes this is applicable to 'original' data; even more often such a viewpoint is useful with 'derived' data, e.g. residuals from a model fitted to the data.
The empirical cumulative distribution function and probability plotting methods have a key role in the statistical treatment of one-dimensional samples, being of relevance for summarization and palatable description as well as for exposure and inference.
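A Q-Q plot of the kind described above pairs the sample order statistics with quantiles of a hypothesized distribution. A minimal standard-library sketch, assuming the common (i - 0.5)/n plotting-position convention:

```python
# Minimal sketch of computing Q-Q plot coordinates: order statistics of a
# sample against quantiles of a hypothesized normal distribution. The
# plotting positions (i - 0.5)/n are one common convention, assumed here.
from statistics import NormalDist

def qq_points(sample, dist=NormalDist()):
    """Return (theoretical quantile, observed order statistic) pairs."""
    x = sorted(sample)
    n = len(x)
    return [(dist.inv_cdf((i - 0.5) / n), x[i - 1]) for i in range(1, n + 1)]
```

Plotting these pairs gives the Q-Q plot; an approximately straight line is consistent with the sample having come from the hypothesized family, up to location and scale.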

1,301 citations

Journal ArticleDOI
TL;DR: In this paper, the authors considered the problem of estimating the scale and shape parameters of a gamma distribution, whose origin parameter is known (say equal to zero), based on order statistics.
Abstract: The gamma distribution, of which the chi-squared and exponential distributions are particular cases, is used as a model in various statistical applications. Gupta (1960) has reviewed some of its applications to life tests, extreme values, reliability, and maintenance. Gupta & Groll (1961) have described the use of the gamma distribution in acceptance sampling based on life tests. It has also been used as an approximation to the distribution of quadratic forms in certain cases (see, for example, Box (1954) and references therein). A probability plotting procedure for the gamma distribution has been presented by Wilk, Gnanadesikan & Huyett (1962), and applications to life-test data analysis and to the statistical assessment of homogeneity of variance have been described. A method has been proposed by Wilk & Gnanadesikan (1961) for the graphical analysis of multi-response data which is based on the gamma distribution and involves the estimation of the shape parameter from order statistics. The present paper is concerned with the maximum-likelihood estimation of the scale and shape parameters of a gamma distribution, whose origin parameter is known (say, equal to zero), based on order statistics. Tables are provided to facilitate obtaining these estimates, and their use is illustrated and discussed. The case of an unknown origin parameter is also briefly dealt with. Greenwood & Durand (1960) have considered the problem of maximum-likelihood estimation for the gamma distribution based on a complete sample and have presented tables for that case. Chapman (1956) considered the problem of maximum-likelihood estimation of the parameters of a truncated gamma distribution using a complete sample. The tables of the present paper give those of Greenwood & Durand (1960) as a special case, and a particularly simple interpolation procedure is given for the circumstance of a complete sample.
Two estimation situations involving order statistics may be distinguished: namely, when the size of the complete sample is known, and when the sample size is not known. In the sequel, the former case is considered in the context when the information available involves the 'smallest' observations only. The case where the size of the complete sample is unknown is still being investigated. The problem is formally stated in §2, and §§3 and 4 contain a discussion of the maximum-likelihood estimation of the parameters and some attendant issues. §5 presents some special results for the case of estimation using the complete sample. Numerical approximations used are given in Appendix A. Tables to facilitate the solution of the maximum-likelihood equations are given in Appendix B. §6 presents some examples of the use of these tables. The case when the origin parameter is also unknown is briefly considered in §7.
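For the complete-sample special case, the likelihood equations reduce to solving log(a) - digamma(a) = log(mean of x) - mean(log x) for the shape a, with scale = mean/a. A hedged sketch, where the finite-difference digamma and the bisection bracket are simplifying assumptions of this sketch rather than the paper's tabulated procedure:

```python
# Sketch of the complete-sample gamma MLE: solve
#   log(a) - digamma(a) = log(mean(x)) - mean(log(x))
# for the shape a by bisection (the left side is decreasing in a), then
# set scale = mean(x)/a. The numerical digamma and the bracket [1e-3, 1e3]
# are assumptions of this sketch, not the paper's method.
import math

def digamma(a, h=1e-5):
    # central difference of log-gamma as a simple digamma approximation
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def gamma_mle(data):
    n = len(data)
    s = math.log(sum(data) / n) - sum(math.log(x) for x in data) / n
    lo, hi = 1e-3, 1e3
    for _ in range(100):                      # bisection on a monotone function
        mid = 0.5 * (lo + hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    return shape, (sum(data) / n) / shape     # (shape, scale)
```

In practice the order-statistics case treated in the paper requires the censored-sample likelihood and the tabulated solutions; the complete-sample equations above are the special case the paper recovers from Greenwood & Durand (1960).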

85 citations

Journal ArticleDOI
TL;DR: In this paper, the joint estimation of kinetic (diffusion) and equilibrium (solubility) properties of gas-polymer systems is reported for a closed, isothermal system as gas diffuses radially into a cylindrical mass of polymer held in a sintered steel or glass container.
Abstract: Techniques and results are reported for the joint estimation of kinetic (diffusion) and equilibrium (solubility) properties of gas-polymer systems. Pressure decreasing with time is measured in a closed, isothermal system as gas diffuses radially into a cylindrical mass of polymer held in a sintered steel or glass container. Polyethylene and polystyrene with various gases have been studied at temperatures up to 227°C and pressures to 650. Gas pressures are raised "incrementally" to minimize (and to study) the dependence of diffusivities on concentration and pressure and to improve solubility estimation. Data collection techniques have evolved from hand recording of time, temperature, and pressure to use of an automatic analog and digital recording system. Pressures are converted to moles by use of an empirical equation of state based on published P-V-T data. Fourier's equation for radial diffusion from a closed well-mixed solution into an infinite cylinder, and some approximations thereto, serve as a frame of reference for analysis of the data. Least-squares estimates of the diffusivity and solubility are obtained by a numerical iteration process on an electronic computer. Some of the effects of departures from applicability of Fourier's equation are overcome by extrapolating series of estimates of diffusivities and solubilities obtained from systematically censoring the data.
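The least-squares iteration described above fits the Fourier-series solution for radial diffusion, which requires Bessel-function machinery. As a stand-in, this sketch fits a hypothetical single-exponential pressure-decay model p(t) = p_inf + (p0 - p_inf)·exp(-k·t) by a coarse grid search; both the model and the grid are illustrative assumptions, not the paper's procedure.

```python
# Toy illustration of joint two-parameter least-squares fitting to
# pressure-decay data. The exponential model and the search grids are
# assumptions of this sketch, standing in for the Fourier-series solution.
import math

def fit_decay(times, pressures, p0):
    best = None
    for k in [i * 0.01 for i in range(1, 301)]:          # rate grid
        for p_inf in [i * 0.1 for i in range(1, 201)]:   # asymptote grid
            sse = sum((p - (p_inf + (p0 - p_inf) * math.exp(-k * t))) ** 2
                      for t, p in zip(times, pressures))
            if best is None or sse < best[0]:
                best = (sse, k, p_inf)
    return best[1], best[2]   # (rate k, asymptotic pressure p_inf)
```

A real implementation would replace the grid search with Gauss-Newton or a similar iteration, as the abstract indicates was done on an electronic computer.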

43 citations


Cited by
Journal ArticleDOI
TL;DR: In this paper, a generalization of the sampling method introduced by Metropolis et al. (1953) is presented, along with an exposition of the relevant theory, techniques of application, and methods and difficulties of assessing the error in Monte Carlo estimates.
Abstract: SUMMARY A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates. Examples of the methods, including the generation of random orthogonal matrices and potential applications of the methods to numerical problems arising in statistics, are discussed. For numerical problems in a large number of dimensions, Monte Carlo methods are often more efficient than conventional numerical methods. However, implementation of the Monte Carlo methods requires sampling from high-dimensional probability distributions and this may be very difficult and expensive in analysis and computer time. General methods for sampling from, or estimating expectations with respect to, such distributions are as follows. (i) If possible, factorize the distribution into the product of one-dimensional conditional distributions from which samples may be obtained. (ii) Use importance sampling, which may also be used for variance reduction. That is, in order to evaluate the integral J = ∫ f(x) p(x) dx = E_p(f), where p(x) is a probability density function, instead of obtaining independent samples x_1, ..., x_N from p(x) and using the estimate J_1 = Σ f(x_i)/N, we instead obtain the sample from a distribution with density q(x) and use the estimate J_2 = Σ {f(x_i) p(x_i)}/{q(x_i) N}. This may be advantageous if it is easier to sample from q(x) than p(x), but it is a difficult method to use in a large number of dimensions, since the values of the weights w(x_i) = p(x_i)/q(x_i) for reasonable values of N may all be extremely small, or a few may be extremely large. In estimating the probability of an event A, however, these difficulties may not be as serious, since the only values of w(x) which are important are those for which x ∈ A.
Since the methods proposed by Trotter & Tukey (1956) for the estimation of conditional expectations require the use of importance sampling, the same difficulties may be encountered in their use. (iii) Use a simulation technique; that is, if it is difficult to sample directly from p(x) or if p(x) is unknown, sample from some distribution q(y) and obtain the sample x values as some function of the corresponding y values. If we want samples from the conditional distribution
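The random-walk Metropolis idea that the paper generalizes can be sketched in a few lines: propose a symmetric move and accept it with probability min(1, p(y)/p(x)), so the chain's stationary distribution is p even when p is known only up to a constant. The standard-normal target and uniform proposal here are illustrative choices, not from the paper.

```python
# Sketch of random-walk Metropolis sampling from a density known only up
# to a normalizing constant. Target and proposal are illustrative choices.
import math
import random

def metropolis(log_density, x0, steps, width=1.0, seed=0):
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(steps):
        y = x + rng.uniform(-width, width)    # symmetric proposal
        # accept with probability min(1, p(y)/p(x)), computed in log space
        if math.log(rng.random()) < log_density(y) - log_density(x):
            x = y
        out.append(x)
    return out

# target: standard normal, up to a constant (log density -x^2/2)
samples = metropolis(lambda v: -0.5 * v * v, 0.0, 20000)
```

Note that only the ratio p(y)/p(x) enters, which is exactly why the method sidesteps the normalizing constant that makes direct sampling in high dimensions expensive.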

14,965 citations

Journal ArticleDOI
TL;DR: In this article, the authors developed a geographical information system to identify Koppen's climate types based on monthly temperature and rainfall data from 2,950 weather stations in Brazil, and the results are presented as maps, graphs, diagrams and tables, allowing users to interpret the occurrence of climate types in Brazil.
Abstract: Köppen's climate classification remains the most widely used system by geographical and climatological societies across the world, with well-recognized simple rules and climate symbol letters. In Brazil, climatology has been studied for more than 140 years, and among the many proposed methods Köppen's system remains the most utilized. Considering the importance of Köppen's climate classification for Brazil (geography, biology, ecology, meteorology, hydrology, agronomy, forestry and environmental sciences), we developed a geographical information system to identify Köppen's climate types based on monthly temperature and rainfall data from 2,950 weather stations. Temperature maps were spatially described using multivariate equations that took into account the geographical coordinates and altitude, and the map resolution (100 m) was similar to the digital elevation model derived from the Shuttle Radar Topography Mission. Patterns of rainfall were interpolated using kriging, with the same resolution as the temperature maps. The final climate map obtained for Brazil (851,487,700 ha) has a high spatial resolution (1 ha), which allows climatic variations to be observed at the landscape level. The results are presented as maps, graphs, diagrams and tables, allowing users to interpret the occurrence of climate types in Brazil. The zones and climate types are referenced to the most important mountains, plateaus and depressions, geographical landmarks, rivers and watersheds and major cities across the country, making the information accessible to all levels of users. The climate map not only showed that the A, B and C zones represent approximately 81%, 5% and 14% of the country but also allowed the identification of Köppen's climate types never reported before in Brazil.
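The "simple rules" the abstract refers to assign major climate zones from monthly station data. A drastically simplified sketch of the temperature rules, using the -3 °C boundary convention; the arid B zone requires precipitation thresholds and is deliberately omitted, so this is an illustrative reduction of the full rule set, not the paper's GIS method:

```python
# Simplified sketch of Koppen major-zone assignment (A, C, D, E) from
# twelve monthly mean temperatures, using the -3 C boundary convention.
# The precipitation-based arid zone B is omitted in this reduction.
def koppen_zone(monthly_temps_c):
    coldest, warmest = min(monthly_temps_c), max(monthly_temps_c)
    if warmest < 10:
        return "E"    # polar: no month reaches 10 C
    if coldest >= 18:
        return "A"    # tropical: coldest month at least 18 C
    if coldest > -3:
        return "C"    # temperate
    return "D"        # continental
```

The paper's system applies the full rule set (including rainfall seasonality letters) on interpolated 100 m temperature and rainfall grids rather than to individual station records.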

7,134 citations

Journal ArticleDOI
TL;DR: In this article, the authors developed measures of multivariate skewness and kurtosis by extending certain studies on robustness of the t statistic, and the asymptotic distributions of the measures for samples from a multivariate normal population are derived and a test for multivariate normality is proposed.
Abstract: SUMMARY Measures of multivariate skewness and kurtosis are developed by extending certain studies on robustness of the t statistic. These measures are shown to possess desirable properties. The asymptotic distributions of the measures for samples from a multivariate normal population are derived and a test of multivariate normality is proposed. The effect of non-normality on the size of the one-sample Hotelling's T² test is studied empirically with the help of these measures, and it is found that Hotelling's T² test is more sensitive to the measure of skewness than to the measure of kurtosis. Such measures have proved useful (i) in selecting a member of a family such as the Karl Pearson family, (ii) in developing a test of normality, and (iii) in investigating the robustness of the standard normal-theory procedures. The role of the tests of normality in modern statistics has recently been summarized by Shapiro & Wilk (1965). With these applications in mind for the multivariate situations, we propose measures of multivariate skewness and kurtosis. These measures of skewness and kurtosis are developed naturally by extending certain aspects of some robustness studies for the t statistic which involve √b1 and b2. It should be noted that measures of multivariate dispersion have been available for quite some time (Wilks, 1932, 1960; Hotelling, 1951). We deal with the measure of skewness in §2 and with the measure of kurtosis in §3. In §4 we give two important applications of these measures, namely, a test of multivariate normality and a study of the effect of non-normality on the size of the one-sample Hotelling's T² test. Both of these problems have attracted attention recently. The first problem has been treated by Wagle (1968) and Day (1969) and the second by Arnold (1964), but our approach differs from theirs.
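The multivariate skewness and kurtosis measures are built from Mahalanobis-type inner products of centred observations. A sketch for the bivariate case, where a closed-form 2x2 inverse keeps the example standard-library only; the divisor n for the covariance matrix is an assumption of this sketch:

```python
# Sketch of Mardia-style multivariate skewness b1 and kurtosis b2 for a
# bivariate sample:
#   b1 = (1/n^2) * sum_{i,j} (c_i' S^-1 c_j)^3
#   b2 = (1/n)   * sum_i     (c_i' S^-1 c_i)^2
# with c_i the centred observations and S the covariance (divisor n assumed).
def mardia_bivariate(data):
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    c = [(p[0] - mx, p[1] - my) for p in data]
    sxx = sum(u * u for u, _ in c) / n
    syy = sum(v * v for _, v in c) / n
    sxy = sum(u * v for u, v in c) / n
    det = sxx * syy - sxy * sxy
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det   # closed-form 2x2 inverse
    def g(a, b):   # Mahalanobis inner product a' S^-1 b
        return a[0] * b[0] * ixx + a[1] * b[1] * iyy + (a[0] * b[1] + a[1] * b[0]) * ixy
    b1 = sum(g(ci, cj) ** 3 for ci in c for cj in c) / n ** 2
    b2 = sum(g(ci, ci) ** 2 for ci in c) / n
    return b1, b2
```

For a point-symmetric sample the skewness b1 vanishes, mirroring the univariate case; the paper derives the asymptotic null distributions that turn these quantities into a test.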

3,774 citations

Journal ArticleDOI
TL;DR: In this paper, a practical guide to goodness-of-fit tests using statistics based on the empirical distribution function (EDF) is presented, and five of the leading statistics are examined.
Abstract: This article offers a practical guide to goodness-of-fit tests using statistics based on the empirical distribution function (EDF). Five of the leading statistics are examined, those often labelled D, W², V, U², A², in three important situations: where the hypothesized distribution F(x) is completely specified, and where F(x) represents the normal or the exponential distribution with one or more parameters to be estimated from the data. EDF statistics are easily calculated, and the tests require only one line of significance points for each situation. They are also shown to be competitive in terms of power.
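The simplest of the EDF statistics surveyed, the Kolmogorov D, measures the largest gap between the empirical distribution function and the hypothesized F. A sketch for the completely specified case (the first of the article's situations):

```python
# Sketch of the Kolmogorov D statistic for a completely specified CDF F:
# the largest vertical distance between the empirical distribution function
# and F, found by checking D+ and D- at each order statistic.
def kolmogorov_d(sample, cdf):
    x = sorted(sample)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        d = max(d, i / n - f, f - (i - 1) / n)   # D+ and D- at this point
    return d
```

The other statistics (W², V, U², A²) replace the supremum with various weighted integrals of the squared discrepancy, and, as the article stresses, the parameters-estimated situations need their own significance points.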

2,890 citations

Journal ArticleDOI
TL;DR: In this paper, the Lagrange multiplier procedure or score test on the Pearson family of distributions was used to obtain tests for normality of observations and regression disturbances, and the tests suggested have optimum asymptotic power properties and good finite sample performance.
Abstract: Summary Using the Lagrange multiplier procedure or score test on the Pearson family of distributions we obtain tests for normality of observations and regression disturbances. The tests suggested have optimum asymptotic power properties and good finite sample performance. Due to their simplicity they should prove to be useful tools in statistical analysis.
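The score test derived from the Pearson family reduces, in its familiar closed form, to a statistic built from the sample skewness S and kurtosis K. A sketch, where the divisor n for the moments is an assumption of this sketch:

```python
# Sketch of the closed-form score-test statistic for normality:
#   JB = n * (S^2/6 + (K - 3)^2/24)
# where S and K are the sample skewness and kurtosis (moment divisor n assumed).
def normality_score_stat(sample):
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    s = m3 / m2 ** 1.5          # skewness
    k = m4 / m2 ** 2            # kurtosis (3 under normality)
    return n * (s * s / 6 + (k - 3) ** 2 / 24)
```

Under normality the statistic is asymptotically chi-squared with 2 degrees of freedom, so large values signal departure in skewness, kurtosis, or both; the simplicity the abstract notes is exactly this two-term form.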

2,796 citations