Journal ArticleDOI

An Analysis of Variance Test for Normality (Complete Samples)

S. S. Shapiro, M. B. Wilk
01 Dec 1965-Biometrika (Oxford University Press)-Vol. 52, pp 591-611
TL;DR: This article introduces a new statistical procedure for testing a complete sample for normality. The test statistic is obtained by dividing the square of an appropriate linear combination of the sample order statistics by the usual symmetric estimate of variance.
Abstract: The main intent of this paper is to introduce a new statistical procedure for testing a complete sample for normality. The test statistic is obtained by dividing the square of an appropriate linear combination of the sample order statistics by the usual symmetric estimate of variance. This ratio is both scale and origin invariant and hence the statistic is appropriate for a test of the composite hypothesis of normality.

Testing for distributional assumptions in general and for normality in particular has been a major area of continuing statistical research, both theoretically and practically. A possible cause of such sustained interest is that many statistical procedures have been derived based on particular distributional assumptions, especially that of normality. Although in many cases the techniques are more robust than the assumptions underlying them, still a knowledge that the underlying assumption is incorrect may temper the use and application of the methods. Moreover, the study of a body of data with the stimulus of a distributional test may encourage consideration of, for example, normalizing transformations and the use of alternate methods such as distribution-free techniques, as well as detection of gross peculiarities such as outliers or errors.

The test procedure developed in this paper is defined and some of its analytical properties described in § 2. Operational information and tables useful in employing the test are detailed in § 3 (which may be read independently of the rest of the paper). Some examples are given in § 4. Section 5 consists of an extract from an empirical sampling study of the comparison of the effectiveness of various alternative tests. Discussion and concluding remarks are given in § 6.

2. THE W TEST FOR NORMALITY (COMPLETE SAMPLES)

2.1. Motivation and early work

This study was initiated, in part, in an attempt to summarize formally certain indications of probability plots. In particular, could one condense departures from statistical linearity of probability plots into one or a few 'degrees of freedom' in the manner of the application of analysis of variance in regression analysis? In a probability plot, one can consider the regression of the ordered observations on the expected values of the order statistics from a standardized version of the hypothesized distribution, the plot tending to be linear if the hypothesis is true. Hence a possible method of testing the distributional assumption is by means of an analysis of variance type procedure. Using generalized least squares (the ordered variates are correlated) linear and higher-order ...
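The ratio described in the abstract can be illustrated with a minimal Python sketch. It is not the exact W statistic: the exact coefficients require the covariance matrix of normal order statistics, which this page does not reproduce. The sketch instead assumes the simpler Shapiro-Francia style weighting, with expected normal order statistics approximated by Blom's formula. The names `blom_scores`, `w_prime` and `WEIGHTS` are hypothetical, and the data are the men's weights quoted in one of the citing excerpts on this page.

```python
from statistics import NormalDist, fmean

# Men's weights in pounds, as quoted in a citing excerpt on this page.
WEIGHTS = [148, 154, 158, 160, 161, 162, 166, 170, 182, 195, 236]


def blom_scores(n):
    """Approximate expected standard normal order statistics (Blom's formula).

    This is an assumption of the sketch, not the exact values used for W.
    """
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def w_prime(sample):
    """Squared linear combination of the order statistics divided by the
    sum of squared deviations from the sample mean (a W-like ratio)."""
    x = sorted(sample)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    m = blom_scores(n)
    mean = fmean(x)
    s2 = sum((v - mean) ** 2 for v in x)      # symmetric sum of squares
    if s2 == 0:
        raise ValueError("sample has zero variance")
    b = sum(mi * xi for mi, xi in zip(m, x))  # linear combination of order statistics
    norm = sum(mi * mi for mi in m)           # normalizes the weights to unit length
    return b * b / (norm * s2)
```

Because the scores are antisymmetric and therefore sum to zero, adding a constant to every observation leaves the numerator unchanged, and rescaling the data multiplies numerator and denominator by the same factor. Calling `w_prime(WEIGHTS)` and `w_prime([3.0 * v + 10.0 for v in WEIGHTS])` should therefore agree up to rounding, which is the scale and origin invariance the abstract refers to.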


Citations
Journal ArticleDOI
TL;DR: In this article, the authors developed a geographical information system to identify Koppen's climate types based on monthly temperature and rainfall data from 2,950 weather stations in Brazil, and the results are presented as maps, graphs, diagrams and tables, allowing users to interpret the occurrence of climate types in Brazil.
Abstract: Koppen's climate classification remains the most widely used system by geographical and climatological societies across the world, with well recognized simple rules and climate symbol letters. In Brazil, climatology has been studied for more than 140 years, and among the many proposed methods Koppen's system remains the most utilized. Considering the importance of Koppen's climate classification for Brazil (geography, biology, ecology, meteorology, hydrology, agronomy, forestry and environmental sciences), we developed a geographical information system to identify Koppen's climate types based on monthly temperature and rainfall data from 2,950 weather stations. Temperature maps were spatially described using multivariate equations that took into account the geographical coordinates and altitude; and the map resolution (100 m) was similar to the digital elevation model derived from Shuttle Radar Topography Mission. Patterns of rainfall were interpolated using kriging, with the same resolution as the temperature maps. The final climate map obtained for Brazil (851,487,700 ha) has a high spatial resolution (1 ha), which makes it possible to observe the climatic variations at the landscape level. The results are presented as maps, graphs, diagrams and tables, allowing users to interpret the occurrence of climate types in Brazil. The zones and climate types are referenced to the most important mountains, plateaus and depressions, geographical landmarks, rivers and watersheds and major cities across the country, making the information accessible to all levels of users. The climate map not only showed that the A, B and C zones represent approximately 81%, 5% and 14% of the country but also allowed the identification of Koppen's climate types never reported before in Brazil.

7,134 citations


Cites methods from "An Analysis of Variance Test for No..."

  • ...For all months, an asymmetric distribution was found which required data transformation, since p-values were close to zero in the normality test (SHAPIRO and WILK, 1965)....

    [...]

  • ...Normality hypothesis was tested according to the W test at 5% (SHAPIRO and WILK, 1965)....

    [...]

Journal ArticleDOI
TL;DR: In this article, the authors developed measures of multivariate skewness and kurtosis by extending certain studies on robustness of the t statistic, and the asymptotic distributions of the measures for samples from a multivariate normal population are derived and a test for multivariate normality is proposed.
Abstract: SUMMARY Measures of multivariate skewness and kurtosis are developed by extending certain studies on robustness of the t statistic. These measures are shown to possess desirable properties. The asymptotic distributions of the measures for samples from a multivariate normal population are derived and a test of multivariate normality is proposed. The effect of nonnormality on the size of the one-sample Hotelling's T² test is studied empirically with the help of these measures, and it is found that Hotelling's T² test is more sensitive to the measure of skewness than to the measure of kurtosis. measures have proved useful (i) in selecting a member of a family such as from the Karl Pearson family, (ii) in developing a test of normality, and (iii) in investigating the robustness of the standard normal theory procedures. The role of the tests of normality in modern statistics has recently been summarized by Shapiro & Wilk (1965). With these applications in mind for the multivariate situations, we propose measures of multivariate skewness and kurtosis. These measures of skewness and kurtosis are developed naturally by extending certain aspects of some robustness studies for the t statistic which involve β1 and β2. It should be noted that measures of multivariate dispersion have been available for quite some time (Wilks, 1932, 1960; Hotelling, 1951). We deal with the measure of skewness in § 2 and with the measure of kurtosis in § 3. In § 4 we give two important applications of these measures, namely, a test of multivariate normality and a study of the effect of nonnormality on the size of the one-sample Hotelling's T² test. Both of these problems have attracted attention recently. The first problem has been treated by Wagle (1968) and Day (1969) and the second by Arnold (1964), but our approach differs from theirs.

3,774 citations

Journal ArticleDOI
TL;DR: In this paper, a practical guide to goodness-of-fit tests using statistics based on the empirical distribution function (EDF) is presented, and five of the leading statistics are examined.
Abstract: This article offers a practical guide to goodness-of-fit tests using statistics based on the empirical distribution function (EDF). Five of the leading statistics are examined (those often labelled D, W², V, U², A²) in three important situations: where the hypothesized distribution F(x) is completely specified, and where F(x) represents the normal or exponential distribution with one or more parameters to be estimated from the data. EDF statistics are easily calculated, and the tests require only one line of significance points for each situation. They are also shown to be competitive in terms of power.
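As a rough illustration of how easily the five EDF statistics named in this abstract are calculated, the following Python sketch uses the standard computing formulas for the case where F(x) is completely specified. The function name `edf_statistics` and the dictionary keys are hypothetical, and the sketch computes the raw statistics only: it does not apply any sample-size modification or look up significance points.

```python
import math


def edf_statistics(sample, cdf):
    """EDF goodness-of-fit statistics for a fully specified CDF.

    `cdf` is any callable returning F(x); the values z_i = F(x_(i)) must lie
    strictly between 0 and 1 so that the logarithms in A2 are defined.
    """
    z = sorted(cdf(v) for v in sample)
    n = len(z)
    if n == 0:
        raise ValueError("empty sample")
    if not all(0.0 < zi < 1.0 for zi in z):
        raise ValueError("F(x) must lie strictly between 0 and 1")

    # Kolmogorov-type statistics (index i is 0-based, so i + 1 is the rank).
    d_plus = max((i + 1) / n - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / n for i, zi in enumerate(z))
    d = max(d_plus, d_minus)
    v = d_plus + d_minus                      # Kuiper's V

    # Cramer-von Mises W2 and Watson's U2.
    w2 = sum((zi - (2 * i + 1) / (2 * n)) ** 2 for i, zi in enumerate(z))
    w2 += 1 / (12 * n)
    u2 = w2 - n * (sum(z) / n - 0.5) ** 2

    # Anderson-Darling A2.
    a2 = -n - sum(
        (2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
        for i in range(n)
    ) / n

    return {"D": d, "V": v, "W2": w2, "U2": u2, "A2": a2}
```

To test against a fully specified normal distribution, one could pass `statistics.NormalDist(mu, sigma).cdf` as `cdf`, with `mu` and `sigma` fixed in advance rather than estimated from the data. The estimated-parameter cases mentioned in the abstract need different significance points and are not covered by this sketch.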

2,890 citations


Cites background or methods or result from "An Analysis of Variance Test for No..."

  • ...The following values of men's weights in pounds, first given by Snedecor, were used by Shapiro and Wilk [17] as an illustration of a test for normality: 148, 154, 158, 160, 161, 162, 166, 170, 182, 195, 236....

    [...]

  • ...W is a statistic introduced by Shapiro and Wilk [17], and DA, W' are subsequent extensions introduced by d'Agostino [2, 3] (there called D), and Shapiro and Francia [16]....

    [...]

  • ...Shapiro and Wilk [17] also included power studies for D, W² and A² (there called D, CVM, WCVM) and found...

    [...]

  • ...For this last case, EDF statistics have suffered recently from comparison with the W-statistic of Shapiro and Wilk [17]; no doubt, this is because the power studies reported in that paper gave very low power to EDF statistics....

    [...]

  • ...Power studies in support of new statistics have often been skimpy; since, after all, we don't usually know the exact alternative to the null distribution, it is urged that a range of alternatives comparable to those used by Shapiro and Wilk [17] or in Table 6 of this article should always be investigated....

    [...]

Journal ArticleDOI
TL;DR: In this paper, the Lagrange multiplier procedure or score test on the Pearson family of distributions was used to obtain tests for normality of observations and regression disturbances, and the tests suggested have optimum asymptotic power properties and good finite sample performance.
Abstract: Summary Using the Lagrange multiplier procedure or score test on the Pearson family of distributions we obtain tests for normality of observations and regression disturbances. The tests suggested have optimum asymptotic power properties and good finite sample performance. Due to their simplicity they should prove to be useful tools in statistical analysis.

2,796 citations


Cites background or methods from "An Analysis of Variance Test for No..."

  • .../#4 - (2√π)^-1]√N/0.02998598 is outside (D*L, D*U), where e? is the ith order statistic of u1, ..., uN; (iv) Pearson et al. (1977) R test: reject H0 if either √b1 is outside (R1L, R1U) or b2 is outside (R2L, R2U); (v) Shapiro & Wilk (1965) W test: reject H0 if W = (Σ aiN e?)...

    [...]

  • ...…and omnibus tests proposed by D'Agostino & Pearson (1973), Bowman & Shenton (1975) and Pearson, D'Agostino & Bowman (1977), the analysis of variance tests of Shapiro & Wilk (1965) and Shapiro & Francia (1972), and the coordinate-dependent and invariant procedures described by Cox & Small (1978)....

    [...]

Journal ArticleDOI
TL;DR: It is argued that knowledge-based resources (applicable to discovery and exploitation of opportunities) are positively related to firm performance and that EO enhances this relationship.
Abstract: While theory suggests that management has discretion in manipulating resources in order to build competitive advantage, resource-based research has focused on the characteristics of resources, paying less attention to the relationship between those resources and the way firms are organized. In explaining performance, entrepreneurship scholars have focused on a firm’s entrepreneurial strategic orientation (EO), leaving its interrelationship with internal characteristics aside. We argue that EO captures an important aspect of the way a firm is organized. Our findings suggest that knowledge-based resources (applicable to discovery and exploitation of opportunities) are positively related to firm performance and that EO enhances this relationship. Copyright © 2003 John Wiley & Sons, Ltd.

2,540 citations


Cites background from "An Analysis of Variance Test for No..."

  • ...Skewness and kurtosis statistics of the dependent variable fall well within the boundaries for normality (Shapiro and Wilk, 1965), allowing parametric tests of significance....

    [...]

References
Journal ArticleDOI

6,420 citations

Journal ArticleDOI
TL;DR: Large-sample significance points are tabulated for a distribution-free test of goodness of fit, introduced earlier by the authors, that is sensitive to discrepancies at the tails of the distribution rather than near the median.
Abstract: Some (large sample) significance points are tabulated for a distribution-free test of goodness of fit which was introduced earlier by the authors. The test, which uses the actual observations without grouping, is sensitive to discrepancies at the tails of the distribution rather than near the median. An illustration is given, using a numerical example used previously by Birnbaum in illustrating the Kolmogorov test.

2,013 citations

Book
01 Jan 1954
TL;DR: This paper is based on a lecture on the “Design and Analysis of Industrial Experiments” given by Dr O. L. Davies on the 8th of May 1954 and the recent designs developed by Box for the exploration of response surfaces are briefly considered.
Abstract: Summary Design and analysis of industrial experiments This paper is based on a lecture on the “Design and Analysis of Industrial Experiments” given by Dr O. L. Davies on the 8th of May 1954 to the Industrial Section of the “Vereniging voor Statistiek”. Production, formulation, and testing are distinguished as three separate fields of chemical activity where experimental designs can be applied, and various numerical examples of such experiments are discussed in detail. They consist of a 2³ factorial design, a 2⁴ half replicate and a 2⁵ quarter replicate fractional factorial design, and a three-way classification. In a final section the recent designs developed by Box for the exploration of response surfaces are briefly considered.

1,035 citations
