The main intent of this paper is to introduce a new statistical procedure for testing a complete sample for normality. The test statistic is obtained by dividing the square of an appropriate linear combination of the sample order statistics by the usual symmetric estimate of variance. This ratio is both scale and origin invariant and hence the statistic is appropriate for a test of the composite hypothesis of normality. Testing for distributional assumptions in general and for normality in particular has been a major area of continuing statistical research-both theoretically and practically. A possible cause of such sustained interest is that many statistical procedures have been derived based on particular distributional assumptions-especially that of normality. Although in many cases the techniques are more robust than the assumptions underlying them, still a knowledge that the underlying assumption is incorrect may temper the use and application of the methods. Moreover, the study of a body of data with the stimulus of a distributional test may encourage consideration of, for example, normalizing transformations and the use of alternate methods such as distribution-free techniques, as well as detection of gross peculiarities such as outliers or errors. The test procedure developed in this paper is defined and some of its analytical properties described in ? 2. Operational information and tables useful in employing the test are detailed in ? 3 (which may be read independently of the rest of the paper). Some examples are given in ? 4. Section 5 consists of an extract from an empirical sampling study of the comparison of the effectiveness of various alternative tests. Discussion and concluding remarks are given in ?6. 2. THE W TEST FOR NORMALITY (COMPLETE SAMPLES) 2 1. Motivation and early work This study was initiated, in part, in an attempt to summarize formally certain indications of probability plots. In particular, could one condense departures from statistical linearity of probability plots into one or a few 'degrees of freedom' in the manner of the application of analysis of variance in regression analysis? In a probability plot, one can consider the regression of the ordered observations on the expected values of the order statistics from a standardized version of the hypothesized distribution-the plot tending to be linear if the hypothesis is true. Hence a possible method of testing the distributional assumptionis by means of an analysis of variance type procedure. Using generalized least squares (the ordered variates are correlated) linear and higher-order

/pdf/an-analysis-of-variance-test-for-normality-complete-samples-2h979ba301.pdf

An Analysis of Variance Test for Normality (Complete Samples)

The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lonnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. The purpose of this paper is to develop the hierarchical model of Lonnstedt and Speed (2002) into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples. The model is reset in the context of general linear models with arbitrary coefficients and contrasts of interest. The approach applies equally well to both single channel and two color microarray experiments. Consistent, closed form estimators are derived for the hyperparameters in the model. The estimators proposed have robust behavior even for small numbers of arrays and allow for incomplete data arising from spot filtering or spot quality weights. The posterior odds statistic is reformulated in terms of a moderated t-statistic in which posterior residual standard deviations are used in place of ordinary standard deviations. The empirical Bayes approach is equivalent to shrinkage of the estimated sample variances towards a pooled estimate, resulting in far more stable inference when the number of arrays is small. The use of moderated t-statistics has the advantage over the posterior odds that the number of hyperparameters which need to estimated is reduced; in particular, knowledge of the non-null prior for the fold changes are not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The performance of the methods is demonstrated in a simulation study. Results are presented for two publicly available data sets.

/pdf/linear-models-and-empirical-bayes-methods-for-assessing-2yc9s2cjo3.pdf

Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.

/pdf/power-law-distributions-in-empirical-data-4dybihkltl.pdf

Power-Law Distributions in Empirical Data

Models for the analysis of longitudinal data must recognize the relationship between serial observations on the same unit. Multivariate models with general covariance structure are often difficult to apply to highly unbalanced data, whereas two-stage random-effects models can be used easily. In two-stage models, the probability distributions for the response vectors of different individuals belong to a single family, but some random-effects parameters vary across individuals, with a distribution specified at the second stage. A general family of models is discussed, which includes both growth models and repeated-measures models as special cases. A unified approach to fitting these models, based on a combination of empirical Bayes and maximum likelihood estimation of model parameters and using the EM algorithm, is discussed. Two examples are taken from a current epidemiological study of the health effects of air pollution.

Random-effects models for longitudinal data

I use a new technique to derive a closed-form solution for the price of a European call option on an asset with stochastic volatility. The model allows arbitrary correlation between volatility and spotasset returns. I introduce stochastic interest rates and show how to apply the model to bond options and foreign currency options. Simulations show that correlation between volatility and the spot asset’s price is important for explaining return skewness and strike-price biases in the BlackScholes (1973) model. The solution technique is based on characteristi c functions and can be applied to other problems.

/pdf/a-closed-form-solution-for-options-with-stochastic-2mnz9ol71o.pdf

A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options

Continuous Distributions (General) Normal Distributions Lognormal Distributions Inverse Gaussian (Wald) Distributions Cauchy Distribution Gamma Distributions Chi-Square Distributions Including Chi and Rayleigh Exponential Distributions Pareto Distributions Weibull Distributions Abbreviations Indexes

Continuous univariate distributions

Preface. 1. Preliminary Information. 2. Families of Discrete Distributions. 3. Binomial Distributions. 4. Poisson Distributions. 5. Neggative Binomial Distributions. 6. Hypergeometric Distributions. 7. Logarithmic and Lagrangian Distributions. 8. Mixture Distributions. 9. Stopped-Sum Distributions. 10. Matching, Occupancy, Runs, and q-Series Distributions. 11. Parametric Regression Models and Miscellanea. Bibliography. Abbreviations. Index.

Norman L. Johnson

Papers

Continuous univariate distributions

Univariate Discrete Distributions

Systems of frequency curves generated by methods of translation.

Distributions in Statistics: Continuous Multivariate Distributions.

Univariate Discrete Distributions.