
Showing papers in "Annals of Mathematical Statistics in 1963"




Journal ArticleDOI
TL;DR: In this paper, the asymptotic distribution of the characteristic roots and vectors of a sample covariance matrix is given when the observations are from a multivariate normal distribution whose covariance matrix has characteristic roots of arbitrary multiplicity.
Abstract: The asymptotic distribution of the characteristic roots and (normalized) vectors of a sample covariance matrix is given when the observations are from a multivariate normal distribution whose covariance matrix has characteristic roots of arbitrary multiplicity. The elements of each characteristic vector are the coefficients of a principal component (with sum of squares of coefficients being unity), and the corresponding characteristic root is the variance of the principal component. Tests of hypotheses of equality of population roots are treated, and confidence intervals for assumed equal roots are given; these are useful in assessing the importance of principal components. A similar study for correlation matrices is considered.

1,240 citations
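As a concrete gloss on the objects in this abstract, here is a minimal numerical sketch (ours, not the paper's; the data, sizes, and seed are hypothetical) extracting characteristic roots and vectors from a sample covariance matrix whose population counterpart has a root of multiplicity two:

```python
import numpy as np

rng = np.random.default_rng(0)
# Population covariance diag(4, 2, 2): the root 2 has multiplicity two.
X = rng.multivariate_normal(np.zeros(3), np.diag([4.0, 2.0, 2.0]), size=500)

S = np.cov(X, rowvar=False)          # sample covariance matrix
roots, vectors = np.linalg.eigh(S)   # characteristic roots and (normalized) vectors

# Each column of `vectors` has unit sum of squares; its entries are the
# coefficients of a principal component, and the matching root is that
# component's variance.
for lam, v in zip(roots[::-1], vectors[:, ::-1].T):
    print(f"root {lam:.3f}  coefficients {np.round(v, 3)}")
```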


Journal ArticleDOI
TL;DR: The power of rank tests such as the two Wilcoxon tests or the Kruskal-Wallis H-test is more robust against gross errors than that of the t- and F-tests, and their efficiency loss is quite small even in the rare case in which the suspicion of gross errors is unfounded.
Abstract: A serious objection to many of the classical statistical methods based on linear models or normality assumptions is their vulnerability to gross errors. For certain testing problems this difficulty is successfully overcome by rank tests such as the two Wilcoxon tests or the Kruskal-Wallis H-test. Their power is more robust against gross errors than that of the t- and F-tests, and their efficiency loss is quite small even in the rare case in which the suspicion of the possibility of gross errors is unfounded.

1,086 citations
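A hedged usage sketch (ours; SciPy's standard implementations, hypothetical data) illustrating the robustness the abstract describes, with a single gross error in one sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 30)
y = rng.normal(1.0, 1.0, 30)
y[0] = 50.0                       # one gross error in the second sample

print(stats.ttest_ind(x, y))      # t-test: typically disturbed by the outlier
print(stats.mannwhitneyu(x, y))   # two-sample Wilcoxon (Mann-Whitney form)
print(stats.kruskal(x, y))        # Kruskal-Wallis H-test
```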


Journal ArticleDOI
TL;DR: In this paper, it was shown that the class of all mixtures of a one-parameter additively-closed family of distributions is identifiable, and a theorem is proved yielding the identifiability of all finite mixtures of Gamma (or of normal) distributions, with separate results for finite mixtures of binomial distributions.
Abstract: In general, the class of mixtures of the family of normal distributions or of Gamma (Type III) distributions or binomial distributions is not identifiable (see [3], [4] or Section 2 below for the meaning of this statement). In [4] it was shown that the class of all mixtures of a one-parameter additively-closed family of distributions is identifiable. Here, attention will be confined to finite mixtures and a theorem will be proved yielding the identifiability of all finite mixtures of Gamma (or of normal) distributions. Thus, estimation of the mixing distribution on the basis of observations from the mixture is feasible in these cases. Some separate results on identifiability of finite mixtures of binomial distributions also appear.

502 citations
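For reference, the identifiability statement can be written out as follows (a standard formulation in our notation, not quoted from the paper): a class of finite mixtures of a family $\{F(\cdot\,; \theta)\}$ is identifiable when

\[ \sum_{i=1}^{m} c_i F(x; \theta_i) \equiv \sum_{j=1}^{m'} c'_j F(x; \theta'_j) \]

with positive weights summing to one and distinct parameters on each side implies $m = m'$ and, after reindexing, $c_i = c'_i$ and $\theta_i = \theta'_i$; estimation of the mixing distribution from observations on the mixture is then well posed.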


Journal ArticleDOI
TL;DR: In this article, it was shown that the posterior probability converges to point mass at the true parameter value for almost all sample sequences (for short, the posterior is consistent; see Definition 1) exactly for parameter values in the topological carrier of the prior.
Abstract: Doob (1949) obtained a very general result on the consistency of Bayes' estimates. Loosely, if any consistent estimates are available, then the Bayes' estimates are consistent for almost all values of the parameter under the prior measure. If the parameter is thought of as being selected by nature through a random mechanism whose probability law is known, Doob's result is completely satisfactory. On the other hand, in some circumstances it is necessary to identify the exceptional null set. For example, if the parameter is thought of as fixed but unknown, and the prior measure is chosen as a convenient way to calculate estimates, it is important to know for which null set the method fails. In particular, it is desirable to choose the prior so that the null set is in fact empty. The problem is very delicate; considerable work [8], [9], [12] has been done on it recently, in quite general contexts and under severe regularity assumptions. It might therefore be of interest to discuss the simplest possible case, that of independent, identically distributed, discrete observations, in some detail. This will be done in Sections 3 and 4 when the observations take a finite set of possible values. Under this assumption, Section 3 shows that the posterior probability converges to point mass at the true parameter value among almost all sample sequences (for short, the posterior is consistent; see Definition 1) exactly for parameter values in the topological carrier of the prior. In Section 4, the asymptotic normality of the posterior is shown to follow from a local smoothness assumption about the prior. In both sections, results are obtained for priors which admit the possibility of an infinite number of states. The results of these sections are not entirely new; see pp. 333 ff. of [7], pp. 224 ff. of [10], [11]. They have not appeared in the literature, to the best of our knowledge, in a form as precise as Theorems 1, 3, 4. Theorem 2 is essentially the relevant special case of Theorem 7.4 of Schwartz (1961). In Sections 5 and 6, the case of a countable number of possible values is treated. We believe the results to be new. Here the general problem appears, because priors which assign positive mass near the true parameter value may lead to ridiculous estimates. The results of Section 3 (let alone 4) are false. In fact, Theorem 5 of Section 5 gives the following construction. Suppose that under the true parameter value the observations take an infinite number of values with positive probability. Then given any spurious (sub-)stochastic probability distribution, it is possible to find a prior assigning positive mass to any neighborhood of the true parameter value, but leading to a posterior probability which converges for almost all sample sequences to point mass at the spurious distribution. Indeed, there is a prior assigning positive mass to every open set of parameters, for which the posterior is consistent only at a set of parameters of the first category. To some extent, this happens because at any stage information about a finite number of stages only is available, but on the basis of this evidence, conclusions must be drawn about all states. If the prior measure has a serious prejudice about the shape of the tails, disaster ensues. In Section 6, it is shown that a simple condition on the prior measure (which serves to limit this prejudice) ensures the consistency of the posterior. 
Prior probabilities leading to posterior distributions consistent at all and asymptotically normal at essentially all (see Remark 3, Section 3) parameter values are constructed. Section 5 is independent of Sections 3 and 4; Section 6 is not. Section 6 overlaps to some extent with unpublished work of Kiefer and Wolfowitz; it has been extended in certain directions by Fabius (1963). The results of this paper were announced in [5]; some related work for continuous state space is described in [3]. It is a pleasure to thank two very helpful referees: whatever expository merit Section 5 has is due to them and to L. J. Savage.

421 citations
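A standard formulation of the consistency notion the abstract names (our notation, not a quotation of the paper's Definition 1): the posterior is consistent at the true parameter $\theta_0$ when, for every neighborhood $U$ of $\theta_0$,

\[ \Pi(U \mid X_1, \cdots, X_n) \rightarrow 1 \quad \text{for almost all sample sequences under } \theta_0, \]

i.e., the posterior distribution converges to point mass at $\theta_0$.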


Journal ArticleDOI
TL;DR: In this paper, properties of distribution functions F are related to those of the corresponding hazard rate q(x) = f(x)/(1 - F(x)), whose interest stems from its probabilistic interpretation: if, for example, F is a life distribution, q(x)dx is the conditional probability of death in (x, x + dx) given survival to age x.
Abstract: Properties of distribution functions $F$ (or their densities $f$) are related to properties of the corresponding hazard rate $q$ defined by $q(x) = f(x)/(1 - F(x))$. Interest in the hazard rate is derived from its probabilistic interpretation: if, for example, $F$ is a life distribution, $q(x)dx$ is the conditional probability of death in $(x, x + dx)$ given survival to age $x$. Because of this interpretation $f$ is assumed to be the density of a positive random variable, although for many of the results this is not necessary. The hazard rate is important in a number of applications, and is known by a variety of names. It is used by actuaries under the name of force of mortality to compute mortality tables, and its reciprocal is known to statisticians as Mill's ratio. In the analysis of extreme value distributions it is called the intensity function, and in reliability theory it is usually referred to as the failure rate. A number of general results are obtained, but particular attention is paid to densities with monotone hazard rate.

421 citations
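A small numerical sketch (ours, not the paper's; the Weibull is our choice of a standard life distribution) of the hazard rate $q(x) = f(x)/(1 - F(x))$, which is monotone decreasing, constant, or increasing as the Weibull shape is below, equal to, or above 1:

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 5.0, 5)
for shape in (0.5, 1.0, 2.0):
    d = stats.weibull_min(shape)
    q = d.pdf(x) / d.sf(x)     # sf is the survival function 1 - F
    print(f"shape {shape}:  q(x) =", np.round(q, 3))
```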



Journal ArticleDOI
TL;DR: In this paper, the principle of maximum entropy, together with some generalizations, is interpreted as a heuristic principle for the generation of null hypotheses; the main application is to $m$-dimensional population contingency tables, with the marginal totals given down to dimension $m - r$ ("restraints of the $r$th order").
Abstract: The principle of maximum entropy, together with some generalizations, is interpreted as a heuristic principle for the generation of null hypotheses. The main application is to $m$-dimensional population contingency tables, with the marginal totals given down to dimension $m - r$ ("restraints of the $r$th order"). The principle then leads to the null hypothesis of no "$r$th-order interaction." Significance tests are given for testing the hypothesis of no $r$th-order or higher-order interaction within the wider hypothesis of no $s$th-order or higher-order interaction, some cases of which have been treated by Bartlett and by Roy and Kastenbaum. It is shown that, if a complete set of $r$th-order restraints is given, then the hypothesis of the vanishing of all $r$th-order and higher-order interactions leads to a unique set of cell probabilities, provided the restraints are consistent. This confirms and generalizes a recent conjecture due to Darroch. A kind of duality between maximum entropy and maximum likelihood is proved. Some relationships between maximum entropy, interactions, and Markov chains are proved.

407 citations
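One concrete way to compute the unique cell probabilities the abstract refers to is iterative proportional fitting, which, started from the uniform table, converges to the maximum-entropy table with the prescribed marginals; the sketch below is ours (hypothetical $2 \times 2 \times 2$ table), not the paper's algorithm:

```python
import numpy as np

target = np.array([[[.10, .05], [.08, .12]],
                   [[.20, .05], [.15, .25]]])  # hypothetical 2x2x2 probabilities
p = np.full((2, 2, 2), 1 / 8)                  # start from the uniform table

for _ in range(200):
    for axis in range(3):                      # match each two-dimensional marginal
        ratio = target.sum(axis=axis) / p.sum(axis=axis)
        p *= np.expand_dims(ratio, axis)

print(np.round(p, 4))   # maximum-entropy fit to the given 2-D marginals
```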



Journal ArticleDOI
TL;DR: In this article, conditions on the set of admissible error distributions and on the regression constants are obtained under which the least squares estimators of the parameters of a linear regression are consistent in Case (a) (Theorem 1) or asymptotically normal in Case (b) (Theorem 2) for every regression of the respective families.
Abstract: This paper deals with linear regressions \begin{equation*}\tag{1.1}y_k = x_{k1}\beta_1 + \cdots + x_{kq}\beta_q + \epsilon_k, \quad k = 1, 2, \cdots\end{equation*} with given constants $x_{km}$ and with error random variables $\epsilon_k$ that are (a) uncorrelated or (b) independent. Let $E\epsilon_k = 0, 0 < E\epsilon^2_k < \infty$ for all $k$. The individual error distribution functions (d.f.'s) are not assumed to be known, nor need they be identical for all $k$. They are assumed, however, to be elements of a certain set $F$ of d.f.'s. Consider the family of regressions associated with the family of all the error sequences possible under these restrictions. Then conditions on the set $F$ and on the $x_{km}$ are obtained such that the least squares estimators (LSE) of the parameters $\beta_1, \cdots, \beta_q$ are consistent in Case (a) (Theorem 1) or asymptotically normal in Case (b) (Theorem 2) for every regression of the respective families. The motivation for these theorems lies in the fact that under the given assumptions statements based only on the available knowledge must always concern the regression family as a whole. It will be noticed moreover that the conditions of the theorems do not require any knowledge about the particular error sequence occurring in (1.1). Most of the conditions are necessary as well as sufficient, with the consequence that they cannot be improved upon under the limited information assumed to be available about the model. Since the conditions are very mild, the results apply to a large number of actual estimation problems. We denote by $\mathfrak{F}(F)$ the set of all sequences $\{\epsilon_k\}$ that occur in the regressions of a family as characterized above. Thus, $\mathfrak{F}(F)$ comprises all sequences of uncorrelated (Case (a)) or independent (Case (b)) random variables whose d.f.'s belong to $F$ but are not necessarily the same from term to term of the sequence. For each $G \in F$ the relations $\int x\,dG = 0$ and $0 < \int x^2\,dG < \infty$ hold. In this paper, $\mathfrak{F}(F)$ may be looked upon as a parameter space. A parameter point then is a sequence of $\mathfrak{F}(F)$. Correspondingly, we say that a statement holds on $\mathfrak{F}(F)$ (briefly on $F$) if it holds for all $\{\epsilon_k\} \in \mathfrak{F}(F)$. The statements of Theorems 1 and 2 are of this kind. The proof of Theorem 1, as well as the proof of the sufficiency in Theorem 2, is elementary and straightforward. Theorem 2 is a special case of a central limit theorem (holding uniformly on $\mathfrak{F}(F)$) for families of random sequences [3]. Some similarity between the roles of the parameter spaces $\mathfrak{F}(F)$ in our theorems and of the parameter spaces that occur, e.g., in the Gauss-Markov and related theorems may be seen in the fact that these theorems remain true only as long as the conclusions in the theorems hold for every parameter point in the respective spaces. As is well known, the statements in the Gauss-Markov and related theorems hold for every parameter vector $\beta_1, \cdots, \beta_q$ in a $q$-dimensional vector space (see e.g. Scheffe 1959, pp. 13-14). A result in the theory of linear regressions that bears some resemblance to the theorems of this paper has been obtained by Grenander and Rosenblatt (1957, p. 244). Let the error sequence $\{\epsilon_k\}$ in (1.1) be a weakly stationary random sequence with piecewise continuous spectral density, and let the regression vectors admit a joint spectral representation.
Under these assumptions Grenander and Rosenblatt give necessary and sufficient conditions for the regression spectrum and for the family of admissible spectral densities in order that the LSE are asymptotically efficient for every density of the family. In Sections 3 and 6 we discuss some examples relevant to Theorems 1 and 2.
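A numerical illustration (ours; regressors, seed, and the two members of $F$ are hypothetical) of the consistency asserted in Theorem 1, with an error sequence whose d.f.'s vary from term to term within a set $F$ of zero-mean distributions:

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([1.0, -2.0])

for n in (100, 10_000):
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
    # zero-mean errors alternating between two members of F
    eps = np.where(np.arange(n) % 2 == 0,
                   rng.normal(0.0, 1.0, n),
                   rng.uniform(-2.0, 2.0, n))
    y = X @ beta + eps
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(n, np.round(b, 3))   # the LSE approaches (1, -2) as n grows
```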

Journal ArticleDOI
TL;DR: In this paper, the properties of a class of density estimators are discussed on the basis of their mean integrated square error (M.I.S.E.), the corresponding development for discrete distributions is sketched, and examples are given in both continuous and discrete cases.
Abstract: Estimators of the form $\hat f_n(x) = (1/n) \sum^n_{i=1} \delta_n(x - x_i)$ of a probability density $f(x)$ are considered, where $x_1, \cdots, x_n$ is a sample of $n$ observations from $f(x)$. In Part I, the properties of such estimators are discussed on the basis of their mean integrated square errors $E[\int(\hat f_n(x) - f(x))^2\,dx]$ (M.I.S.E.). The corresponding development for discrete distributions is sketched and examples are given in both continuous and discrete cases. In Part II the properties of the estimator $\hat f_n(x)$ will be discussed with reference to various pointwise consistency criteria. Many of the definitions and results in both Parts I and II are analogous to those of Parzen [1] for the spectral density. Part II will appear elsewhere.
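A minimal sketch (ours) of an estimator of the stated form, taking $\delta_n$ to be a Gaussian weight of bandwidth $h$; the specific kernel, bandwidth, and data are our choices, not the paper's:

```python
import numpy as np

def f_hat(x, sample, h):
    """(1/n) * sum_i delta_n(x - x_i) with a Gaussian delta_n of width h."""
    u = (x[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)
sample = rng.normal(0.0, 1.0, 1000)
x = np.linspace(-3, 3, 7)
print(np.round(f_hat(x, sample, h=0.3), 3))  # compare with the N(0,1) density
```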

Journal ArticleDOI
TL;DR: In this paper, an asymptotic expansion of the distribution with respect to three numbers $N_1, N_2$ and $n$ representing degrees of freedom is presented.
Abstract: The distribution of the linear discriminant function $W$, Anderson's classification statistic (1951), is investigated by several authors: Bowker (1960), Bowker and Sitgreaves (1961), Sitgreaves (1952, 1961), etc. Since the exact distribution is too complicated to be used numerically, as indicated by Sitgreaves (1961), we present here an asymptotic expansion of the distribution with respect to three numbers $N_1, N_2$ and $n$ representing degrees of freedom. This is a generalization of the result of Bowker and Sitgreaves who deal with a special case where $N_1 = N_2 = N$ and $n = 2N - 2$.
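For orientation, a sketch (ours, with hypothetical data; our rendering of the statistic's usual form, $W = [x - \tfrac{1}{2}(\bar x_1 + \bar x_2)]' S^{-1}(\bar x_1 - \bar x_2)$, with $S$ the pooled covariance on $n = N_1 + N_2 - 2$ degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(4)
X1 = rng.multivariate_normal([0, 0], np.eye(2), 20)   # sample of size N1
X2 = rng.multivariate_normal([2, 1], np.eye(2), 25)   # sample of size N2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S = ((len(X1) - 1) * np.cov(X1, rowvar=False) +
     (len(X2) - 1) * np.cov(X2, rowvar=False)) / (len(X1) + len(X2) - 2)

x = np.array([1.0, 0.5])                              # observation to classify
W = (x - (m1 + m2) / 2) @ np.linalg.solve(S, m1 - m2)
print("W =", round(float(W), 3), "-> population", 1 if W > 0 else 2)
```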

Journal ArticleDOI
TL;DR: In this article, the problem of finding the geometric direction whose maximum angle with a given set of directions is least is studied, and a solution to this problem is characterized and proven unique (Sections 8, 17-20).
Abstract: Suppose a single contrast $y = \sum c_j y_j$, where $\sum c_j = 0$, is to be tested as a basis for detecting differences among unknown parameters $\mu_j$, where $y_j = \mu_j + \epsilon_j$, and the $\epsilon_j$ are independent and normally distributed with mean zero and variance $\sigma^2$. Write $\mu_j = \alpha + \beta x_j$. Then the problem is to detect $\beta \neq 0$. If $\sum x_j = 0$, and $\sum x^2_j = 1$, the noncentrality of $y$, referred to its standard deviation, is $(\beta/\sigma)$ times the formal correlation coefficient $r$ between the $c_j$ and the $x_j$. If the $x_j$ are known, the $c_j$ can be chosen to make the correlation unity. If the $x_j$ are wholly unknown, no single contrast can guarantee power in detecting $\beta \neq 0$. Intermediate situations, where we know something but not everything about the $x_j$, occur frequently. If our knowledge can be placed in the form of linear inequalities restricting the $\mu_j$ (equivalently the $x_j$), the problem of choosing a contrast $\{c_j\}$ which will give relatively good power against the unknown (latent) configuration $\{x_j\}$ is a relatively manageable one. The problem is to obtain a large value of $r^2$ between $\{c_j\}$, which is at our choice, and $\{x_j\}$, which is only partially known. A conservative approach is to try to select the $\{c_j\}$ so that the minimum value of $r^2$ compatible with the restrictions on $\{x_j\}$ is maximized, or nearly so. The maximization of minimum $r^2$ when response patterns are constrained by linear homogeneous inequalities leads to the mathematical problem of finding the geometric direction whose maximum angle with a given set of directions is least. The solution to this problem is characterized and proven unique (Sections 8, 17-20). No useful algorithm which is absolutely certain to reach the solution in a few steps appears to exist. However, procedures are discussed (Sections 10 and 11) which reach a solution relatively rapidly in the instances we have considered. The procedures are illustrated on selected examples (Sections 15-16). The general theory is applied (Sections 13-14) to the latent configuration defined by $x_1 \leqq x_2 \leqq x_3 \leqq \cdots \leqq x_n$, which we call simple rank order. A formula is found for the maximum contrast which maximizes minimum $r^2$, and its coefficients are given for $n \leqq 20$. The "linear-2-4" contrast, constructed from the usual linear contrast by quadrupling $c_1$ and $c_n$, and doubling $c_2$ and $c_{n-1}$, is a reasonable approximation to the maximum contrast for small or medium $n$, and its minimum $r^2$ remains above 90% of the maximum possible for $n \leqq 50$ (Table 2). Knowing only simple rank order for the $\mu_j$, good practice seems to indicate the use of "maximum" or "linear-2-4" contrasts in careful work. If more information or insight about the $x_j$ is available, some other contrast may be preferable.
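Following the abstract's description (our construction; the unit-norm scaling is our choice), the "linear-2-4" contrast is built from the usual linear contrast by quadrupling the end coefficients and doubling the next-to-end ones:

```python
import numpy as np

def linear_2_4(n):
    c = np.arange(1, n + 1, dtype=float) - (n + 1) / 2  # usual linear contrast
    c[0] *= 4; c[-1] *= 4                               # quadruple c_1 and c_n
    c[1] *= 2; c[-2] *= 2                               # double c_2 and c_{n-1}
    return c / np.sqrt((c ** 2).sum())                  # sum c_j = 0 holds by symmetry

print(np.round(linear_2_4(6), 3))
```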



Book ChapterDOI
TL;DR: In this paper, exact expressions and large-sample approximations are given for the nonparametric confidence intervals for a shift parameter Δ obtained from the two-sample Wilcoxon test, whose asymptotic efficiency relative to the standard intervals equals that of the Wilcoxon test relative to Student's t-test.
Abstract: Exact expressions and large-sample approximations are given for the nonparametric confidence intervals for a shift parameter Δ, which are obtained from the two-sample Wilcoxon test. These intervals are shown to have the same asymptotic efficiency relative to the standard confidence intervals for Δ as the Wilcoxon test has relative to Student's t-test. As a consequence of this result, a constant multiple of the length of the nonparametric intervals is shown to be a consistent estimator of the quantity $1/\int f^2(x)\,dx$.
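A sketch (ours, with hypothetical data and level) of the large-sample form of these intervals: order all $mn$ pairwise differences and trim $k$ from each end, with $k$ taken from the normal approximation to the null Mann-Whitney distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 15)
y = rng.normal(1.0, 1.0, 12)                     # true shift is 1

m, n = len(x), len(y)
d = np.sort((y[:, None] - x[None, :]).ravel())   # all m*n pairwise differences

z = stats.norm.ppf(0.975)                        # two-sided 95% level
k = int(m * n / 2 - z * np.sqrt(m * n * (m + n + 1) / 12))
print("approximate 95% interval for the shift:", d[k], d[m * n - 1 - k])
```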

Journal ArticleDOI
TL;DR: In this article, Bartholomew's work on the likelihood ratio test for ordered alternatives is extended, and for the case of equal sample sizes explicit expressions for the partition probabilities involved are obtained by indicating their relationship to Sparre Andersen's results.
Abstract: In a one-way analysis of variance situation in which the populations are ordered under the alternative hypothesis, one desires a test that, unlike the usual normal theory $F$ test, concentrates its power on the ordered alternatives rather than spreading it over all alternatives. In this paper, two contributions are made. First, under the usual normal assumptions, work by Bartholomew [4], [7] on the likelihood ratio test, when the ordering is complete under the alternative hypothesis, is extended. By suitable characterization of the partition which the likelihood ratio induces on the sample space, the likelihood ratio test is shown to depend on incomplete Beta functions and certain probabilities of the above partitions of the sample space. The major contribution in this paper is for the case of equal sample sizes, where explicit expressions for these probabilities are obtained by indicating their relationship to Sparre Andersen's [1], [2] results. Second, under the analogous nonparametric assumptions and for equal sample sizes, a parallel test based on ranks is proposed and discussed for stochastic ordering of the populations. The asymptotic Pitman efficiency of the nonparametric test relative to the test in the normal case is derived.


Journal ArticleDOI
TL;DR: In this article, a family of procedures whose integrated risk is asymptotically the same as the Bayes risk is presented, under mild assumptions, for hypothesis testing with an arbitrary form and there can be more than two decisions.
Abstract: In recent years the study of sequential procedures which are asymptotically optimum in an appropriate sense as the cost $c$ per observation goes to zero has received considerable attention. On the one hand, Schwarz (1962) has recently given an interesting theory of the asymptotic shape, as $c \rightarrow 0$, of the Bayes stopping region relative to an a priori distribution $F$, for testing sequentially between two composite hypotheses $\theta \leqq \theta_1$ and $\theta \geqq \theta_2$ concerning the real parameter $\theta$ of a distribution of exponential (Koopman-Darmois) type, with indifference region the open interval $(\theta_1, \theta_2)$. (An example of Schwarz's considerations is described in connection with Figure 4.) One aim of the present paper is to generalize Schwarz's results to the case where (with or without indifference regions) the distributions have arbitrary form and there can be more than two decisions (Sections 2, 3, 4). In this general setting we obtain, under mild assumptions, a family $\{\delta_c\}$ of procedures whose integrated risk is asymptotically the same as the Bayes risk. (In fact, extending Schwarz's result, a family $\{\delta'_c\}$ can be constructed so as to possess this asymptotic Bayes property relative to all a priori distributions with the same support as $F$, or even with smaller indifference region support than $F$.) Procedures like our $\{\delta_c\}$ have already been suggested by Wald (1947) for use in tests of composite hypotheses (e.g., the sequential $t$-test), but his concern was differently inspired. At the same time, we show how such multiple decision problems can be treated by using simultaneously a number of sequential tests for associated two-decision problems. A second aim is to extend, strengthen, and somewhat simplify the asymptotic sequential design considerations originated by Chernoff (1959) and further developed by Albert (1961) and Bessler (1960) (Section 5). Our point of departure here is a device utilized by Wald (1951) in a simpler estimation setting, and which in the present setting amounts to taking a preliminary sample with predesignated choice of designs and such that, as $c \rightarrow 0$, the size of this preliminary sample tends to infinity, while its ratio to the total expected sample size tends to zero. The preliminary sample can then be used to "guess the true state of nature" and thus to choose the future design pattern once and for all rather than to have to reexamine the choice of design after subsequent observations. (In Wald's setting the only "design" problem was to pick the size of the second sample of his two-sample procedure.) The properties of the resulting procedure can then be inferred from the considerations of Sections 2, 3, and 4, where there is no design problem but where most of the work in this paper is done; using Wald's idea, we thereby obtain procedures for the design problem fairly easily, once we have the (non-design) sequential inference structure to build upon. The family $\{\delta^\ast_c\}$ so obtained has the same asymptotic Bayes property as that described above for the family $\{\delta_c\}$ of the non-design problem. Furthermore, a family $\{\delta^{\ast\ast}_c\}$ can be constructed so that, like $\{\delta'_c\}$ in the non-design problem, it is asymptotically Bayes for all a priori distributions with the same support. The value of the asymptotic Bayes risk of such a family is closely related to the lower bound which was obtained by Chernoff et al. for the risk function of certain procedures, and which gives another form for the optimality statement. The role of the sequential procedures considered by Donnelly (1957) and Anderson (1960) for hypothesis testing with an indifference region is indicated at the end of Section 1. Asymptotic solutions to the problem of Kiefer and Weiss (1957) are given. An Appendix contains proofs of certain results on fluctuations of partial sums of independent random variables, which are used in the body of the paper.

Book ChapterDOI
TL;DR: In this article, the asymptotic efficiency of the proposed contrast estimates relative to the standard least squares estimates, as the number of observations in each cell gets large, is shown to be the same as the Pitman efficiency of the Wilcoxon test relative to the t-test.
Abstract: In linear models with several observations per cell, estimates of all contrasts are given whose small and large sample behaviour is analogous to that of the estimate of a shift parameter proposed in [2]. In particular, the asymptotic efficiency of these estimates relative to the standard least squares estimates, as the number of observations in each cell gets large, is shown to be the same as the Pitman efficiency of the Wilcoxon test relative to the t-test.

Journal ArticleDOI
TL;DR: In this article, the equivalence of several rank tests for comparing dispersion is established, and it is shown that any of these tests is consistent against differences in dispersion if the two distributions have a common median and differ in a scale parameter, and under some less restrictive circumstances.
Abstract: In a recent paper [1] Ansari and Bradley have shown the equivalence of two rank tests for comparing dispersion, one test due to Barton and David [2], the other to Ansari and Freund, and have provided tables of the exact distribution. They observe that Siegel and Tukey have proposed [11] a similar test which permits use of existing tables. They also exhibit the mean of the limiting normal distribution under the alternative hypothesis. Later Klotz [7] established the equivalence of all these tests. In the present paper it is shown that (1) Any of these tests is consistent against differences in dispersion if the two distributions have a common median and differ in a scale parameter, and under some less restrictive circumstances. But without such restrictions bizarre asymptotic behavior can arise--including good sensitivity against translation for some non-symmetric densities. One (not very natural) example is offered in which the test constructed for rejection if one of two scale parameters is the larger, actually turns out to be consistent against that parameter's being the smaller of the two. (2) No rank test (i.e., a test invariant under strictly increasing transformation of the scale) can hope to be a satisfactory test against dispersion alternatives without some sort of strong restrictions (e.g., equal or known medians) being placed on the class of admissible distribution pairs. (3) Box [3] has proposed testing equality of variances by applying the $t$ test to the logarithms of variances computed within small subgroups. He indicates how such tests should be robust (though not of exact size). Distribution free tests of exact size can be constructed by applying a rank test in place of the $t$ test. Wilcoxon's test applied to variances-within-triads has asymptotic efficiency .5 against normal alternatives. If the two samples each have 9 observations then the exact power is readily calculated and "efficiency" is again about .5.
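A usage sketch (ours; SciPy's implementation of the Ansari-Bradley test discussed above, with hypothetical data satisfying the common-median restriction the consistency result requires):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(0.0, 1.0, 40)   # median 0, scale 1
y = rng.normal(0.0, 3.0, 40)   # median 0, larger scale

print(stats.ansari(x, y))      # small p-value indicates a dispersion difference
```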

Journal ArticleDOI
TL;DR: In this article, the authors give a brief summary of the current state of research for the topic in question and to indicate certain recent contributions to this topic due to the author, and a proof of the sufficiency of a simple condition for stability almost surely of the maximal order statistic.
Abstract: the theory of limit behaviour of extreme order statistics. Most of the paper is devoted to the discussion of limit distributions and of stability properties for order statistics of independent random variables. While the situation here is already well explored, very little is known in the case of dependent random variables. Those results that are known for the dependent case have been obtained in the last few years. In Section 1 we introduce notations and definitions and state a few elementary facts concerning the distributions of the set of order statistics corresponding to a set of independent, identically distributed random variables. Throughout Sections 2, 3 and 4 we assume independence of the basic variables. In Section 2 we deal with limit distributions, while in Sections 3 and 4 we deal with stability in probability and stability almost surely. Finally, in Section 5 we turn to the dependence case, summarizing some recent results due mainly to Berman [3], [4] and [5]. Our principal result is contained in Section 4; it is a proof of the sufficiency of a simple condition for stability almost surely of the maximal order statistic. That condition was introduced and studied by Geifroy [7]. The aim in writing this paper has been twofold: to give a brief summary of the current state of research for the topic in question and to indicate certain recent contributions to this topic due to the author. A number of contributions are not mentioned; these are noted in the references. I want to thank Professor Glen Baxter for the stimulating interest he has shown in this work. 1. Preliminaries. Let X1, X2, * , X., * , be a sequence of random vari





Journal ArticleDOI
TL;DR: In this paper, the lower bound on the number of integrals to be evaluated in order to know the first, second and mixed (linear) moments of the normal order statistics was obtained.
Abstract: The main purpose of the paper is to obtain the lower bound on the number of integrals to be evaluated in order to know the first, second and mixed (linear) moments of the normal order statistics (O.S.) in a sample of size $N$, assuming that these moments are available for sample sizes less than $N$. Towards this, the recurrence relationships, identities, etc. among the moments of the normal order statistics, which have appeared in the literature, have been collected with appropriate references. Also, these formulae are listed and stated in the most general form wherever possible. Simple and alternate proofs of some of these formulae are given. These results are also supplemented with new formulae or relationships. It is shown that it is sufficient to evaluate at most one single integral and $(N-2)/2$ double integrals when $N$ is even, and one single integral and $(N-3)/2$ double integrals when $N$ is odd, in order to know the first, second and mixed (linear) moments of normal O.S. However, for these moments of O.S. in samples drawn from an arbitrary population symmetric about zero, one has to evaluate one more double integral in addition to the number of integrals required in the case of normal O.S. Also, a possible scheme of computing these moments, which will be useful especially for small sample sizes, is presented in Section 5. The lower moments of quasi-ranges in samples drawn from an arbitrary population symmetric about zero are expressed in terms of the moments of the corresponding O.S. Simple recurrence formulae among the expected values of quasi-ranges in samples drawn from an arbitrary continuous population are obtained. A modest list of references is provided at the end which is by no means exhaustive.
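One of the best-known identities of the kind the paper collects, stated here in our notation (a standard recurrence, not quoted from the paper): writing $\mu^{(k)}_{r:N} = E(X^k_{r:N})$ for the $k$th moment of the $r$th order statistic in a sample of size $N$ from any parent distribution,

\[ r\,\mu^{(k)}_{r+1:N} + (N - r)\,\mu^{(k)}_{r:N} = N\,\mu^{(k)}_{r:N-1}, \qquad 1 \leqq r \leqq N - 1, \]

so moments for sample size $N$ can be filled in from those for size $N - 1$ once the few integrals the paper counts have been evaluated.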


Journal ArticleDOI
TL;DR: In this paper, the authors present an analogue to Thompson's distribution in case the underlying distribution of a sample is exponential (the exponential model is nowadays widely used in Failure and Queuing Theories), which makes it possible to obtain minimum variance unbiased estimates of functions of the parameters of the exponential distribution.
Abstract: In case the underlying distribution of a sample is normal, a substantial literature has been devoted to the distribution of quantities such as $(X_{(i)} - u)/v$ and $(X_{(i)} - u)/w$, where $X_{(i)}$ denotes the $i$th ordered observation, $u$ and $v$ are location and scale statistics of the sample, or one is a location or scale parameter and $w$ is an independent scale statistic. The case $i = 1$ or $n$ has been frequently studied in view of the great importance of extreme values in physical phenomena and also with a view to testing outlying observations or the normality of the distribution. Bibliographical references will be found in Savage [10] and, as far as the general problem of testing outliers is concerned, in Ferguson [4]; references to recent literature include Dixon [1], [2], Grubbs [5], Pillai and Tienzo [9]. Thompson [12] has studied the distribution of $(X_i - \bar{X})/s$ where $X_i$ is one observation picked at random among the sample, and this statistic has been used in the study of outliers; Laurent has generalized Thompson's distribution to the case of a subsample picked at random among a sample [7], then to the multivariate case and the general linear hypothesis [8]. Thompson's distribution is not only the marginal distribution of $(X_i - \bar{X})/s$ but its conditional distribution, given the sufficient statistic $(\bar{X}, s)$; hence it provides the distribution of $X_i$ given $\bar{X}, s$, and, using the Rao-Blackwell-Lehmann-Scheffe theorem, gives a way of obtaining a minimum variance unbiased estimate of any estimable function of the parameters of a normal distribution for which an unbiased estimate depending on one observation is available, a fact that has been exploited in sampling inspection by variables. The present paper presents an analogue to Thompson's distribution in case the underlying distribution of a sample is exponential (the exponential model is nowadays widely used in Failure and Queuing Theories). Such a distribution makes it possible to obtain minimum variance unbiased estimates of functions of the parameters of the exponential distribution. Here an estimate is provided for the survival function $P(X > x) = S(x)$ and its powers. As an application of these results the probability distribution of the "reduced" $i$th ordered observation in a sample and that of the reduced range are derived. For possible applications to testing outliers or exponentiality the reader is invited to refer to the bibliography.
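As an illustration of the kind of estimate described (the classical statement for the one-parameter exponential case, in our notation, not quoted from the paper): if $X_1, \cdots, X_n$ are a sample from the exponential density $\theta^{-1}e^{-x/\theta}$ and $T = \sum X_i$ is the complete sufficient statistic, Rao-Blackwellizing the unbiased one-observation estimate $\mathbf{1}\{X_1 > x\}$ gives the minimum variance unbiased estimate of the survival function $S(x) = e^{-x/\theta}$:

\[ \hat S(x) = P(X_1 > x \mid T) = (1 - x/T)^{n-1} \quad \text{for } 0 \leqq x \leqq T, \text{ and } 0 \text{ otherwise.} \]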