
Showing papers in "Annals of Mathematical Statistics in 1961"


Journal ArticleDOI
TL;DR: In this paper, the authors present a survey of the mathematical aspects of statistical inference as it applies to finite Markov chains, the problem being to draw inferences about the transition probabilities from one long, unbroken observation on the chain.
Abstract: This paper is an expository survey of the mathematical aspects of statistical inference as it applies to finite Markov chains, the problem being to draw inferences about the transition probabilities from one long, unbroken observation $\{x_1, x_2, \cdots, x_n\}$ on the chain. The topics covered include Whittle's formula, chi-square and maximum-likelihood methods, estimation of parameters, and multiple Markov chains. At the end of the paper it is briefly indicated how these methods can be applied to a process with an arbitrary state space or a continuous time parameter. Section 2 contains a simple proof of Whittle's formula; Section 3 provides an elementary and self-contained development of the limit theory required for the application of chi-square methods to finite chains. In the remainder of the paper, the results are accompanied by references to the literature, rather than by complete proofs. As is usual in a review paper, the emphasis reflects the author's interests. Other general accounts of statistical inference on Markov processes will be found in Grenander [53], Bartlett [9] and [10], Fortet [35], and in my monograph [18]. I would like to thank Paul Meier for a number of very helpful discussions on the topics treated in this paper, particularly those of Section 3.

524 citations
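
As a concrete illustration of the estimation problem surveyed in the finite Markov chain paper above, the following minimal sketch computes the usual maximum-likelihood estimate of the transition probabilities from one unbroken path, namely the observed transition counts divided by their row totals. The function name `estimate_transition_matrix` and the sample path are hypothetical and chosen only for illustration; the paper itself may treat the initial state and boundary effects differently.

```python
from collections import Counter, defaultdict


def estimate_transition_matrix(path, states):
    """Estimate p_ij as n_ij / n_i, where n_ij counts observed i -> j steps.

    Assumption: the initial state is treated as fixed, so only the
    transitions along the path enter the likelihood.
    """
    counts = Counter(zip(path[:-1], path[1:]))  # n_ij for each observed pair
    row_totals = defaultdict(int)               # n_i = sum over j of n_ij
    for (i, _j), c in counts.items():
        row_totals[i] += c
    return {
        i: {
            j: (counts[(i, j)] / row_totals[i] if row_totals[i] else 0.0)
            for j in states
        }
        for i in states
    }


# Hypothetical two-state observation x_1, ..., x_10.
path = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
p_hat = estimate_transition_matrix(path, states=[0, 1])
print(p_hat)
```

Rows for states that are never left are returned as all zeros here; that is a convenience of the sketch, not a statistical statement.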


Journal ArticleDOI
TL;DR: In this paper, the characteristic functions of the limiting d.f.'s of a class of test criteria for independence, based on the sample distribution function (d.f.), are obtained, and the corresponding d.f. is tabled in the bivariate case, where the test is equivalent to one originally proposed by Hoeffding [4].
Abstract: Certain tests of independence based on the sample distribution function (d.f.) possess power properties superior to those of other tests of independence previously discussed in the literature. The characteristic functions of the limiting d.f.'s of a class of such test criteria are obtained, and the corresponding d.f. is tabled in the bivariate case, where the test is equivalent to one originally proposed by Hoeffding [4]. A discussion is included of the computational problems which arise in the inversion of characteristic functions of this type. Techniques for computing the statistics and for approximating the tail probabilities are considered.

455 citations


Journal ArticleDOI
TL;DR: In this paper, a classification of the states of a Markov Renewal process is described and studied, and the concept of regularity is introduced and characterized, and some preliminary results on Markov renewal processes and Semi-Markov processes are given.
Abstract: This paper contains the definition of and some preliminary results on Markov Renewal processes and Semi-Markov processes. The close relationship between these two types of processes is described. The concept of regularity is introduced and characterized. A classification of the states of a Markov Renewal process is described and studied.

450 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied Markov renewal processes with a finite number of states and derived explicit expressions for the distribution functions of first passage times, as well as for the marginal distribution function of the corresponding Semi-Markov process.
Abstract: In this paper, Markov Renewal processes having a finite number of states are studied. Explicit expressions are derived for the distribution functions of first passage times, as well as for the marginal distribution function of the corresponding Semi-Markov process. Double generating functions are obtained for the distribution functions of the $N_j$-processes. The limiting behavior of a Markov Renewal process is discussed, the stationary probabilities being derived completely. General Markov Renewal processes are introduced, and a related stationary process is determined. Several examples are given.

397 citations


Journal ArticleDOI
TL;DR: In this article, the Opinion Pool, a weighted average of the probability density functions put forward by the members of a group, is considered as an alternative to the Group Minimax Rule of Savage, and sufficient conditions are given under which the pooled decision does at least as well as the worst of the individual decisions.
Abstract: When a group of $k$ individuals is required to make a joint decision, it occasionally happens that there is agreement on a utility function for the problem but that opinions differ on the probabilities of the relevant states of nature. When the latter are indexed by a parameter $\theta$, to which probability density functions on some measure $\mu(\theta)$ may be attributed, suppose the $k$ opinions are given by probability density functions $p_{s1}(\theta), \cdots, p_{sk}(\theta)$. Suppose that $D$ is the set of available decisions $d$ and that the utility of $d$, when the state of nature is $\theta$, is $u(d, \theta)$. For a probability density function $p(\theta)$, write $u\lbrack d\mid p(\theta)\rbrack = \int u(d, \theta)p(\theta) d\mu(\theta)$. The Group Minimax Rule of Savage [1] would have the group select that $d$ minimising $\max_{i = 1, \cdots, k}\{\max_{d'\epsilon D} u\lbrack d' \mid p_{si}(\theta)\rbrack - u\lbrack d \mid p_{si}(\theta)\rbrack\}$. As Savage remarks ([1], p. 175), this rule is undemocratic in that it depends only on the different distributions for $\theta$ represented in those put forward by the group and not on the number of members of the group supporting each different representative. An alternative rule for choosing $d$ may be stated as follows: "Choose weights $\lambda_1, \cdots, \lambda_k (\lambda_i \geqq 0, i = 1, \cdots, k$ and $\sum^k_1 \lambda_i = 1)$; construct the pooled density function $p_{s\lambda}(\theta) = \sum^k_1 \lambda_ip_{si}(\theta);$ choose the $d$, say $d_{s\lambda}$, maximising $u\lbrack d \mid p_{s\lambda}(\theta)\rbrack$." This rule, which may be called the Opinion Pool, can be made democratic by setting $\lambda_1 = \cdots = \lambda_k = 1/k$. Where it is reasonable to suppose that there is an actual, operative probability distribution, represented by an `unknown' density function $p_a(\theta)$, it is clear that the group is then acting as if $p_a(\theta)$ were known to be $p_{s\lambda}(\theta)$. 
If $p_a(\theta)$ were known, it would be possible to calculate $u\lbrack d_{s\lambda} \mid p_a(\theta)\rbrack$ and $u\lbrack d_{si} \mid p_a(\theta)\rbrack$, where $d_{si}$ is the $d$ maximising $u\lbrack d \mid p_{si}(\theta)\rbrack, i = 1, \cdots, k$ and then to use these quantities to assess the effect of adopting the Opinion Pool for any given choice of $\lambda_1, \cdots, \lambda_k$. It is of general theoretical interest to examine the conditions under which \begin{equation*}\tag{1.1}u\lbrack d_{s\lambda} \mid p_a(\theta)\rbrack \geqq \min_{i = 1, \cdots, k} u\lbrack d_{si} \mid p_a(\theta)\rbrack.\end{equation*} Theorems 2.1 and 3.1 provide different sets of sufficient conditions for (1.1) to hold. Theorem 2.1 requires $k = 2$ and places a restriction on $p_a(\theta)$ (or, equivalently, on $p_{s1}(\theta)$ and $p_{s2}(\theta)$); Theorem 3.1 puts conditions on $D$ and $u(d, \theta)$ instead.

392 citations
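
The Opinion Pool rule described in the abstract above can be sketched for a finite parameter space, where densities become probability vectors and the integral becomes a sum. Everything in the example (the utility table, the two opinions, the weights, and the "actual" distribution `p_a`) is hypothetical data chosen for illustration; the helper names are not from the paper.

```python
def expected_utility(u_row, p):
    """u[d | p] = sum over theta of u(d, theta) * p(theta)."""
    return sum(u * q for u, q in zip(u_row, p))


def best_decision(utility, p):
    """Index of the decision d maximising u[d | p]."""
    return max(range(len(utility)), key=lambda d: expected_utility(utility[d], p))


def opinion_pool(opinions, weights):
    """Pooled distribution: sum over i of lambda_i * p_si(theta)."""
    n_states = len(opinions[0])
    return [sum(w * p[t] for w, p in zip(weights, opinions)) for t in range(n_states)]


# Hypothetical problem: two states of nature, three decisions.
utility = [
    [1.0, 0.0],  # decision 0 pays off in state 0
    [0.0, 1.0],  # decision 1 pays off in state 1
    [0.6, 0.6],  # decision 2 is a hedge
]
opinions = [[0.9, 0.1], [0.2, 0.8]]  # p_s1, p_s2
weights = [0.5, 0.5]                 # the "democratic" choice, 1/k each

pooled = opinion_pool(opinions, weights)
d_pool = best_decision(utility, pooled)
d_individual = [best_decision(utility, p) for p in opinions]

# Check inequality (1.1) for one hypothetical actual distribution p_a.
p_a = [0.5, 0.5]
lhs = expected_utility(utility[d_pool], p_a)
rhs = min(expected_utility(utility[d], p_a) for d in d_individual)
print(d_pool, d_individual, lhs >= rhs)
```

In this toy case the pooled opinion selects the hedging decision, while each individual opinion selects a decision tailored to its own favoured state.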


Journal ArticleDOI
TL;DR: In this article, the authors consider a multivariate extension of the point-biserial correlation model, in which the discrete variable has a multinomial distribution and the conditional distribution of the continuous variable is multivariate normal.
Abstract: A model which frequently arises from experimentation in psychology is one which contains both discrete and continuous variables. The concern in such a model may be with finding measures of association or with problems of inference on some of the parameters. In the simplest such model there is a discrete variable $x$ which takes the values 0 or 1, and a continuous variable $y$. Such a random variable $x$ is often used in psychology to denote the presence or absence of an attribute. Point-biserial correlation, which is the ordinary product-moment correlation between $x$ and $y$, has been used as a measure of association. This model, when $x$ has a binomial distribution, and the conditional distribution of $y$ for fixed $x$ is normal, was studied in some detail by Tate [13]. In the present paper, we consider a multivariate extension, in which $x = (x_0, x_1, \cdots, x_k)$ has a multinomial distribution, and the conditional distribution of $y = (y_1, \cdots, y_p)$ for fixed $x$ is multivariate normal.

366 citations


Journal ArticleDOI
TL;DR: In this paper, a new approach to prediction and regression problems using reproducing kernel Hilbert spaces is described, and the author shows the close relation between statistical communication and control theory, the probabilistic (and Hilbert space) theory of stochastic processes possessing finite second moments, and the statistical theory of regression analysis, correlation analysis, and spectral analysis of time series.
Abstract: It may fairly be said that modern time series analysis is a subject which embraces three fields which, while closely related, have tended to develop somewhat independently. These fields are (i) statistical communication and control theory, (ii) the probabilistic (and Hilbert space) theory of stochastic processes possessing finite second moments, and (iii) the statistical theory of regression analysis, correlation analysis, and spectral (or harmonic) analysis of time series. In this paper it is my aim to show the close relation between these fields and to summarize some recent developments. The topics discussed are (i) stationary time series and their statistical analysis, (ii) prediction theory and the Hilbert space spanned by a time series, and (iii) regression analysis of time series with known covariance function. In particular, I describe a new approach to prediction and regression problems using reproducing kernel Hilbert spaces.

355 citations


Journal ArticleDOI
TL;DR: In this paper, the class of mixtures of a one-parameter additively closed family of distributions is proved identifiable and a condition for a class of scale parameter mixtures to be identifiable is indicated and applications to Type III and uniform distributions are made.
Abstract: The class of mixtures of a one-parameter additively-closed family of distributions is proved identifiable. A condition for a class of scale parameter mixtures to be identifiable is indicated and applications to Type III and uniform distributions are made.

298 citations


Journal ArticleDOI
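TL;DR: In this paper, the author gives a theoretical discussion of why the proportion of data with first significant digit $n$ or less is approximately $\log_{10}(n + 1)$, showing that this is the only distribution for first significant digits which is invariant under scale change of the underlying distribution.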
Abstract: Introduction. It has been noticed by astute observers that well used tables of logarithms are invariably dirtier at the front than at the back. Upon reflection one is led to inquire whether there are more physical constants with low order first significant digits than high. Actual counts by Benford [2] show that not only is this the case but that it seems to be an empirical truth that whenever one has a large body of physical data, Farmer's Almanac, Census Reports, Chemical Rubber Handbook, etc., the proportion of these data with first significant digit $n$ or less is approximately $\log_{10}(n + 1)$. Any reader formerly unaware of this "peculiarity" will find an actual sampling experiment wondrously tantalizing. Thus, for example, approximately 0.7 of the physical constants in the Chemical Rubber Handbook begin with 4 or less ($\log_{10}(4 + 1) = 0.699$). This is to be contrasted with the widespread intuitive evaluation 4/9ths. At least two books call attention to this peculiarity, Furlan [6] and Wallis [18], but to my knowledge there are only five published papers on the subject, Benford [2], Furry et al [7], [9], Gini [8], and Herzel [11]. The first consists of excellent empirical verifications and a discussion of the implied distribution of 2nd, 3rd, ... significant digits. The second and third put forth the thesis that the distribution of significant digits should not depend markedly on the underlying distribution, and the authors present numerical evaluations for a range of underlying distributions in support of their contention. The fourth maintains that explanation is to be sought in empiric considerations. The fifth considers three different urn models; each yields a distribution of initial digits which the author compares with $\log_{10}(n + 1)$. This paper is a theoretical discussion of why and to what extent this so called "abnormal law" must hold. The flavor of the results is, I think, conveyed in the following remarks. 
(i) The only distribution for first significant digits which is invariant under scale change of the underlying distribution is $\log_{10}(n + 1)$. Contrary to suspicion this is a non-trivial mathematical result, for the variable $n$ is discrete. (ii) Suppose one has a horizontal circular disc of unit circumference which is pivoted at the center. Let the disc be given a random angular displacement $\theta$ where $-\infty < \theta < \infty$. If the final position of the disc mod one is called $\varphi$, i.e.,

277 citations
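
The first-digit law discussed in the abstract above is easy to check numerically. The sketch below evaluates the cumulative proportion $\log_{10}(n + 1)$ and compares it with the empirical share of leading digits at most 4 in an artificial data set, before and after a change of scale. The data set (a grid that is uniform on a logarithmic scale) and the scale factor 3.7 are assumptions made purely for illustration; they are not taken from the paper.

```python
import math


def benford_cumulative(n):
    """Proportion of data with first significant digit n or less."""
    return math.log10(n + 1)


def first_digit(x):
    """Leading digit of a positive float, read off its scientific notation.

    The format keeps six decimals, so a value within rounding distance of a
    digit boundary may be assigned to the neighbouring digit.
    """
    return int(f"{x:e}"[0])


def share_at_most(values, n):
    """Empirical share of values whose first significant digit is <= n."""
    return sum(first_digit(v) <= n for v in values) / len(values)


# Hypothetical data: points spread evenly on a log scale over one decade.
N = 10_000
base = [10 ** (k / N) for k in range(N)]
rescaled = [3.7 * v for v in base]  # arbitrary change of units

share_base = share_at_most(base, 4)
share_rescaled = share_at_most(rescaled, 4)
print(round(benford_cumulative(4), 3), share_base, share_rescaled)
```

Both empirical shares should sit close to $\log_{10} 5 \approx 0.699$, which is the scale-invariance property of remark (i) seen on one example rather than a proof of it.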




Journal ArticleDOI
TL;DR: In this article, it is shown that the necessary and sufficient condition for asymptotic normality of the linear permutation statistic $S_\nu$ is of Lindeberg type; cases encountered in rank-test theory are studied in more detail in Section 6.
Abstract: Let $(R_{\nu 1}, \cdots, R_{\nu N_\nu})$ be a random vector which takes on the $N_\nu!$ permutations of $(1, \cdots, N_\nu)$ with equal probabilities. Let $\{b_{\nu i}, 1 \leqq i \leqq N_\nu, \nu \geqq 1\}$ and $\{a_{\nu i}, 1 \leqq i \leqq N_\nu, \nu \geqq 1\}$ be double sequences of real numbers. Put \begin{equation*}\tag{1.1}S_\nu = \sum^{N_\nu}_{i = 1} b_{\nu i}a_{\nu R_{\nu i}}.\end{equation*} We shall prove that the sufficient and necessary condition for asymptotic $(N_\nu \rightarrow \infty)$ normality of $S_\nu$ is of Lindeberg type. This result generalizes previous results by Wald-Wolfowitz [1], Noether [3], Hoeffding [4], Dwass [6], [7] and Motoo [8]. In respect to Motoo [8] we show, in fact, that his condition, applied to our case, is not only sufficient but also necessary. Cases encountered in rank-test theory are studied in more detail in Section 6 by means of the theory of martingales. The method of this paper consists in proving asymptotic equivalency in the mean of (1.1) to a sum of infinitesimal independent components.


Journal ArticleDOI
TL;DR: In this article, the authors studied the transition matrix of an ergodic, finite Markov chain with no cyclically moving sub-classes, and obtained an asymptotic expression for the probability of tail values of the sum of the random variables in the case of integral random variables.
Abstract: Let $P = (p_{jk})$ be the transition matrix of an ergodic, finite Markov chain with no cyclically moving sub-classes. For each possible transition $(j, k)$, let $H_{jk}(x)$ be a distribution function admitting a moment generating function $f_{jk}(t)$ in an interval surrounding $t = 0$. The matrix $P(t) = \{p_{jk}f_{jk}(t)\}$ is of interest in the study of the random variable $S_n = X_1 + \cdots + X_n$, where $X_m$ has the distribution $H_{jk}(x)$ if the $m$th transition takes the chain from state $j$ to state $k$. The matrix $P(t)$ is non-negative and therefore possesses a maximal positive eigenvalue $\alpha_1(t)$, which is shown to be a convex function of $t$. As an application of the convexity property, we obtain an asymptotic expression for the probability of tail values of the sum $S_n$, in the case where the $X_m$ are integral random variables. The results are related to those of Blackwell and Hodges [1], whose methods are followed closely in Section 5, and Volkov [4], [5], who treats in detail the case of integer-valued functions of the state of the chain, i.e., the case $f_{jk}(t) = \exp(\beta_kt)$ ($\beta_k$ integral).



Journal ArticleDOI
TL;DR: In this article, the author derives the distribution laws followed by the probabilities of misclassification that arise when estimated discriminant functions are used, and obtains their expected values. The parent populations are assumed to be normal; the multivariate case is treated in three stages of increasing complexity, with asymptotic results or approximations given where the exact results are complicated.
Abstract: The probabilities of misclassification involved in the use of estimated discriminant functions are subject to chance variations. The author's purpose in this paper is to derive the distribution laws that the probabilities of misclassification follow and to obtain their expected values. The parent populations are assumed to be normal. The first part of the paper considers the univariate case and the second part the multivariate case. The discussion of the multivariate case proceeds in three stages of increasing complexity. When the exact results are complicated, asymptotic results or approximations are given. Finally, the problem of estimating the expected probabilities of misclassification is considered. Interval estimates as well as point estimates are given.

Journal ArticleDOI
TL;DR: In this paper, the problem of ruin in collective risk theory is re-examined, and explicit results are obtained in the cases of negative and positive processes, and the results are then extended to the case where the total claim X(t) is a general additive process.
Abstract: 0. Summary. The theory of collective risk deals with an insurance business, for which, during a time interval (0, t) (1) the total claim X(t) has a compound Poisson distribution, and (2) the gross risk premium received is λt. The risk reserve Z(t) = u + λt - X(t), with the initial value Z(0) = u, is a temporally homogeneous Markov process. Starting with the initial value u, let T be the first subsequent time at which the risk reserve becomes negative, i.e., the business is "ruined". The problem of ruin in collective risk theory is concerned with the distribution of the random variable T; this distribution has not so far been obtained explicitly except in a few particular cases. In this paper, the whole problem is re-examined, and explicit results are obtained in the cases of negative and positive processes. These results are then extended to the case where the total claim X(t) is a general additive process. 1. Introduction. The theory of collective risk, as developed by the Swedish actuary Filip Lundberg, deals with the business of an insurance company. Following a series of papers published by him during the years 1909-1934, a considerable amount of work has been done by Cramér, Segerdahl, Täcklind, Saxén, Arfwedson and many others; a survey of the theory from the point of view of stochastic processes was given by Cramér [2], [3] and an excellent review has recently been given by Arfwedson [1]. Briefly, the mathematical model used in this theory can be described as follows. (a) The claims occur entirely "at random", that is, during the infinitesimal interval of time (t, t + dt), the probability of a claim occurring is dt and the probability of more than one claim occurring is of a smaller order than dt, these probabilities being independent of the claims which have occurred during (0, t). 
(b) If a claim does occur, the amount claimed is a random variable with the probability distribution dP(x) (-∞ < x < ∞), negative claims occurring in the case of ordinary whole-life annuities. Under the assumptions (a) and (b), it is easily seen that the total amount X(t) of all claims which occur during (0, t) has the compound Poisson distribution given by


Journal ArticleDOI
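TL;DR: In this article, a theorem of Hoeffding is used to show that the locally most powerful rank test (LMPRT) for the two-sample problem is based on a linear rank statistic, and the asymptotic normality theorem of Chernoff and Savage is then used to show that the LMPRT is asymptotically efficient.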
Abstract: We are given independent random samples $X_1, \cdots, X_m$ and $Y_1, \cdots, Y_n$ from populations with unknown cumulative distribution functions (cdf's) $F_X$ and $F_Y$, respectively. It is desired to test $H_o:F_X = F_Y$ against $H_1 : F_X = G_\theta,\quad F_Y = G_\phi,\quad\theta, \phi \varepsilon R,$ where $G_\theta$ is a specified family of cdf's (one for each $\theta$), $R$ is an interval on the real line, $\theta$ and $\phi$ are specified and very close to some specified value $\phi_o$, and $\theta \neq \phi$. A theorem of Hoeffding is used to show that the locally most powerful rank test (LMPRT) of $H_o$ against $H_1$ is based on a linear rank statistic $T_N = (m)^{-1} \sum^N_{i = 1} a_{Ni}Z_{Ni},$ where $Z_{Ni} = 1$ when the $i$th smallest of $N = m + n$ observations is an $X$, and $Z_{Ni} = 0$, otherwise, and the $a_{Ni}$ are given numbers. In a recent paper, Chernoff and Savage established the asymptotic normality of the test statistic $T_N$, subject to some weak restrictions. The concept of asymptotic relative efficiency (ARE) was introduced by Pitman to compare sequences of tests. It was pointed out by Chernoff and Savage that the asymptotic efficiency of a sequence of tests can be established by means of a likelihood ratio test. Using this method, in conjunction with the theorem of Chernoff and Savage on asymptotic normality, it is shown that the LMPRT of $H_o$ against $H_1$ is asymptotically efficient. Several applications to Cauchy, exponential, and normal populations are given.
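
The linear rank statistic $T_N$ defined in the abstract above can be computed directly from its definition. In the sketch below the function name, the two small samples, and the choice of scores $a_{Ni} = i/N$ are all hypothetical; the scores that make the test locally most powerful depend on the family $G_\theta$ and are not reproduced here. Ties are assumed absent, as they are with probability one for continuous populations.

```python
def linear_rank_statistic(xs, ys, scores):
    """T_N = (1/m) * sum_i a_Ni * Z_Ni over the pooled, sorted sample.

    Z_Ni is 1 when the i-th smallest pooled observation comes from xs.
    `scores` holds a_N1, ..., a_NN and must have length len(xs) + len(ys).
    """
    pooled = [(value, 1) for value in xs] + [(value, 0) for value in ys]
    pooled.sort(key=lambda pair: pair[0])
    z = [is_x for _value, is_x in pooled]
    return sum(a * zi for a, zi in zip(scores, z)) / len(xs)


# Hypothetical samples and hypothetical scores a_Ni = i / N.
xs = [1.2, 3.4, 0.5]
ys = [2.2, 4.1]
n_total = len(xs) + len(ys)
scores = [i / n_total for i in range(1, n_total + 1)]

t_n = linear_rank_statistic(xs, ys, scores)
print(t_n)
```

Here the X observations occupy ranks 1, 2 and 4 of the pooled sample, so the statistic is the average of the first, second and fourth scores.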

Journal ArticleDOI
TL;DR: In this paper, the author considers the problem of determining and tabulating the distribution function of the statistic proposed by Anderson and Darling for testing the hypothesis that a sample of size n has been drawn from a population with a specified continuous cumulative distribution function.
Abstract: Anderson and Darling proposed the use of a statistic for testing the hypothesis that a sample of size n has been drawn from a population with a specified continuous cumulative distribution function, where Gn(x) is the empirical distribution function defined on the sample of size n. The author considers here the problem of determining and tabulating the distribution function of this statistic.

Journal ArticleDOI
TL;DR: In this article, the author extends Chernoff's results on the sequential design of experiments, in which a procedure was proved asymptotically optimal for the hypothesis testing problem with finitely many states of nature, to the case of infinitely many states of nature.
Abstract: In [2] and [3], H. Chernoff discussed the Sequential Design of Experiments. In [2], a procedure was exhibited and was proved to be asymptotically optimal for the hypothesis testing problem when there are finitely many states of nature. This paper extends Chernoff's results to infinitely many states of nature.



Journal ArticleDOI
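TL;DR: In this article, the author studies explosive linear stochastic difference equations, in which a unique root $\rho$ of the characteristic equation has $|\rho| > 1$ and dominates the other roots; a consistent estimator of $\rho$ and its limit distribution are obtained, and the consistency of the least squares (or maximum likelihood) estimators of the structural parameters is considered.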
Abstract: Let $\{X_t, t \geq 1\}$ be a stochastic process which satisfies the following set of assumptions: ASSUMPTION 1: For every $t, X_t$ satisfies \begin{equation*}\tag{1}X_t = \alpha_1X_{t - 1} + \alpha_2X_{t - 2} + \cdots + \alpha_kX_{t - k} + u_t,\end{equation*} where $\alpha_1, \cdots, \alpha_k$ are $k$ finite real numbers (unknown parameters) and $u_t, t$ positive, are independent, identically distributed random variables with mean zero and a finite positive variance $\sigma^2$. ASSUMPTION 2: The distribution of $u_t$ is continuous. (Actually $\mathrm{Pr}\{u_t = 0\} = 0$ suffices.) ASSUMPTION 3: The roots $m_1, \cdots, m_k$ of the characteristic equation \begin{equation*}\tag{2}m^k - \alpha_1m^{k - 1} - \alpha_2m^{k - 2} - \cdots - \alpha_k = 0,\end{equation*} of (1), are distinct. ASSUMPTION 4: There is a unique root $\rho$ of (2) such that $|\rho| > 1$, and $|\rho| > \max_{j = 2, \cdots, k} |m_j|$. Here $\rho$ is identified with $m_1$ for convenience. Since complex roots enter in pairs, it follows from this assumption that $\rho$ is real. Note that there can be $m_j, j > 1$, such that $|m_j| > 1$. ASSUMPTION 5: For $t$ non-positive, $u_t = 0$. If Assumption 4 holds, the process $\{X_t, t \geqq 1\}$ is said to be (strongly) explosive, and the corresponding difference equation (1) is called an explosive (linear homogeneous) stochastic difference equation; this is the subject of the present paper. Under the assumptions above, it follows (cf., C. Jordan [5], p. 564, Mann and Wald [8], p. 178, and also the footnote on p. 22 of [10]) that $X_t = \sum^t_{r = 1}\sum^k_{q = 1} \lambda_qm^{t - r}_qu_r,$ $t$ positive, and that $\lambda_q$ satisfy the relations \begin{equation*}\tag{3}\delta_{1t} = \sum^k_{q = 1} \lambda_qm^{t - 1}_q,\quad t = 1, 0, -1, \cdots, - (k - 2),\end{equation*} where $\delta_{1t} = 1$ if $t = 1$ and 0 otherwise. (Note that $\sum^k_{q = 1}\lambda_q = 1$.) 
For convenience, define the random variables \begin{equation*}\tag{4}X_{i,t} = \sum^t_{r = 1} m^{t - r}_iu_r,\quad i = 1, 2, \cdots, k, (m_1 = \rho),\end{equation*} so that $X_{i,t} = 0$ for $t$ non-positive. Thus one may write $X_t$ as follows: \begin{equation*}\tag{5}X_t = \lambda_1X_{1,t} + \lambda_2X_{2,t} + \cdots + \lambda_kX_{k,t}.\end{equation*} The first part of this paper is devoted to finding a consistent estimator of $\rho$ and its limit distribution. Consequently, in Section 3 some lemmas will be proved for use in the consistency proof (Theorem I). Similarly, in Section 5, some lemmas leading to the proof of the limit distribution of the estimator (Theorem II) will be given. In the second part, the consistency of the Least Squares (L.S.) or Maximum Likelihood (M.L.) estimators of the "structural parameters" $\alpha_i$ of (1) will be considered (Theorem III). The procedure becomes much more involved because the direct application of the usual limit theorems is not possible, since the process under consideration is explosive. It is noteworthy that Lemmas 9, 10, 14-16, and Theorem I are rather general, in that they hold under the only global Assumptions 1-5 above, and the further requirement $|m_j| < 1, j = 2, \cdots, k$, so essential for the rest of the analysis of this paper, is unnecessary for them. The corresponding problem, in the case $|\rho| < 1$, has been completely solved by Mann and Wald [8]. If $k = 1$ in (1), the results of this paper reduce to those obtained by Rubin [13], White [14], and T. W. Anderson [1]. The vector case has also been treated by Anderson in [1], but a comparison of the results in this case with those of the present paper shows that they do not imply each other except in the first order. In the latter case, however, both reduce to Rubin's [13] result. The available results on stochastic difference equations are summarized in a table at the end of the paper. 
Some of the details and computations omitted in this paper may be found in [10]. In the following section, some known lemmas related to stochastic convergence are collected and stated in a convenient form, as they will be constantly referred to in both parts of the paper. (For proofs, see [2], [3], [4], [6] and [9].)
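
To give a feel for the explosive case treated in the abstract above, the following minimal sketch simulates the first-order equation ($k = 1$, so $\alpha_1 = \rho$) and computes the least squares estimate of $\rho$. This covers only the simplest special case that the abstract attributes to earlier work; the normal errors, the value $\rho = 1.2$, the sample length, and the function names are assumptions made for illustration, and the estimator of $\rho$ that the paper constructs for general $k$ is not reproduced.

```python
import random


def simulate_ar1(rho, n, seed=0):
    """Simulate X_t = rho * X_{t-1} + u_t for t = 1, ..., n with X_0 = 0.

    Assumption: the u_t are independent standard normal variables.
    """
    rng = random.Random(seed)
    x = [0.0]  # X_0 = 0, in line with u_t = 0 for non-positive t
    for _ in range(n):
        x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
    return x


def least_squares_rho(x):
    """Least squares estimate: sum X_t X_{t-1} / sum X_{t-1}^2."""
    numerator = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    denominator = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return numerator / denominator


path = simulate_ar1(rho=1.2, n=100, seed=1)
rho_hat = least_squares_rho(path)
print(rho_hat)
```

Because the path grows geometrically, the last few observations dominate both sums, which is one way to see why the usual limit theorems do not apply directly in the explosive case.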

Journal ArticleDOI
TL;DR: In this article, a method is presented for constructing main-effect plans for symmetrical factorial experiments; since main-effect plans are orthogonal arrays of strength two, the method also yields a family of such arrays.
Abstract: In this paper we present a method of constructing main-effect plans for symmetrical factorial experiments which can accommodate up to $\lbrack 2(s^n - 1)/(s - 1) - 1\rbrack$ factors, each at $s = p^m$ levels, where $p$ is a prime, with $2s^n$ treatment combinations. As main-effect plans are orthogonal arrays of strength two the method presented permits the construction of the orthogonal arrays $(2s^n, 2\lbrack s^n - 1\rbrack/\lbrack s - 1\rbrack - 1, s, 2)$.
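
The size of the plans described in the abstract above follows from two formulas: $2s^n$ treatment combinations and at most $2(s^n - 1)/(s - 1) - 1$ factors, with $s = p^m$. The short sketch below only evaluates these counts for given $p$, $m$, $n$; it does not construct the plans, and the function name is hypothetical.

```python
def main_effect_plan_size(p, m, n):
    """Return (runs, max_factors, levels) for s = p**m levels per factor.

    Assumption: p is prime, as the abstract requires; this is not checked.
    """
    s = p ** m
    runs = 2 * s ** n
    # (s**n - 1) is divisible by (s - 1), so integer division is exact.
    max_factors = 2 * (s ** n - 1) // (s - 1) - 1
    return runs, max_factors, s


print(main_effect_plan_size(p=3, m=1, n=2))
print(main_effect_plan_size(p=2, m=2, n=2))
```

For three-level factors with $n = 2$ this gives 18 treatment combinations and up to 7 factors; for four-level factors with $n = 2$ it gives 32 treatment combinations and up to 9 factors.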

Journal ArticleDOI
TL;DR: In this paper, Chebyshev's inequality is extended to continuous parameter stochastic processes, by taking into account separability and letting the number of variables approach infinity, and the question of sharpness is investigated.
Abstract: 0. Summary. In this paper we obtain some multivariate generalizations of Chebyshev's inequality, two of which are extended to continuous parameter stochastic processes. The extensions are obtained in a natural way by taking into account separability and letting the number of variables approach infinity. Particular attention is paid to the question of sharpness. To show that the bound of the inequality cannot be improved, examples are given in a number of cases that attain equality. 1. Introduction. We begin by discussing a model for the various generalizations of Chebyshev's inequality, and for a standard proof that we shall use. Examination of this proof will enable us to make some general comments concerning the problems of deriving inequalities and of proving sharpness. Let (Ω, B, P) be a probability space, and let ($C, (1) be a measurable space. For each i ∈ I, an arbitrary index set, let Bi C (t and let 5i be a class of random


Journal ArticleDOI
TL;DR: In this article, the author suggests that expressing a distribution function as a mixture of suitably chosen distribution functions leads to improved methods for generating random variables in a computer: a distribution close to the original is used most of the time, and a correction is applied only infrequently.
Abstract: This note suggests that expressing a distribution function as a mixture of suitably chosen distribution functions leads to improved methods for generating random variables in a computer. The idea is to choose a distribution function which is close to the original and use it most of the time, applying the correction only infrequently. Mixtures allow this to be done in probability terms rather than in the more elaborate ways of conventional numerical analysis, which must be applied every time.
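
The mixture idea in the abstract above can be illustrated on a toy target. The density $f(x) = \tfrac{2}{3}(1 + x)$ on $[0, 1]$ can be written as $\tfrac{2}{3} \cdot 1 + \tfrac{1}{3} \cdot 2x$, so a plain uniform draw is used two thirds of the time and the more expensive draw from the density $2x$ (a square root of a uniform) only one third of the time. The target, the decomposition, and the function name are assumptions chosen for illustration and are not taken from the note.

```python
import math
import random


def sample_target(rng):
    """Draw from f(x) = (2/3)(1 + x) on [0, 1] via a two-component mixture."""
    if rng.random() < 2.0 / 3.0:
        # Cheap component, used most of the time: Uniform(0, 1).
        return rng.random()
    # Correction component, used infrequently: density 2x on [0, 1],
    # sampled by inversion as the square root of a uniform.
    return math.sqrt(rng.random())


rng = random.Random(12345)
draws = [sample_target(rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)
```

The exact mean of the target is $\int_0^1 x \cdot \tfrac{2}{3}(1 + x)\,dx = 5/9$, which gives a quick sanity check on the sampler.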

Journal ArticleDOI
TL;DR: In this paper, the quality of each of several populations is characterized by a real-valued parameter, a number of optimum properties of procedures for selecting a subset of the populations are defined, and applications are given to distributions with monotone likelihood ratio and to normal distributions.
Abstract: There are given $a$ populations $\Pi_1, \cdots, \Pi_a$, of which we wish to select a subset. The quality of the $i$th population is characterized by a real-valued parameter $\theta_i$, and a population is said to be \begin{align*}\tag{1} \text{positive (or good) if} \quad \theta_i &\geqq \theta_0 + \Delta, \\ \tag{2} \text{negative (or bad) if} \quad \theta_i &\leqq \theta_0,\end{align*} where $\Delta$ is a given positive constant and $\theta_0$ is either a given number or a parameter that may be estimated. A number of optimum properties of selection procedures are defined (Section 3) and it is shown that for some of these, the optimum procedure selects $\Pi_i$ when \begin{equation*}\tag{3}T_i \leqq C_i,\end{equation*} where $T_i$ is a suitable statistic, the distribution of which depends only on $\theta_i$, and where $C$ is a suitable constant. (Sections 4 and 6.) Applications are given to distributions with monotone likelihood ratio in the case that $\theta_0$ is known (Sections 5 and 6), and to normal distributions when instead observations on $\theta_0$ are included in the experiment (Sections 10 and 11).