
Showing papers in "Probability Theory and Related Fields in 1973"


Journal ArticleDOI
TL;DR: In this paper, the authors present a survey of recent work in mathematical statistics and probability theory with a focus on the use of robust estimators, such as weak*-continuous functionals serving as robustified maximum likelihood estimators.
Abstract: Prefatory Note. This paper was written to stimulate discussion; therefore the pointed style. It was designed to be self-contained and yet to minimize overlap with Huber's (1972) long and basic survey paper, which in particular covers the technical points in more detail. The issues raised are considered basic to reasonable applications of statistics; on the other hand, they suggest and stimulate much novel research in mathematical statistics and probability theory (such as about weak*-continuous functionals serving as robustified maximum likelihood estimators, or about Choquet capacities describing and replacing sets of probability measures). It is hoped that the paper may help in clarifying some relations between rigorous stochastic models and the world outside of mathematics, and perhaps also in improving understanding and cooperation between pure mathematicians and data analysts. 1. Why Robust Estimation? What do those "robust estimators" intend? Should we give up our familiar and simple models, such as our beautiful analysis of variance, our powerful regression, or our high-reaching covariance matrices in multivariate statistics? The answer is no; but it may well be advantageous to modify them slightly. In fact, good practical statisticians have made such modifications all along in an informal way; we are only now starting to have a theory about them. Some likely advantages of such a formalization are a better intuitive insight into these modifications, improved applied methods (even routine methods, for some aspects), and the chance of having pure mathematicians contribute something to the problem. Possible disadvantages may arise along the usual transformations of a theory when it is understood less and less by more and more people. Dogmatists who insisted on the use of "optimal" or "admissible" procedures as long as mathematical theories contained no other criteria may now be going to insist on "optimal robust" or "admissible robust" estimation or testing. Those who habitually try to lie with statistics, rather than seek the truth, may claim even more degrees of freedom for their wicked doings. In passing, those who use statistics for sanctification rather than elucidation of uncertain facts (treating it as a replacement rather than an aid for thinking) might wonder about the "monolithic, authoritarian structure" (cf. Tukey, 1962) they believe statistics to be. (Furthermore, there are of course tremendous possibilities for publishing under the fashionable flag of robustness, both of very valuable and of less valuable results, keeping needy statisticians from perishing.) Now what are the reasons for using robust procedures? There are mainly two observations which, combined, give an answer. Often in statistics one is using a parametric model implying a very limited set of probability distributions thought possible, such as the common model of normally distributed errors, or that of

220 citations
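
The abstract discusses robust estimators only in general terms. As a concrete illustration (not the paper's own construction), here is a minimal sketch of a Huber-type M-estimate of location computed by iteratively reweighted least squares; the tuning constant 1.345 and the MAD scale estimate are conventional choices, not taken from the paper.

```python
import numpy as np

def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    """Huber-type M-estimate of location via iteratively reweighted least squares.

    Illustrative sketch only, not the estimator studied in the paper.
    k is the usual tuning constant; the scale is fixed at the normalized MAD.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                            # robust starting point
    scale = 1.4826 * np.median(np.abs(x - mu))   # MAD, consistent for the normal sigma
    if scale == 0:
        return mu
    for _ in range(max_iter):
        r = (x - mu) / scale
        w = k / np.maximum(np.abs(r), k)         # Huber weights: 1 inside [-k, k], k/|r| outside
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

# One gross outlier barely moves the M-estimate, unlike the sample mean.
rng = np.random.default_rng(0)
data = np.append(rng.normal(0.0, 1.0, 99), 50.0)
print(huber_location(data), data.mean())
```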


Journal ArticleDOI
TL;DR: In this article, a conservative onedimensional Bessel diffusion process on [0, o0] determined by the local generator is proposed. But it is not a diffusion process with index > 0.O.
Abstract: 0. Introduction. By a Bessel diffusion process with index α (α > 0), we mean a conservative one-dimensional diffusion process on [0, ∞) determined by the local generator

197 citations
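
The local generator is cut off in the abstract above. Purely as an illustration, and assuming the common convention L f = ½ f'' + ((δ − 1)/(2x)) f' for a Bessel diffusion of dimension δ on [0, ∞) (the paper's indexing by α may differ), here is a minimal Euler-scheme simulation of such a path.

```python
import numpy as np

def bessel_path(delta=3.0, x0=1.0, T=1.0, n=10_000, seed=0):
    """Euler scheme for a Bessel-type diffusion on [0, inf).

    Assumes the generator (1/2) f'' + ((delta-1)/(2x)) f', i.e. the SDE
    dX = dB + (delta-1)/(2X) dt.  This convention is an assumption; the
    paper's own generator is truncated in the abstract above.
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        drift = (delta - 1.0) / (2.0 * max(x[i], 1e-6))   # crude guard near 0
        x[i + 1] = abs(x[i] + drift * dt + np.sqrt(dt) * rng.standard_normal())
    return x

path = bessel_path()
print(path[-1])
```

For δ = 3 this reproduces (in law) the radial part of a three-dimensional Brownian motion, which is a convenient sanity check on the scheme.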


Journal ArticleDOI
TL;DR: In this paper, basic properties of the functional e are presented, together with an application to the central limit theorem, and the functional e is shown to be monotone decreasing along Boltzmann solutions of Kac's one-dimensional model of a Maxwellian gas.
Abstract: where the infimum is taken over all pairs of random variables X and Y defined on (Ω, P) and distributed according to f and g respectively; here g is the Gaussian distribution with mean 0 and variance σ^2 = σ^2(f). e[f] is sometimes denoted by e[X] when X is a random variable with distribution f. It should be noticed that the value of e[f] does not depend upon the choice of the probability space (Ω, P). The purpose of this paper is to present some basic properties of e (especially, the inequality (2.2)) together with an application to the central limit theorem, and then to show that the functional e is monotone decreasing along Boltzmann solutions of Kac's one-dimensional model of a Maxwellian gas. Some of our results can be generalized to the case of R^3; for example, the functional e similarly defined in R^3 decreases along solutions of Boltzmann's problem for the 3-dimensional Maxwellian gas, but this will be discussed on another occasion.

72 citations
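
The display defining e[f] is cut off in this extract. On the common reading that e[f] = inf E|X − Y|^2 over all couplings of f with the Gaussian g of matching variance (an assumption here, since the definition is truncated), the one-dimensional infimum is attained by the monotone quantile coupling, which makes e[f] straightforward to approximate numerically.

```python
import numpy as np
from scipy.stats import norm

def e_functional(sample):
    """Approximate e[f] = inf E|X - Y|^2 over couplings of f with the Gaussian
    of the same variance (a reading assumed here; the defining display is
    truncated in the abstract).  In one dimension the infimum is attained by
    the quantile coupling, so empirical quantiles are paired with Gaussian ones."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    sigma = x.std()                       # standard deviation of the matched Gaussian g
    u = (np.arange(n) + 0.5) / n          # mid-point quantile levels
    y = norm.ppf(u, loc=0.0, scale=sigma)
    return np.mean((x - y) ** 2)

# e vanishes (approximately) for Gaussian data and is positive otherwise.
rng = np.random.default_rng(1)
print(e_functional(rng.normal(0.0, 1.0, 100_000)))     # close to 0
print(e_functional(rng.uniform(-1.0, 1.0, 100_000)))   # strictly positive
```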



Journal ArticleDOI
TL;DR: A channel with arbitrarily varying channel probability functions in the presence of a noiseless feedback channel is studied; its capacity is determined by proving a coding theorem and its strong converse, and a formula for the zero-error capacity is obtained.
Abstract: In this article we study a channel with arbitrarily varying channel probability functions in the presence of a noiseless feedback channel (a.v.ch.f.). We determine its capacity by proving a coding theorem and its strong converse. Our proof of the coding theorem is constructive; we give explicitly a coding scheme which performs at any rate below the capacity with an arbitrarily small decoding error probability. The proof makes use of a new method ([1]) to prove the coding theorem for discrete memoryless channels with noiseless feedback (d.m.c.f.). It was emphasized in [1] that the method is not based on random coding or maximal coding ideas, and it is this fact that makes it particularly suited for proving coding theorems for certain systems of channels with noiseless feedback. As a consequence of our results we obtain a formula for the zero-error capacity of a d.m.c.f., which was conjectured by Shannon ([8], p. 19).
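
The zero-error capacity formula for a d.m.c.f. referred to at the end is, under the standard reading of Shannon's conjecture (an assumption here, not a quotation from the paper), C_0F = max_P min_y [−log Σ_{x: w(y|x)>0} P(x)] whenever the ordinary zero-error capacity is positive, and 0 otherwise. The sketch below evaluates that expression for a small channel matrix by linear programming; it is an illustration of the formula, not code from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def zero_error_feedback_capacity(W):
    """Shannon's zero-error feedback capacity formula (assumed standard form):
    C_0F = max_P min_y -log2 sum_{x: W[x,y] > 0} P(x), valid when the ordinary
    zero-error capacity is positive, and 0 otherwise.

    W is an |X| x |Y| row-stochastic matrix of transition probabilities."""
    W = np.asarray(W, dtype=float)
    nx, ny = W.shape
    reach = (W > 0).astype(float)            # reach[x, y] = 1 iff output y is possible under input x

    # C_0 > 0 iff some pair of inputs shares no possible output; otherwise C_0F = 0.
    share = reach @ reach.T > 0
    if np.all(share[np.triu_indices(nx, k=1)]):
        return 0.0

    # Maximize min_y -log2 sum_{x in S_y} P(x)  <=>  minimize t subject to
    # sum_{x in S_y} P(x) <= t for every output y, with P in the probability simplex.
    c = np.zeros(nx + 1)
    c[-1] = 1.0
    A_ub = np.hstack([reach.T, -np.ones((ny, 1))])   # one row per output y
    b_ub = np.zeros(ny)
    A_eq = np.hstack([np.ones((1, nx)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * nx + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return float(-np.log2(res.x[-1]))

# A 4-letter "noisy typewriter"-style channel: the formula gives 1 bit.
W = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.0, 0.0, 0.5]])
print(zero_error_feedback_capacity(W))
```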

Journal ArticleDOI
TL;DR: In this article, it was shown that a co-analytic separable metric space is a Prohorov space if and only if it is topologically complete (equivalently, a G_δ subset of its completion).
Abstract: Let X be a Hausdorff topological space. By a measure on X we understand a tight Borel probability measure on X. The set of all measures on X is denoted by P(X); this set is a topological space with the usual topology. This topology can be described as follows: if μ_α is a net in P(X) and μ ∈ P(X), then lim μ_α = μ if and only if lim inf μ_α(G) ≥ μ(G) for every open set G ⊂ X. For convenience a space X is called a Prohorov space if for every compact set M ⊂ P(X) and every ε > 0 there exists a compact set A ⊂ X such that μ(A) > 1 − ε for each μ ∈ M. It is well known that every topologically complete space X (i.e. a space which is a G_δ subspace of some compact space) is a Prohorov space (see Corollary 1 of Theorem 1). Varadarajan [3] claimed to prove that a metric space X is a Prohorov space provided that every Borel measure on X is tight (consequently, that a separable metric space which is a Borel subset of its completion is a Prohorov space), but his proof is incorrect. An example of a K_σ metric non-Prohorov space (and therefore a proof of the non-validity of Varadarajan's theorem) was given by Davies [1]. In this note it is proved that a co-analytic separable metric space is a Prohorov space if and only if it is topologically complete (consequently, a separable metric space which is a Borel subset of its completion is a Prohorov space if and only if it is a G_δ subset of its completion). This theorem also gives a solution of the problem whether the space of rational numbers is a Prohorov space (see e.g. [1]). The reader who is interested only in this problem can find its solution in Part III, which does not depend on the topological results of Part II. We begin with the following trivial lemma, which will be used without special mention.
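
For readability, here are the two definitions used throughout the abstract, restated in standard notation (a transcription of the text above, nothing added):

```latex
% Weak topology on P(X), as described in the abstract:
\[
  \lim_\alpha \mu_\alpha = \mu
  \quad\Longleftrightarrow\quad
  \liminf_\alpha \mu_\alpha(G) \;\ge\; \mu(G)
  \quad\text{for every open } G \subset X .
\]
% Prohorov property:
\[
  X \text{ is a Prohorov space}
  \quad\Longleftrightarrow\quad
  \forall\, \text{compact } M \subset P(X),\ \forall\, \varepsilon > 0,\
  \exists\, \text{compact } A \subset X:\
  \mu(A) > 1 - \varepsilon \ \ \text{for all } \mu \in M .
\]
```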

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of finding the optimal stopping time for a right continuous strong Markov process with a given initial distribution and semigroup, and show that the solution is Markovian in the sense that the best one can expect is described by a function on the state space.
Abstract: Let E be the state space of a strong Markov process with semigroup P_t. Let f be a positive measurable function defined on E. The main problem we consider in this paper is the following optimal stopping problem: (1) Try to maximize E^μ(f(X_T)), where T ranges over the stopping times for the right continuous Markov process X, which has initial distribution μ and semigroup P_t. (2) What can you expect to get with such an optimal stopping? It is well known that a satisfactory answer to the second question allows you to describe the optimal, or the approximately optimal, stopping times as entry times in the set where the difference between what you would get by stopping immediately and the best you can still expect is small enough ([11]). Therefore we will focus our attention on the second question. It is easy to see that most optimal stopping problems in a Markov process can be reduced to this standard form. For instance, if one has to maximize E^μ[f(X_T) − A(ω, T)], where A is an additive functional for observation costs, it is equivalent to maximize E^μ[(f + h)(X_T)] − ∫ h(x) dμ, where h(x) = E^x(A_∞); and one is back to the standard problem with the function f + h. Similarly, if there is a multiplicative functional for discounting, it is sufficient to consider the appropriate subprocess, made Markovian by adding a point at infinity. Still more general problems can be handled by passing to space-time. This problem was solved in [11] for general stochastic processes. In the particular case of a Markov process, we must show that the solution is Markovian in the sense that the best you can expect to get is described by a function on the state space, independent of the initial distribution of the process; and that there exist memoryless, approximately optimal decision procedures: entry times in compact sets. The main results of [11] were announced in [7], [8] and [9]; those of this paper in [10], modulo one or two obvious errors. The results of this paper bear very heavily on those of [11]. The main results of that paper which we are going to use are Theorems 2 and 4; when a reference is given to [11] without further specification, it will be to one of these two theorems, or to some easy consequence of those. Accordingly, the reader should at least look at the statements of these theorems before studying the proofs of this paper.
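
The reduction of the cost problem to the standard form claimed in the abstract is a short computation; spelled out, assuming A_∞ is integrable and writing A_T for A(ω, T):

```latex
% Additivity A_\infty = A_T + A_\infty \circ \theta_T and the strong Markov property at T give
\[
  E^{\mu}[A_T]
  = E^{\mu}[A_\infty] - E^{\mu}\bigl[A_\infty \circ \theta_T\bigr]
  = \int h \, d\mu - E^{\mu}\bigl[h(X_T)\bigr],
  \qquad h(x) := E^{x}[A_\infty],
\]
\[
  \text{so}\qquad
  E^{\mu}\bigl[f(X_T) - A_T\bigr]
  = E^{\mu}\bigl[(f + h)(X_T)\bigr] - \int h \, d\mu .
\]
```

Since ∫ h dμ does not depend on T, maximizing the left-hand side over stopping times is exactly the standard problem for f + h.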


Journal ArticleDOI
TL;DR: The Law of the Iterated Logarithm is shown to hold for sequences {a_n X_n} provided that the constants {a_n} satisfy (i) n a_n^2 ≤ C Σ_{j=1}^n a_j^2 and (ii) Σ_{j=1}^n a_j^2 → ∞.
Abstract: Let {X_n, n ≥ 1} be independent, identically distributed (i.i.d.) random variables with zero means and unit variances. The Law of the Iterated Logarithm is shown to hold for sequences {a_n X_n} provided that the constants {a_n} satisfy (i) n a_n^2 ≤ C Σ_{j=1}^n a_j^2 and (ii) Σ_{j=1}^n a_j^2 → ∞.
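
As a quick check that the two conditions are easy to meet (an illustrative example, not taken from the paper), polynomial weights a_j = j^γ with γ ≥ 0 satisfy (i) and (ii):

```latex
\[
  a_j = j^{\gamma},\ \gamma \ge 0:\qquad
  \sum_{j=1}^{n} a_j^{2} = \sum_{j=1}^{n} j^{2\gamma}
  \;\ge\; \int_{0}^{n} t^{2\gamma}\,dt
  = \frac{n^{2\gamma+1}}{2\gamma+1},
\]
\[
  \text{so}\quad
  n\,a_n^{2} = n^{2\gamma+1} \;\le\; (2\gamma+1)\sum_{j=1}^{n} a_j^{2}
  \quad\text{(condition (i) with } C = 2\gamma+1\text{)},
  \qquad
  \sum_{j=1}^{n} a_j^{2} \to \infty
  \quad\text{(condition (ii))}.
\]
```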


Journal ArticleDOI
TL;DR: In this article, the classical Wiener-Hopf factorisation of a probability measure is extended to an operator factorisation associated with a semi-Markov transition function, and the consequences of this factorisation are indicated including a set of duality relations.
Abstract: The classical Wiener-Hopf factorisation of a probability measure is extended to an operator factorisation associated with a semi-Markov transition function. Some consequences of this factorisation are indicated including a set of duality relations.

Journal ArticleDOI
TL;DR: In the present paper, Donsker's theorem, in the space of continuous functions, is proved for independent, not necessarily identically distributed random variables satisfying a mixing condition.
Abstract: According to Donsker's theorem, X_N converges in distribution to W, where W is standard Brownian motion on [0, 1] (see [1, p. 137]). A similar theorem can be formulated in the space C of continuous functions ([1, p. 68]). Donsker's theorem has been generalized in many directions. Two of them will be taken up in the present paper. The first, due to Prohorov [9], deals with independent, not necessarily identically distributed random variables. Prohorov's theorem says that the properly defined random functions X_N converge in distribution to standard Brownian motion if, and only if, the summands satisfy the Lindeberg condition. (For the details see [2, p. 452], [1, p. 77, Problem 1 and p. 143, Problem 7].) The second generalization, due to Billingsley [1, p. 177], is concerned with strict-sense stationary processes satisfying a mixing condition. In the present paper we shall prove Donsker's theorem for not necessarily identically distributed random variables satisfying a mixing condition. For such random variables the second-named author has proved theorems of a somewhat different character [10].
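
The abstract refers to the random functions X_N without the (cut-off) display defining them. The usual Donsker construction, assumed here purely for illustration, is X_N(t) = S_{⌊Nt⌋}/√N with linear interpolation, where S_k is the k-th partial sum of the increments. The sketch below builds such a path in the classical i.i.d. case; the paper itself treats non-identically-distributed increments under a mixing condition.

```python
import numpy as np

def donsker_path(xi, t_grid):
    """Partial-sum process X_N(t) = S_{[Nt]} / sqrt(N), linearly interpolated.

    xi: array of N zero-mean, unit-variance increments (classical i.i.d. case;
    the paper's setting of mixing, non-i.i.d. increments is not reproduced here).
    """
    N = len(xi)
    S = np.concatenate(([0.0], np.cumsum(xi)))            # S_0, S_1, ..., S_N
    # Interpolate k/N -> S_k / sqrt(N) at the requested times.
    return np.interp(t_grid, np.arange(N + 1) / N, S / np.sqrt(N))

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 201)
path = donsker_path(rng.choice([-1.0, 1.0], size=10_000), t)   # simple random walk
print(path[-1])   # approximately N(0, 1) for large N, in line with Donsker's theorem
```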

Journal ArticleDOI
TL;DR: In this paper, it is shown (Proposition 1) that a bounded set H ⊂ M_∞ whose distinct elements are non-equivalent is compact and metrizable for the topology of pointwise convergence on E whenever it is sequentially compact for that topology.
Abstract: The latter defines the usual equivalence relation in M_∞. We denote by f̃ the equivalence class of each f ∈ M_∞ with respect to this equivalence relation. We say that a set F carries μ if F is measurable and μ(E − F) = 0. In the case when E is a compact space and μ a positive Radon measure on E, we denote by Supp μ the smallest (= intersection) of all closed sets F ⊂ E carrying μ. If f: E → R and A ⊂ E, we denote by f|_A the restriction of f to A; finally, if H ⊂ M_∞ we use the notation H|_A for the set of all h|_A with h ∈ H. The author is indebted to Professors Dietrich Kölzow and John C. Oxtoby for several helpful remarks and relevant comments concerning the contents of this paper. We begin with the following simple but very useful result. Proposition 1. Let H ⊂ M_∞ be a set which is bounded and such that the relations h_1 ∈ H, h_2 ∈ H and h_1 ≠ h_2 imply h̃_1 ≠ h̃_2. Assume that H is sequentially compact for the topology of pointwise convergence on E, that is, for any sequence (h_n) of elements of H there is a subsequence (h_{n_k}) and an h ∈ H such that h_{n_k} → h pointwise on E. Then H is compact and metrizable for the topology of pointwise convergence on E. Proof. We consider on H the topology T_1 of mean L^1-convergence, and the topology T of pointwise convergence on E. We divide the proof into two steps: (I) We note first that (H, T_1) is compact. In fact, for any sequence (h_n) in H, there is a subsequence (h_{n_k}) and an h ∈ H such that h_{n_k} → h pointwise on E. By the Lebesgue dominated convergence theorem, we also have


Journal ArticleDOI
TL;DR: The Robbins-Monro procedure for generating the sequence {x_n} is to take x_1 to be any constant and define x_2, x_3, ... in accordance with the recurrence discussed in this paper.
Abstract: The pioneering paper in the field of stochastic approximation was published in 1951 by Robbins and Monro [6]. That paper dealt with the following situation. Suppose that, for every point x belonging to the real line, a random variable Y(x) can be observed. The distribution function of Y(x) and the expected value of Y(x), denoted by M(x) and assumed to exist, are both unknown. Assuming that the equation M(x) = α has a unique root, denoted by x = θ, it is desired to estimate θ by making observations on Y at points x_1, x_2, x_3, ... which are generated sequentially in accordance with some definite experimental procedure in such a way that x_n → θ in probability as n → ∞. The Robbins-Monro procedure (RM) for generating the sequence {x_n} is to take x_1 to be any constant and define x_2, x_3, ... in accordance with the recurrence
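
The abstract breaks off just before the recurrence is displayed. The classical Robbins-Monro recursion is x_{n+1} = x_n + a_n (α − Y(x_n)) with step sizes such as a_n = c/n; the sketch below uses that well-known form, with an illustrative regression function and noise that are assumptions of the example, not taken from the paper.

```python
import numpy as np

def robbins_monro(observe, alpha, x1=0.0, c=1.0, n_steps=10_000):
    """Classical Robbins-Monro recursion x_{n+1} = x_n + a_n * (alpha - Y(x_n)),
    with step sizes a_n = c / n (the well-known form; the paper's own display of
    the recurrence is truncated above).  Assumes M is nondecreasing."""
    x = x1
    for n in range(1, n_steps + 1):
        x = x + (c / n) * (alpha - observe(x))
    return x

# Illustrative setup (not from the paper): M(x) = 2x + 1 observed with unit noise,
# target level alpha = 0, so the root is theta = -0.5.
rng = np.random.default_rng(3)
Y = lambda x: 2.0 * x + 1.0 + rng.normal(0.0, 1.0)
print(robbins_monro(Y, alpha=0.0))   # approaches -0.5
```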