Journal ArticleDOI

Linear Statistical Inference and Its Applications

01 Aug 1966-Technometrics (Taylor & Francis Group)-Vol. 8, Iss: 3, pp 551-553
TL;DR: Rao's Linear Statistical Inference and Its Applications as discussed by the authors is one of the foremost works in statistical inference in the literature and has been translated into six major languages of the world.
Abstract: "C. R. Rao would be found in almost any statistician's list of five outstanding workers in the world of Mathematical Statistics today. His book represents a comprehensive account of the main body of results that comprise modern statistical theory." -W. G. Cochran "[C. R. Rao is] one of the pioneers who laid the foundations of statistics which grew from ad hoc origins into a firmly grounded mathematical science." -B. Efrom Translated into six major languages of the world, C. R. Rao's Linear Statistical Inference and Its Applications is one of the foremost works in statistical inference in the literature. Incorporating the important developments in the subject that have taken place in the last three decades, this paperback reprint of his classic work on statistical inference remains highly applicable to statistical analysis. Presenting the theory and techniques of statistical inference in a logically integrated and practical form, it covers: * The algebra of vectors and matrices * Probability theory, tools, and techniques * Continuous probability models * The theory of least squares and the analysis of variance * Criteria and methods of estimation * Large sample theory and methods * The theory of statistical inference * Multivariate normal distribution Written for the student and professional with a basic knowledge of statistics, this practical paperback edition gives this industry standard new life as a key resource for practicing statisticians and statisticians-in-training.
Citations
Journal ArticleDOI
TL;DR: The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set.
Abstract: The problem of tracking curves in dense visual clutter is challenging. Kalman filtering is inadequate because it is based on Gaussian densities which, being unimodal, cannot represent simultaneous alternative hypotheses. The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set. Condensation uses learned dynamical models, together with visual observations, to propagate the random set over time. The result is highly robust tracking of agile motion. Notwithstanding the use of stochastic methods, the algorithm runs in near real-time.
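The propagation step can be illustrated with a short sketch. The following Python example is illustrative only and not the authors' implementation: it assumes a one-dimensional state, a simple learned linear dynamical model x' = a*x + noise, and a Gaussian observation likelihood; the function name condensation_step and all parameter values are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def condensation_step(particles, weights, observation,
                      a=0.9, process_noise=0.5, obs_noise=1.0):
    """One factored-sampling step: resample, propagate through the
    dynamical model, then reweight by the observation likelihood."""
    n = len(particles)
    # Resample the particle set according to the current weights.
    idx = rng.choice(n, size=n, p=weights)
    resampled = particles[idx]
    # Propagate through an assumed learned linear model x' = a*x + noise.
    predicted = a * resampled + rng.normal(0.0, process_noise, size=n)
    # Reweight by a Gaussian observation likelihood p(z | x).
    new_weights = np.exp(-0.5 * ((observation - predicted) / obs_noise) ** 2)
    new_weights /= new_weights.sum()
    return predicted, new_weights

# Track a toy 1-D signal with N = 500 particles.
particles = rng.normal(0.0, 1.0, size=500)
weights = np.full(500, 1.0 / 500)
for z in [0.2, 0.5, 0.9, 1.3]:  # synthetic observations
    particles, weights = condensation_step(particles, weights, z)
    print("posterior mean estimate:", float(np.sum(weights * particles)))
```

Resampling before propagation keeps the particle set concentrated on the plausible alternative hypotheses that a unimodal Kalman filter cannot represent.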

5,804 citations


Cites methods from "Linear Statistical Inference and It..."

  • ...They use the standard probabilistic tool of “weak convergence” (Rao, 1973) and the “weak law of large numbers” to show that a posterior distribution inferred by factored sampling can be made arbitrarily accurate by choosing N sufficiently large....

    [...]
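The weak-law argument quoted above can be checked numerically. The sketch below is not from either paper; it simply approximates an expectation under a stand-in "posterior" by averaging N samples and shows the error shrinking as N grows, which is the behaviour the weak law of large numbers guarantees for factored sampling.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in posterior: a standard normal, so E[X^2] = 1 exactly.
# The sample-based estimate becomes arbitrarily accurate as N grows.
for n in [10, 100, 1_000, 10_000, 100_000]:
    samples = rng.normal(size=n)
    estimate = np.mean(samples ** 2)
    print(f"N = {n:>7}: estimate = {estimate:.4f}, error = {abs(estimate - 1):.4f}")
```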

Posted Content
TL;DR: In this paper, the authors propose a model in which there is an equilibrium degree of disequilibrium: prices reflect the information of informed individuals (arbitrageurs) but only partially, so that those who expend resources to obtain information do receive compensation.
Abstract: If competitive equilibrium is defined as a situation in which prices are such that all arbitrage profits are eliminated, is it possible that a competitive economy always be in equilibrium? Clearly not, for then those who arbitrage make no (private) return from their (privately) costly activity. Hence the assumptions that all markets, including that for information, are always in equilibrium and always perfectly arbitraged are inconsistent when arbitrage is costly. We propose here a model in which there is an equilibrium degree of disequilibrium: prices reflect the information of informed individuals (arbitrageurs) but only partially, so that those who expend resources to obtain information do receive compensation. How informative the price system is depends on the number of individuals who are informed; but the number of individuals who are informed is itself an endogenous variable in the model. The model is the simplest one in which prices perform a well-articulated role in conveying information from the informed to the uninformed. When informed individuals observe information that the return to a security is going to be high, they bid its price up, and conversely when they observe information that the return is going to be low. Thus the price system makes publicly available the information obtained by informed individuals to the uninformed. In general, however, it does this imperfectly; this is perhaps lucky, for were it to do it perfectly, an equilibrium would not exist. In the introduction, we shall discuss the general methodology and present some conjectures concerning certain properties of the equilibrium. The remaining analytic sections of the paper are devoted to analyzing in detail an important example of our general model, in which our conjectures concerning the nature of the equilibrium can be shown to be correct. We conclude with a discussion of the implications of our approach and results, with particular emphasis on the relationship of our results to the literature on "efficient capital markets."

5,740 citations

Journal ArticleDOI
TL;DR: In this article, the authors propose simple and directional likelihood-ratio tests for discriminating and choosing between two competing models whether the models are nonnested, overlapping or nested and whether both, one, or neither is misspecified.
Abstract: In this paper, we propose a classical approach to model selection. Using the Kullback-Leibler Information measure, we propose simple and directional likelihood-ratio tests for discriminating and choosing between two competing models whether the models are nonnested, overlapping or nested and whether both, one, or neither is misspecified. As a prerequisite, we fully characterize the asymptotic distribution of the likelihood ratio statistic under the most general conditions.
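As a concrete illustration of the directional test in the strictly non-nested case, the Python sketch below computes the standardized log-likelihood-ratio statistic from per-observation log-likelihoods. It is a simplified reading of the approach (no variance-adjustment or nesting corrections), and the example models, data, and the helper name vuong_statistic are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def vuong_statistic(loglik_f, loglik_g):
    """Directional statistic from pointwise log-likelihoods of two strictly
    non-nested models F and G, each evaluated at its (quasi-)MLE."""
    d = np.asarray(loglik_f) - np.asarray(loglik_g)  # per-observation LR contributions
    n = len(d)
    lr = d.sum()                                     # log-likelihood ratio
    omega = d.std(ddof=0)                            # estimates the asymptotic std. dev.
    return lr / (np.sqrt(n) * omega)                 # ~ N(0, 1) under H0: models equivalent

# Illustrative use: compare a normal and a Laplace model on the same data,
# where the true law (Student-t) belongs to neither model.
rng = np.random.default_rng(1)
x = rng.standard_t(df=5, size=400)
ll_norm = stats.norm.logpdf(x, loc=x.mean(), scale=x.std())
ll_lap = stats.laplace.logpdf(x, loc=np.median(x),
                              scale=np.mean(np.abs(x - np.median(x))))
z = vuong_statistic(ll_norm, ll_lap)
print("z =", z,
      "-> F better" if z > 1.96 else "-> G better" if z < -1.96 else "-> equivalent")
```

Under the null that the two models are equally close to the true distribution in the KLIC sense, the statistic is asymptotically standard normal, so |z| > 1.96 rejects equivalence at roughly the 5% level in the direction of the better-fitting model.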

5,661 citations


Cites background or methods from "Linear Statistical Inference and It..."

  • ...THE MAIN PURPOSE OF THIS PAPER is to propose some new tests for model selection and non-nested hypotheses. Since all our tests are based on the likelihood ratio principle, as a prerequisite, we shall completely characterize the asymptotic distribution of the likelihood ratio statistic under general conditions. By general conditions we mean that the models may be nested, non-nested, or overlapping, and that both, only one, or neither of the competing models may contain the true law generating the observations. Unlike most previous work on model selection (see, e.g., Chow (1983, Ch. 9), Judge et al. (1985, Ch. 21)), we adopt the classical hypothesis testing framework and propose some directional and symmetric tests for choosing between models. This approach, which has not attracted a lot of attention, dates back to Hotelling (1940). See also Chow (1980). A notable and recent exception is White and Olson (1979) where competing models are evaluated according to their mean-square error of prediction. In this paper, we follow Akaike (1973, 1974) and consider the Kullback-Leibler (1951) Information Criterion (KLIC) which measures the distance between a given distribution and the true distribution....

    [...]

  • ...the KLIC over the distributions in the model, then it is natural to define the "best" model among a collection of competing models to be the model that is closest to the true distribution (see also Sawa (1978, Rule 2.1)). We consider conditional models so as to allow for explanatory variables. Then, if F_θ = { f(y|z; θ); θ ∈ Θ } is a conditional model, its distance from the true conditional density h_0(y|z), as measured by the minimum KLIC, is E_0[log h_0(y|z)] - E_0[log f(y|z; θ*)], where E_0[.] denotes the expectation with respect to the true joint distribution of (y, z) and θ* is the pseudo-true value of θ (see, e.g., Sawa (1978), White (1982a)). Thus, an equivalent selection criterion can be based on the quantity E_0[log f(y|z; θ*)], the "best" model being the one for which this quantity is the largest. Given two conditional models F_θ and G_γ = { g(y|z; γ); γ ∈ Γ }, which may be nested, non-nested, or overlapping, we propose tests of the null hypothesis that E_0[log f(y|z; θ*)] = E_0[log g(y|z; γ*)], meaning that the two models are equivalent, against E_0[log f(y|z; θ*)] > E_0[log g(y|z; γ*)], meaning that F_θ is better than G_γ, or against E_0[log f(y|z; θ*)] < E_0[log g(y|z; γ*)], meaning that G_γ is better than F_θ. Tests of such hypotheses are called tests for model selection. Since the true density h_0(y|z) is not restricted a priori to belong to either one of the models F_θ and G_γ, by necessity, the concern of this paper is with asymptotic results. The quantity E_0[log f(y|z; θ*)] is unknown. It can nevertheless be consistently estimated, under some regularity conditions, by (1/n) times the log-likelihood evaluated at the pseudo or quasi maximum likelihood estimator (MLE) (see, e.g., White (1982a), Gourieroux, Monfort, and Trognon (1984)). Hence (1/n) times the log-likelihood ratio (LR) statistic is a consistent estimator of the quantity E_0[log f(y|z; θ*)] - E_0[log g(y|z; γ*)]. Given the above definition of a "best" model, it is natural to consider the LR statistic as a basis for constructing tests for model selection. Since the two competing models may be nested, non-nested, or overlapping, and since both, only one, or neither of the two models may be correctly specified, it is necessary to obtain the asymptotic distribution of the LR statistic under the most general conditions. To do so, we use the framework of White (1982a) in order to handle the possibly misspecified case. Since Neyman and Pearson (1928) advocated the LR test, it has become one of the most popular methods for testing restrictions on the parameters of a statistical model....

    [...]


Journal ArticleDOI
TL;DR: The EM algorithm is shown to converge to a local maximum or a stationary value of the (incomplete-data) likelihood function under conditions that are applicable to many practical situations.
Abstract: Two convergence aspects of the EM algorithm are studied: (i) does the EM algorithm find a local maximum or a stationary value of the (incomplete-data) likelihood function? (ii) does the sequence of parameter estimates generated by EM converge? Several convergence results are obtained under conditions that are applicable to many practical situations. Two useful special cases are: (a) if the unobserved complete-data specification can be described by a curved exponential family with compact parameter space, all the limit points of any EM sequence are stationary points of the likelihood function; (b) if the likelihood function is unimodal and a certain differentiability condition is satisfied, then any EM sequence converges to the unique maximum likelihood estimate. A list of key properties of the algorithm is included.
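A small worked example makes the monotonicity behind these convergence results visible. The sketch below is illustrative and not from the paper: EM for a two-component Gaussian mixture with unit variances, where the printed incomplete-data log-likelihood is non-decreasing across iterations; all starting values and data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
# Observed (incomplete) data: a mixture of two unit-variance Gaussians.
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

def log_likelihood(x, pi, mu1, mu2):
    """Incomplete-data log-likelihood of the two-component mixture."""
    comp1 = pi * np.exp(-0.5 * (x - mu1) ** 2) / np.sqrt(2 * np.pi)
    comp2 = (1 - pi) * np.exp(-0.5 * (x - mu2) ** 2) / np.sqrt(2 * np.pi)
    return np.log(comp1 + comp2).sum()

pi, mu1, mu2 = 0.5, -1.0, 1.0  # crude starting values
for it in range(20):
    # E-step: posterior responsibility of component 1 for each observation.
    c1 = pi * np.exp(-0.5 * (x - mu1) ** 2)
    c2 = (1 - pi) * np.exp(-0.5 * (x - mu2) ** 2)
    r = c1 / (c1 + c2)
    # M-step: maximize the expected complete-data log-likelihood.
    pi = r.mean()
    mu1 = (r * x).sum() / r.sum()
    mu2 = ((1 - r) * x).sum() / (1 - r).sum()
    print(f"iter {it:2d}  log-likelihood {log_likelihood(x, pi, mu1, mu2):.3f}")
```

Because each M-step maximizes the expected complete-data log-likelihood computed in the E-step, the observed-data log-likelihood cannot decrease from one iteration to the next, which is the property the convergence analysis builds on.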

3,414 citations