# Linear Statistical Inference and Its Applications

##### Citations

5,804 citations

### Cites methods from "Linear Statistical Inference and It..."

...They use the standard probabilistic tool of “weak convergence” (Rao, 1973) and the “weak law of large numbers” to show that a posterior distribution inferred by factored sampling can be made arbitrarily accurate by choosing N sufficiently large....
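The convergence claim in this excerpt can be illustrated with a minimal sketch of factored sampling, under assumptions that are not in the cited text: a standard normal prior, a Gaussian likelihood with noise scale `sigma`, and a single scalar observation `z`. The function names are hypothetical. Samples are drawn from the prior and weighted by the likelihood, and the weighted mean is compared with the closed-form posterior mean available for this conjugate case.

```python
import math
import random


def factored_sampling_mean(z, sigma, n_samples, seed=0):
    """Estimate the posterior mean by weighting prior samples with the likelihood.

    Assumed toy model (not from the cited paper): x ~ N(0, 1), z | x ~ N(x, sigma^2).
    """
    rng = random.Random(seed)
    # Step 1: draw samples from the prior.
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Step 2: weight each sample by the (unnormalized) likelihood of the observation.
    weights = [math.exp(-0.5 * ((z - s) / sigma) ** 2) for s in samples]
    total = sum(weights)
    # Step 3: the normalized weighted average approximates the posterior mean.
    return sum(w * s for w, s in zip(weights, samples)) / total


def exact_posterior_mean(z, sigma):
    """Closed-form posterior mean for the conjugate normal model assumed above."""
    return z / (1.0 + sigma ** 2)


if __name__ == "__main__":
    for n in (100, 1000, 50000):
        approx = factored_sampling_mean(1.0, 0.5, n)
        error = abs(approx - exact_posterior_mean(1.0, 0.5))
        print(n, approx, error)
```

In this toy setting the error typically shrinks as the number of samples grows, which is the behaviour the excerpt attributes to the weak law of large numbers. It is not guaranteed to decrease monotonically for any particular seed.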

[...]

5,740 citations

5,661 citations

### Cites background or methods from "Linear Statistical Inference and It..."

...THE MAIN PURPOSE OF THIS PAPER is to propose some new tests for model selection and non-nested hypotheses. Since all our tests are based on the likelihood ratio principle, as a prerequisite, we shall completely characterize the asymptotic distribution of the likelihood ratio statistic under general conditions. By general conditions we mean that the models may be nested, non-nested, or overlapping, and that both, only one, or neither of the competing models may contain the true law generating the observations. Unlike most previous work on model selection (see, e.g., Chow (1983, Ch. 9), Judge et al. (1985, Ch. 21)), we adopt the classical hypothesis testing framework and propose some directional and symmetric tests for choosing between models. This approach, which has not attracted a lot of attention, dates back to Hotelling (1940). See also Chow (1980). A notable and recent exception is White and Olson (1979) where competing models are evaluated according to their mean-square error of prediction. In this paper, we follow Akaike (1973, 1974) and consider the Kullback-Leibler (1951) Information Criterion (KLIC) which measures the distance between a given distribution and the true distribution....

[...]

...the KLIC over the distributions in the model, then it is natural to define the "best" model among a collection of competing models to be the model that is closest to the true distribution (see also Sawa (1978, Rule 2.1)). We consider conditional models so as to allow for explanatory variables. Then, if $F_\theta = \{ f(y \mid z; \theta);\ \theta \in \Theta \}$ is a conditional model, its distance from the true conditional density $h^0(y \mid z)$, as measured by the minimum KLIC, is $E^0[\log h^0(y \mid z)] - E^0[\log f(y \mid z; \theta_*)]$ where $E^0[\cdot]$ denotes the expectation with respect to the true joint distribution of $(y, z)$ and $\theta_*$ is the pseudo-true value of $\theta$ (see, e.g., Sawa (1978), White (1982a)). Thus, an equivalent selection criterion can be based on the quantity $E^0[\log f(y \mid z; \theta_*)]$, the "best" model being the one for which this quantity is the largest. Given two conditional models $F_\theta$ and $G_\gamma = \{ g(y \mid z; \gamma);\ \gamma \in \Gamma \}$, which may be nested, non-nested, or overlapping, we propose tests of the null hypothesis that $E^0[\log f(y \mid z; \theta_*)] = E^0[\log g(y \mid z; \gamma_*)]$ meaning that the two models are equivalent, against $E^0[\log f(y \mid z; \theta_*)] > E^0[\log g(y \mid z; \gamma_*)]$ meaning that $F_\theta$ is better than $G_\gamma$ or against $E^0[\log f(y \mid z; \theta_*)] < E^0[\log g(y \mid z; \gamma_*)]$ meaning that $G_\gamma$ is better than $F_\theta$. Tests of such hypotheses are called tests for model selection. Since the true density $h^0(y \mid z)$ is not restricted a priori to belong to either one of the models $F_\theta$ and $G_\gamma$, by necessity, the concern of this paper is with asymptotic results. The quantity $E^0[\log f(y \mid z; \theta_*)]$ is unknown. It can nevertheless be consistently estimated, under some regularity conditions, by $(1/n)$ times the log-likelihood evaluated at the pseudo or quasi maximum likelihood estimator (MLE) (see, e.g., White (1982a), Gourieroux, Monfort, and Trognon (1984)). Hence $(1/n)$ times the log-likelihood ratio (LR) statistic is a consistent estimator of the quantity $E^0[\log f(y \mid z; \theta_*)] - E^0[\log g(y \mid z; \gamma_*)]$.
Given the above definition of a "best" model, it is natural to consider the LR statistic as a basis for constructing tests for model selection. Since the two competing models may be nested, non-nested, or overlapping, and since both, only one, or neither of the two models may be correctly specified, it is necessary to obtain the asymptotic distribution of the LR statistic under the most general conditions. To do so, we use the framework of White (1982a) in order to handle the possibly misspecified case. Since Neyman and Pearson (1928) advocated the LR test, it has become one of the most popular methods for testing restrictions on the parameters of a statistical model....
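As a rough illustration of the estimator described in this excerpt, the sketch below computes (1/n) times the log-likelihood ratio for two non-nested models fitted by maximum likelihood. Everything specific here is an assumption for illustration and not taken from the cited paper: the models are unconditional (no explanatory variables), the first model is a normal family, the second a Laplace family, the data are simulated from a standard normal, and all function names are hypothetical. Only the averaged LR is computed; no test statistic or critical value is constructed.

```python
import math
import random
import statistics


def normal_loglik(data):
    """Per-observation log-likelihoods under a normal model at its MLE."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE of the variance (divides by n)
    return [-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var) for x in data]


def laplace_loglik(data):
    """Per-observation log-likelihoods under a Laplace model at its MLE."""
    loc = statistics.median(data)  # MLE of the location
    scale = sum(abs(x - loc) for x in data) / len(data)  # MLE of the scale
    return [-math.log(2 * scale) - abs(x - loc) / scale for x in data]


def mean_log_likelihood_ratio(data):
    """(1/n) * LR statistic: average of log f - log g at the respective MLEs."""
    lf = normal_loglik(data)
    lg = laplace_loglik(data)
    return sum(a - b for a, b in zip(lf, lg)) / len(data)


# Assumed data-generating process: standard normal, so the normal model is
# correctly specified and the Laplace model is misspecified.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(20000)]

if __name__ == "__main__":
    print(mean_log_likelihood_ratio(data))
```

Under these assumptions the population difference in expected log-likelihoods is about 0.048 in favour of the normal model, so a positive average of roughly that size is what the consistency statement would lead one to expect for large n.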

[...]




3,414 citations