
Showing papers on "Likelihood principle published in 1978"


Journal ArticleDOI
TL;DR: In this article, the authors give a frequentist justification for preferring the variance estimate 1/I(x) to 1/I_θ̂, where I(x) is the observed information, i.e. minus the second derivative of the log likelihood function at θ̂ given data x.
Abstract: This paper concerns normal approximations to the distribution of the maximum likelihood estimator in one-parameter families. The traditional variance approximation is 1/I_θ̂, where θ̂ is the maximum likelihood estimator and I_θ̂ is the expected total Fisher information. Many writers, including R. A. Fisher, have argued in favour of the variance estimate 1/I(x), where I(x) is the observed information, i.e. minus the second derivative of the log likelihood function at θ̂ given data x. We give a frequentist justification for preferring 1/I(x) to 1/I_θ̂. The former is shown to approximate the conditional variance of θ̂ given an appropriate ancillary statistic which, to a first approximation, is I(x). The theory may be seen to flow naturally from Fisher's pioneering papers on likelihood estimation. A large number of examples are used to supplement a small amount of theory. Our evidence indicates preference for the likelihood ratio method of obtaining confidence limits.

864 citations
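
The contrast between the two variance estimates is easy to reproduce numerically. The sketch below is mine, not code from the paper; it uses the Cauchy location family, one of the paper's examples, with illustrative sample size, seed, and function names, and uses the fact that the total expected information for n standard-Cauchy observations is n/2.

```python
# A minimal sketch (not from the paper): observed vs. expected Fisher
# information for the Cauchy location model.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.standard_cauchy(20) + 3.0          # sample with true location 3

def neg_loglik(theta):
    # Cauchy location log likelihood, up to an additive constant.
    return np.sum(np.log1p((x - theta) ** 2))

m = np.median(x)
theta_hat = minimize_scalar(neg_loglik, bounds=(m - 2, m + 2), method="bounded").x

u = x - theta_hat
observed_info = np.sum(2 * (1 - u ** 2) / (1 + u ** 2) ** 2)  # I(x) = -l''(theta_hat)
expected_info = len(x) / 2.0                                  # total expected information n/2

print("variance estimate 1/I(x):      ", 1 / observed_info)
print("variance estimate 1/I_thetahat:", 1 / expected_info)
```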


Journal ArticleDOI
TL;DR: The small sample properties of three goodness-of-fit statistics for the analysis of categorical data are examined with respect to the adequacy of the asymptotic chi-squared approximation.
Abstract: The small-sample properties of three goodness-of-fit statistics for the analysis of categorical data are examined with respect to the adequacy of the asymptotic chi-squared approximation. The approximate tests based on the likelihood ratio and Freeman-Tukey statistics yield exact levels that are typically in excess of the nominal levels for moderate expected values. In contrast, the Pearson statistic attains exact levels that are quite close to the nominal values. The reason for the large number of rejections for the likelihood ratio and Freeman-Tukey statistics is related to their handling of small observed counts.

290 citations
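
For reference, here is a minimal sketch of the three statistics on hypothetical counts; the data and the particular Freeman-Tukey variant shown are assumptions for illustration, not taken from the paper.

```python
# A hedged sketch (names and data illustrative): Pearson, likelihood-ratio,
# and Freeman-Tukey goodness-of-fit statistics with their chi-squared p-values.
import numpy as np
from scipy.stats import chi2

obs = np.array([8, 2, 1, 5, 4])              # hypothetical observed counts
exp = np.full(5, obs.sum() / 5)              # expected counts under uniformity

pearson = np.sum((obs - exp) ** 2 / exp)
# Likelihood-ratio statistic G^2; zero counts would contribute 0 to the sum.
lr = 2 * np.sum(np.where(obs > 0, obs * np.log(obs / exp), 0.0))
# One common form of the Freeman-Tukey statistic.
freeman_tukey = np.sum((np.sqrt(obs) + np.sqrt(obs + 1) - np.sqrt(4 * exp + 1)) ** 2)

df = len(obs) - 1
for name, stat in [("Pearson", pearson), ("LR", lr), ("Freeman-Tukey", freeman_tukey)]:
    print(f"{name:14s} {stat:6.3f}  approx p = {chi2.sf(stat, df):.3f}")
```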


Journal ArticleDOI
TL;DR: In this article, rank tests and permutation tests in the regression problem are shown to be directly related to score function tests based on marginal and conditional likelihoods, respectively, and the problem of estimating a population percentile is also examined from the viewpoint of marginal likelihood.
Abstract: Standard techniques of nonparametric statistics are related to likelihood procedures. In particular, rank tests and permutation tests in the regression problem are shown to be directly related to score function tests based on marginal and conditional likelihoods, respectively. The problem of estimating a population percentile is also examined from the viewpoint of marginal likelihood. These calculations tend to bring nonparametric procedures more closely in line with procedures adopted in parametric inference.

76 citations
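
As a loose illustration of the permutation side of this correspondence, here is a hedged sketch of a permutation test for a regression slope built on a score-type statistic; the data, the statistic, and the replication count are my illustrative choices, not the paper's.

```python
# A minimal sketch (not from the paper): a permutation test for association
# in the regression problem, using a score-type statistic for the slope.
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(10.0)
y = 0.3 * x + rng.normal(size=10)

def score(y, x):
    return np.abs(np.sum((x - x.mean()) * y))   # score-type statistic for the slope

obs_stat = score(y, x)
# Permuting y against x gives the conditional null distribution of the statistic.
perm_stats = np.array([score(rng.permutation(y), x) for _ in range(5000)])
print("permutation p-value:", np.mean(perm_stats >= obs_stat))
```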


Journal ArticleDOI
TL;DR: In this paper, the authors propose three methods for computing the exact likelihood function of multivariate moving average models; each method utilizes the structure of the covariance matrix in a different way.
Abstract: This paper proposes three methods for computing the exact likelihood function of multivariate moving average models. Each method utilizes the structure of the covariance matrix in a different way. Formulae for operation counts of the three algorithms are given as a guide in selecting the best method for a given problem. Monte Carlo simulations are performed to compare the mean squared errors of parameter estimates obtained by maximizing the exact likelihood function versus those obtained by maximizing various approximate forms of the likelihood function.

57 citations
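
A hedged univariate sketch of the underlying computation: the exact Gaussian likelihood of an MA(1) model evaluated through the banded Toeplitz structure of its covariance matrix. The paper treats the multivariate case with three specialized algorithms; this brute-force version is only meant to show what "exact likelihood via the covariance matrix" means, and all names are illustrative.

```python
# A simplified univariate stand-in (not one of the paper's three algorithms):
# exact Gaussian log likelihood of an MA(1) model via its covariance matrix.
import numpy as np
from scipy.linalg import toeplitz

def exact_ma1_loglik(y, theta, sigma2):
    n = len(y)
    row = np.zeros(n)
    row[0] = (1 + theta ** 2) * sigma2        # var(y_t) for MA(1)
    row[1] = theta * sigma2                   # lag-1 autocovariance; higher lags are 0
    cov = toeplitz(row)                       # banded Toeplitz covariance matrix
    sign, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(2)
e = rng.normal(size=201)
y = e[1:] + 0.5 * e[:-1]                      # simulate MA(1) with theta = 0.5
print(exact_ma1_loglik(y, 0.5, 1.0))
```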


Journal ArticleDOI
TL;DR: In this article, the covariance matrix is the inverse of the observed log likelihood second derivative matrix at the likelihood maximum, as opposed to the inverse inverse of Fisher information, to which it is superior.
Abstract: SUMMARY We obtain a normal approximation to the conditional distribution of maximum likelihood estimates in location-scale families. The covariance matrix is the inverse of minus the observed log likelihood second derivative matrix at the likelihood maximum, as opposed to the inverse of Fisher information, to which it is superior. Confidence intervals for the location parameter are discussed. Small sample numerical results are given for the Cauchy case to illustrate the accuracy of the approximations. Some key word8: Ancillary statistic; Cauchy distribution; Conditional inference; Confidence interval; Fisher information; Likelihood ratio statistic; Linear model; Observed information; Pivotal statistic.

47 citations
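
A numerical sketch of the recipe (not the paper's code): fit a Cauchy location-scale model by maximum likelihood and take the covariance approximation to be the inverse of minus the observed log-likelihood Hessian. The log-scale parametrization and the finite-difference Hessian are implementation choices of mine.

```python
# A hedged sketch: observed-information covariance for the Cauchy
# location-scale MLE, with the Hessian approximated by central differences.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import cauchy

rng = np.random.default_rng(3)
x = cauchy.rvs(loc=2.0, scale=1.5, size=40, random_state=rng)

def negll(p):
    mu, log_sigma = p
    # Parametrize by log sigma so the scale stays positive during optimization.
    return -np.sum(cauchy.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(negll, x0=[np.median(x), 0.0])   # BFGS by default

def hessian(f, p, h=1e-4):
    p = np.asarray(p, dtype=float)
    H = np.empty((len(p), len(p)))
    for i in range(len(p)):
        for j in range(len(p)):
            def shifted(si, sj):
                q = p.copy()
                q[i] += si * h
                q[j] += sj * h
                return f(q)
            H[i, j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

obs_info = hessian(negll, res.x)   # equals -l'' since negll is minus log likelihood
print(np.linalg.inv(obs_info))     # approximate covariance of (mu_hat, log sigma_hat)
```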


Journal ArticleDOI
TL;DR: In this paper, an appropriate transform of the parameters is performed so that the [0, 1] constraint on the probability parameters is satisfied automatically, and the estimation of the transformed parameters according to the maximum likelihood principle is outlined; a numerical example is given for which the basic solution and the usual maximum likelihood method fail.
Abstract: As the literature indicates, no method is presently available which takes explicitly into account that the parameters of Lazarsfeld's latent class analysis are defined as probabilities and are therefore restricted to the interval [0, 1]. In the present paper an appropriate transform of the parameters is performed in order to satisfy this constraint, and the estimation of the transformed parameters according to the maximum likelihood principle is outlined. In the sequel, a numerical example is given for which the basic solution and the usual maximum likelihood method failed. The different results are compared and the advantages of the proposed method are discussed.

24 citations
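
The reparametrization idea can be shown in miniature. The sketch below uses a single binomial probability as a stand-in for the latent class parameters; the logistic transform guarantees the untransformed estimate lies in [0, 1]. All names and data are illustrative.

```python
# A minimal sketch of the reparametrization idea (a binomial stand-in for
# latent class probabilities): estimate a logit-transformed probability.
import numpy as np
from scipy.optimize import minimize_scalar

successes, trials = 3, 10

def negll(alpha):
    p = 1 / (1 + np.exp(-alpha))              # inverse logit keeps p in (0, 1)
    return -(successes * np.log(p) + (trials - successes) * np.log(1 - p))

alpha_hat = minimize_scalar(negll).x          # unconstrained optimization in alpha
p_hat = 1 / (1 + np.exp(-alpha_hat))
print("estimate on probability scale:", p_hat)   # ~0.3, inside [0, 1] by construction
```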


Journal ArticleDOI
TL;DR: In this paper, the problem of finding a suitable (asymptotic) efficiency criterion for inference concerning parameters of stochastic processes is addressed; a contiguity calculation is used to show that a previously suggested criterion is inadequate, and the calculation itself provides a partial solution to the problem.

18 citations


Journal ArticleDOI
TL;DR: It is shown that Neyman and Pearson considered both a Bayesian approach and reasoning based directly on likelihoods and deliberately excluded both from their theory, giving reasons for the exclusion, and that the procedure Spielman would employ in any of the examples he discussed or cited is exactly the one that Neyman and Pearson would recommend.
Abstract: Spielman [1973] presents 'A Refutation of the Neyman-Pearson Theory of Testing'. He states at the outset [p. 202], 'I intend to show that NPT is inadequate on its own terms'. We have had great difficulty in following his argument because he never quite states exactly what Neyman-Pearson Theory (NPT) is to him, and because his objections are not relevant to our conception of NPT. As we understand Spielman's criticism, it amounts to saying that NPT is inadequate because it is neither Bayesian nor reasoning based on the likelihood principle. We find this roughly equivalent to saying he is going to criticise Hebrew theology on its own terms, and then basing his entire argument on the New Testament. We hope to establish below that (1) Neyman and Pearson considered a Bayesian approach and reasoning based directly on likelihoods, and deliberately excluded both from their theory, giving reasons for this exclusion, and (2) the procedure Spielman would employ in any of the examples he discussed or cited, given the constraints of the problem, is exactly the one that Neyman and Pearson would recommend. There are differences between Neyman-Pearson and Bayesian Theory, but these differences are not relevant to the examples and the issues Spielman discusses.

16 citations


Journal ArticleDOI
TL;DR: In this paper, a method of marginal likelihood is presented for the elimination of nuisance parameters from a model which represents a wide class of single equation distributed lag models; the technique relies on the ability to divide the information on the nuisance parameters given by the data into statistics that are sufficient and ancillary for those parameters.
Abstract: The method of marginal likelihood is presented for the elimination of nuisance parameters from a model which represents a wide class of single equation distributed lag models. The technique relies on the ability to divide the information on the nuisance parameters given by the data into sufficient and ancillary statistics for the nuisance parameters. The marginal likelihood for the parameters of interest is obtained through the marginal distribution of the ancillary statistics. The marginal likelihood that is obtained is compared to the concentrated likelihood function obtained by substituting the solution of the likelihood equations of the nuisance parameters for their parameter values in the full likelihood. Three special cases of the general model are considered as examples: models with structural disturbances, lagged variables models, and a polynomial distributed lag model.

9 citations
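
The paper's construction is specific to distributed lag models, but the marginal-versus-concentrated contrast it examines has a familiar special case: the residual (REML) likelihood for the variance of a linear model, a marginal likelihood based on error contrasts, whose maximizer divides the residual sum of squares by n - p where the concentrated likelihood divides by n. A sketch under that substitution, with all names illustrative:

```python
# A hedged sketch of the marginal-vs-concentrated contrast in a familiar
# special case (REML for the variance of a linear model), standing in for
# the paper's distributed lag setting.
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=2.0, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_hat) ** 2)

sigma2_concentrated = rss / n        # maximizer of the concentrated (profile) likelihood
sigma2_marginal = rss / (n - p)      # maximizer of the marginal (REML) likelihood
print(sigma2_concentrated, sigma2_marginal)
```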



Journal ArticleDOI
TL;DR: In this paper, the asymptotic distribution of the likelihood ratio statistic is derived without the condition that the model chosen to construct the test statistic be correct, where a model is said to be correct if it contains the true distribution of the observations.
Abstract: Classical results on the asymptotic distribution of the likelihood ratio statistic rely on the assumption that the model chosen to construct the test statistic is correct. The model is said to be correct if it contains the true distribution of the observations. In this paper the asymptotic distribution of the likelihood ratio statistic is derived without the condition that the model be correct.
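
A hedged Monte Carlo sketch of the phenomenon (my example, not the paper's): when the working model is wrong, -2 log LR need not follow its nominal chi-squared law. Here the working model is N(mu, 1) with a test of mu = 1, while the data are really N(1, 4), so the statistic converges to 4 times a chi-squared(1) variable.

```python
# An illustrative simulation: -2 log LR under a misspecified model.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
n, reps = 200, 5000
stats = np.empty(reps)
for r in range(reps):
    x = rng.normal(loc=1.0, scale=2.0, size=n)   # true distribution, variance 4
    stats[r] = n * (x.mean() - 1.0) ** 2         # -2 log LR in the N(mu, 1) working model

# A nominal 5% test based on chi-squared(1) rejects far too often here (~33%).
print("rejection rate at nominal 5%:", np.mean(stats > chi2.ppf(0.95, 1)))
```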

Journal ArticleDOI
TL;DR: A response is made to recent discussions critical of the Bayesian learning procedure on the basis of empirically observed deviations from its prescriptions; Bayes' theorem is embedded in a more general class of learning rules allowing such departures, and examples of surprising learning behaviours and decision strategies are generated.
Abstract: A response is made to the recent discussions critical of the Bayesian learning procedure on the basis of empirically observed deviations from its prescriptions. Bayes' theorem is embedded in a more general class of learning rules which allow for departure from the demands of idealized rational behaviour. Such departures are termed learning impediments or disabilities. Some particular forms and interpretations of impediment functions are presented. Consequences of learning disabilities for the likelihood principle, stable estimation and admissible decision-making are explored. Examples of surprising learning behaviours and decision strategies are generated. Deeper understanding of Bayesian learning and its characteristics results.
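
The paper's impediment functions are its own; as a generic stand-in, the sketch below tempers the likelihood by a power k in [0, 1], a simple departure from idealized updating in which k = 1 recovers Bayes' theorem and k = 0 ignores the data entirely.

```python
# A hedged illustration (not the paper's impediment functions): a tempered
# likelihood as one simple departure from idealized Bayesian updating.
import numpy as np

def tempered_update(prior, likelihood, k):
    posterior = prior * likelihood ** k        # impeded update; k = 1 is Bayes
    return posterior / posterior.sum()

prior = np.array([0.5, 0.5])                   # two hypotheses
likelihood = np.array([0.9, 0.1])              # P(data | hypothesis)
for k in (1.0, 0.5, 0.0):
    print(k, tempered_update(prior, likelihood, k))
```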

Proceedings ArticleDOI
Fang-kuo Sun, T. Lee
01 Jan 1978
TL;DR: Maximum likelihood estimates of the mean and the covariance of a normal random variable, based on a set of independent but nonidentically distributed observations, are discussed, and an efficient algorithm for computing the MLEs is introduced.
Abstract: In this paper, maximum likelihood estimates of the mean and the covariance of a normal random variable, based on a set of independent but nonidentically distributed observations, are discussed. An efficient algorithm for computing the MLEs is introduced. Asymptotic properties such as strong consistency and asymptotic normality are examined.
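
The paper's algorithm is not reproduced here; as a simplified stand-in, the sketch below shows the scalar case with known, unequal variances, where the ML estimate of the common mean is the precision-weighted average.

```python
# A hedged, simplified stand-in (not the paper's algorithm): ML estimation of
# a common mean from independent Gaussians with different known variances.
import numpy as np

rng = np.random.default_rng(5)
true_mean = 1.0
noise_sd = np.array([0.5, 1.0, 2.0, 0.25, 1.5])   # nonidentical known noise levels
y = true_mean + rng.normal(scale=noise_sd)

w = 1.0 / noise_sd ** 2                            # precisions as weights
mean_hat = np.sum(w * y) / np.sum(w)               # ML estimate of the common mean
var_of_estimate = 1.0 / np.sum(w)
print(mean_hat, var_of_estimate)
```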

Journal ArticleDOI
TL;DR: In this article, the exact distributions of the X² index of dispersion and -2 log (likelihood ratio) tests for the hypothesis of homogeneity of c independent samples from a common binomial population are considered.
Abstract: This paper considers the exact distributions of the X² index of dispersion and -2 log (likelihood ratio) tests for the hypothesis of homogeneity of c independent samples from a common binomial population. The exact significance levels and power of these tests under 'logit' alternatives are compared numerically for the cases c = 3, 4, 5 and various sample sizes n_i = 5, 10 for i = 1, …, c.
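
For concreteness, here is a hedged sketch of the two statistics on made-up data; the counts and sample sizes are illustrative, not the paper's tabled cases.

```python
# A hedged sketch: the two homogeneity statistics for c independent binomial
# samples, computed under the pooled proportion.
import numpy as np

x = np.array([2, 4, 1])          # successes in each of c = 3 samples
n = np.array([5, 10, 5])         # sample sizes
p = x.sum() / n.sum()            # pooled estimate under homogeneity

dispersion = np.sum((x - n * p) ** 2 / (n * p * (1 - p)))   # X² index of dispersion

def xlogx(a, b):                 # a * log(a / b), with the convention 0 log 0 = 0
    return np.where(a > 0, a * np.log(np.where(a > 0, a, 1) / b), 0.0)

lr = 2 * np.sum(xlogx(x, n * p) + xlogx(n - x, n * (1 - p)))
print("dispersion index:", dispersion, "  -2 log LR:", lr)
```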


Journal ArticleDOI
TL;DR: In this paper, a Monte Carlo study shows that one of two previously suggested approximations to the maximum likelihood estimator of the autocorrelation coefficient is less absolutely biased and has smaller variance than the exact maximum likelihood estimator, and exhibits greater power in tests of autocorrelation when used in place of the standard Durbin-Watson statistic.
Abstract: Two approximations to the maximum likelihood estimator of the autocorrelation coefficient in first-order Markov disturbances in a linear model have been previously suggested by Durbin & Watson. A Monte Carlo study shows that one of these estimators is less absolutely biased and has smaller variance than the exact maximum likelihood estimator. Further, this estimator exhibits greater power in tests of autocorrelation when used in place of the standard Durbin-Watson statistic. The second estimator, while having very small variance, is severely biased and of little use in power considerations.
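
The paper's two approximations are not reproduced exactly here; the sketch below computes the best-known member of this family, rho ≈ 1 - d/2 with d the Durbin-Watson statistic, on simulated AR(1) disturbances (ordinarily d would be computed from regression residuals).

```python
# A hedged sketch: a Durbin-Watson-based approximation to the autocorrelation
# coefficient, applied to simulated first-order Markov disturbances.
import numpy as np

rng = np.random.default_rng(6)
e = np.empty(100)
e[0] = rng.normal()
for t in range(1, 100):                       # AR(1) disturbances with rho = 0.6
    e[t] = 0.6 * e[t - 1] + rng.normal()

d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)  # Durbin-Watson statistic
rho_hat = 1 - d / 2                           # simple approximation to the MLE
print("Durbin-Watson d:", d, "  rho estimate:", rho_hat)
```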


Journal ArticleDOI
TL;DR: In this paper, an L1-type inequality is proved between likelihood ratios and approximations to them obtained by conditioning on a finite number of random variables.

Journal ArticleDOI
01 Jan 1978
TL;DR: It is shown that, contrary to Allan Birnbaum's allegation, correctly applied use of the likelihood function does not find strong evidence against a true hypothesis with probability one, and that the support of a composite hypothesis can be adequately assessed only by averaging the likelihoods of its constituent simple hypotheses.
Abstract: Allan Birnbaum has alleged that use of a likelihood criterion can find strong evidence against a true hypothesis with probability one. It is shown that, correctly applied, use of the likelihood function does not lead to any such result. Specifically, Birnbaum's example involves composite hypotheses, and, from a Bayesian point of view, the support of a composite hypothesis can be adequately assessed only by averaging the likelihoods of its constituent simple hypotheses.
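
The averaging point can be made in a few lines. In the sketch below, the prior weights and data are my assumptions; the support for the composite hypothesis is the prior-weighted average of the likelihoods of its simple constituents, rather than their maximum.

```python
# A minimal sketch of the averaging argument: support for a composite
# hypothesis as a prior-weighted average (marginal likelihood).
import numpy as np
from scipy.stats import binom

x, n = 7, 10                                     # observed data
thetas = np.linspace(0.01, 0.99, 99)             # simple hypotheses within the composite
prior = np.full_like(thetas, 1 / len(thetas))    # uniform prior weights (an assumption)

avg_likelihood = np.sum(prior * binom.pmf(x, n, thetas))  # averaged, not maximized
max_likelihood = binom.pmf(x, n, thetas).max()            # what the criticized criterion uses
print(avg_likelihood, max_likelihood)
```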

Journal ArticleDOI
TL;DR: The paper by Lawless popularizes the use of conditional confidence intervals in estimating Weibull parameters; philosophical differences from competing methods can be downplayed because the evidence so far is that the numerical results of the two approaches are negligibly different, although in other (than Weibull) cases those differences may be large.
Abstract: The purpose of the paper by Dr. Lawless is to popularize the use of conditional confidence intervals in estimating Weibull parameters. The approach is soft-sell, emphasizing the computational and freedom-from-tables aspects, and avoiding the philosophical arguments which might be leveled at competing methods, which (horrors!) are based on less than sufficient statistics, don't satisfy the Likelihood Principle, and may be guilty of other sins. Philosophical differences can be downplayed because the evidence thus far, as given in the paper and its references [3] and [17], is that numerical results are negligibly different between the two approaches. Thus one can think (and is encouraged to do so by Lawless) of his approach as a computationally convenient way of working problems beyond the range of published tables and ignore the fact that there are philosophical differences which in other (than Weibull) cases may lead to large numerical differences. This might be regarded as a weak position, but it is more apt to be successful in its purpose than if the paper were titled "Bayes-Fiducial Estimation . . ." and written for the purpose of converting the Technometrics readership to Bayesian or fiducial thinking. The paper is well written and it is a pleasure to have been a reviewer of it. Both referees of the original draft of the paper would not have objected if the paper had been published as is. Not wanting to establish such a precedent, as Associate Editor I suggested some modifications, most of which were incorporated into the paper. Thus what follows should be regarded as an elaboration of some of the ideas in the paper and not as criticism. The discussion will be primarily in terms of the Weibull distribution, though of course translation to the extreme value distribution is straightforward.