
Showing papers in "Biometrika in 1978"


Journal ArticleDOI
TL;DR: In this paper, the overall test for lack of fit in autoregressive-moving average models proposed by Box & Pierce (1970) is considered, and it is shown that a substantially improved approximation results from a simple modification of this test.
Abstract: SUMMARY The overall test for lack of fit in autoregressive-moving average models proposed by Box & Pierce (1970) is considered. It is shown that a substantially improved approximation results from a simple modification of this test. Some consideration is given to the power of such tests and their robustness when the innovations are nonnormal. Similar modifications in the overall tests used for transfer function-noise models are proposed.
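
The modified statistic referred to here is what is now usually called the Ljung–Box portmanteau statistic, $Q = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)$, referred to a chi-squared distribution with $m - p - q$ degrees of freedom after fitting an ARMA(p, q) model. The sketch below is a minimal illustration of that statistic; the function name and the white-noise example are assumptions for illustration, not code from the paper.

```python
import numpy as np
from scipy import stats

def modified_portmanteau(residuals, m, n_fitted_params=0):
    """Modified portmanteau statistic Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k),
    where r_k are sample autocorrelations of the fitted-model residuals."""
    x = np.asarray(residuals, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    r = np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, m + 1)])
    q = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, m + 1)))
    df = m - n_fitted_params               # m - p - q for an ARMA(p, q) fit
    return q, stats.chi2.sf(q, df)

# White-noise "residuals" should give a non-significant result
rng = np.random.default_rng(1)
print(modified_portmanteau(rng.standard_normal(200), m=10))
```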

6,008 citations


Journal ArticleDOI
TL;DR: In this article, Cox's (1972) regression model for censored survival data is applied to epidemiological studies of chronic disease incidence, and a related model for association in bivariate survivorship time distributions is proposed for the analysis of familial tendency in disease incidence.
Abstract: SUMMARY The application of Cox's (1972) regression model for censored survival data to epidemiological studies of chronic disease incidence is discussed. A related model for association in bivariate survivorship time distributions is proposed for the analysis of familial tendency in disease incidence. The possible extension of the model to general multivariate survivorship distributions is indicated. This paper is concerned with a problem in the analysis of epidemiological studies of chronic disease incidence. In contrast with problems in the epidemiology of infectious disease, such analysis usually assumes that incidence of disease in different individuals represents independent events, the occurrence of which is influenced by measurable factors describing individuals and their environment. However, in the study of familial tendency in chronic disease incidence, this assumption is called into question. Comparisons of parents and offspring and sibling comparisons investigate possible relationships between disease incidence in related individuals, and such studies provide interesting analytical difficulties. Here, this problem is treated as one of estimating association in multivariate life tables. In §2 it is shown that epidemiological incidence studies may be regarded as being concerned primarily with the study of the distribution of the age at incidence and that Cox's (1972) regression model for the analysis of censored survival time data may readily be applied to incidence data and is closely related to more specifically epidemiological models. In later sections, a related model for bivariate life tables is developed and applied to the problem of demonstrating association in disease incidence in ordered pairs of individuals. The possible extension of the model into more dimensions is indicated.
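
The regression model applied here is Cox's proportional hazards model, whose estimation rests on a partial likelihood in which each observed failure is compared with the subjects still at risk at that time. Below is a minimal sketch of that partial likelihood for censored data with a single covariate, assuming no tied failure times; the simulated data and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def neg_cox_partial_loglik(beta, time, event, x):
    """Negative Cox partial log likelihood (no ties assumed): each observed
    failure i contributes beta'x_i - log sum_{j at risk at time_i} exp(beta'x_j)."""
    eta = x @ beta
    ll = 0.0
    for i in np.where(event == 1)[0]:
        at_risk = time >= time[i]
        ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return -ll

# Simulated censored survival data with one covariate (true coefficient 0.8)
rng = np.random.default_rng(10)
n = 200
x = rng.standard_normal((n, 1))
t_fail = rng.exponential(1.0 / np.exp(0.8 * x[:, 0]))
t_cens = rng.exponential(2.0, n)
time = np.minimum(t_fail, t_cens)
event = (t_fail <= t_cens).astype(int)

fit = minimize(neg_cox_partial_loglik, np.zeros(1), args=(time, event, x))
print("estimated regression coefficient:", fit.x)
```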

2,013 citations


Journal ArticleDOI
TL;DR: In this article, the authors give a frequentist justification for preferring the variance estimate $1/I(x)$ to $1/\mathcal{I}_{\hat\theta}$, where $I(x)$ is the observed information, i.e. minus the second derivative of the log likelihood function at $\hat\theta$ given data $x$.
Abstract: SUMMARY This paper concerns normal approximations to the distribution of the maximum likelihood estimator in one-parameter families. The traditional variance approximation is $1/\mathcal{I}_{\hat\theta}$, where $\hat\theta$ is the maximum likelihood estimator and $\mathcal{I}_{\hat\theta}$ is the expected total Fisher information. Many writers, including R. A. Fisher, have argued in favour of the variance estimate $1/I(x)$, where $I(x)$ is the observed information, i.e. minus the second derivative of the log likelihood function at $\hat\theta$ given data $x$. We give a frequentist justification for preferring $1/I(x)$ to $1/\mathcal{I}_{\hat\theta}$. The former is shown to approximate the conditional variance of $\hat\theta$ given an appropriate ancillary statistic which to a first approximation is $I(x)$. The theory may be seen to flow naturally from Fisher's pioneering papers on likelihood estimation. A large number of examples are used to supplement a small amount of theory. Our evidence indicates preference for the likelihood ratio method of obtaining confidence limits.
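
The contrast between the two variance approximations can be made concrete in a one-parameter family where observed and expected information differ, for example the Cauchy location model, whose expected total information is $n/2$. The sketch below finds the maximum likelihood estimate numerically and compares $1/I(x)$ with $1/\mathcal{I}_{\hat\theta}$; it is a rough illustration under assumed data, not a computation from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.standard_cauchy(25)                 # Cauchy location family, known scale 1

def negloglik(theta):
    # negative log likelihood up to an additive constant
    return np.sum(np.log(1.0 + (x - theta) ** 2))

# maximum likelihood estimate of the location parameter
theta_hat = minimize_scalar(negloglik, bounds=(-10, 10), method="bounded").x

# observed information I(x): minus the second derivative of the log likelihood,
# computed here by a central finite difference
h = 1e-4
obs_info = (negloglik(theta_hat + h) - 2 * negloglik(theta_hat)
            + negloglik(theta_hat - h)) / h ** 2

exp_info = len(x) / 2.0                     # expected Fisher information, n/2 for Cauchy location
print("1/I(x)        =", 1.0 / obs_info)
print("1/I_expected  =", 1.0 / exp_info)
```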

864 citations


Journal ArticleDOI
TL;DR: In this article, a model which allows capture probabilities to vary by individuals is introduced for multiple recapture studies on closed populations, where the set of individual capture probabilities is modelled as a random sample from an arbitrary probability distribution over the unit interval.
Abstract: SUMMARY A model which allows capture probabilities to vary by individuals is introduced for multiple recapture studies on closed populations. The set of individual capture probabilities is modelled as a random sample from an arbitrary probability distribution over the unit interval. We show that the capture frequencies are a sufficient statistic. A nonparametric estimator of population size is developed based on the generalized jackknife; this estimator is found to be a linear combination of the capture frequencies. Finally, tests of underlying assumptions are presented.
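
The estimator described here reduces, in its commonly quoted first- and second-order forms, to simple linear combinations of the capture frequencies $f_1, f_2, \ldots$ (the numbers of individuals caught exactly once, twice, and so on). The sketch below computes those two orders from raw capture counts; it does not reproduce the paper's procedure for choosing the order of the jackknife, and the example counts are invented.

```python
import numpy as np

def jackknife_population_size(capture_counts, t):
    """First- and second-order jackknife estimates of closed-population size.

    capture_counts : times each observed individual was captured (length = S)
    t              : number of capture occasions
    """
    counts = np.asarray(capture_counts)
    S = len(counts)                        # distinct individuals ever captured
    f1 = np.sum(counts == 1)               # captured exactly once
    f2 = np.sum(counts == 2)               # captured exactly twice
    n1 = S + (t - 1) / t * f1
    n2 = S + (2 * t - 3) / t * f1 - (t - 2) ** 2 / (t * (t - 1)) * f2
    return n1, n2

# Example: 60 observed individuals over t = 6 occasions (made-up frequencies)
counts = [1] * 25 + [2] * 15 + [3] * 12 + [4] * 8
print(jackknife_population_size(counts, t=6))
```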

770 citations


Journal ArticleDOI
TL;DR: In this article, linear rank statistics are developed for tests on regression coefficients with censored data, which arise as score statistics based on the marginal probability of a generalized rank vector, and the observed Fisher information provides a variance estimator generally, while in certain special cases a permutation approach to variance estimation is also possible.
Abstract: SUMMARY Linear rank statistics are developed for tests on regression coefficients with censored data. These statistics arise as score statistics based on the marginal probability of a generalized rank vector. The observed Fisher information provides a variance estimator generally, while in certain special cases a permutation approach to variance estimation is also possible.

708 citations


Journal ArticleDOI
TL;DR: The proportional hazards failure time model of Cox (1972) is adapted to the retrospective epidemiological study, in which cases and controls are sampled from a population in order to relate disease incidence rates to exposure to suspected risk factors as discussed by the authors.
Abstract: SUMMARY The proportional hazards failure time model of Cox (1972) is adapted to the retrospective epidemiological study, in which cases and controls are sampled from a population in order to relate disease incidence rates to exposure to suspected risk factors. The method permits simultaneous study of several exposure variables, which may be discrete or continuous or a mixture of both, in the presence of additional explanatory variables and competing risks. The proportional hazards framework provides a fresh look at the concepts of matching, choice of controls, confounding and effect modification. The relative risk is a common and useful measure of association between a chronic disease and suspected risk factors. It is given by the ratio of disease incidence rates for variously exposed individuals. Use of this measure involves the often implicit assumption that the ratios of incidence rates are relatively constant with respect to age and other personal characteristics. The proportional hazards model introduced by Cox (1972) gives a probabilistic formulation for the constant relative risk concept. It has been discussed both in relation to survival studies (Kalbfleisch & Prentice, 1973; Breslow, 1975) and in relation to prospective epidemiological studies (Breslow, 1977). The prospective epidemiological study involves following disease-free individuals having various exposure levels forward in time to observe disease incidence. Such studies, unfortunately, are frequently too long term and expensive to be feasible, especially if diseases of low incidence are under study. The retrospective case-control study attempts to circumvent these difficulties by selecting, from a well-defined population, cases with the study disease for comparison with a control sample of persons who are disease-free or have some other diagnosis. Exposure histories and other personal data are determined retrospectively by interview or other means. This paper proposes to adapt the proportional hazards model for use in case-control investigations. This approach enables one to associate multiple qualitative and quantitative exposure variables with one or more diseases in a general regression framework. Interaction terms in the regression equation lead to relaxation and testing of the constant relative risk assumption.
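
For the 1:1 matched designs discussed here, adapting the proportional hazards model leads to a conditional likelihood of logistic form in which each case-control pair contributes $\exp(\beta' x_{\text{case}})/\{\exp(\beta' x_{\text{case}}) + \exp(\beta' x_{\text{control}})\}$. The sketch below maximizes that conditional likelihood for a single exposure variable; the simulated exposures and function names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def neg_conditional_loglik(beta, x_case, x_control):
    """Conditional log likelihood for 1:1 matched case-control pairs:
    each pair contributes exp(b'x_case) / {exp(b'x_case) + exp(b'x_control)}."""
    d = (x_case - x_control) @ beta        # beta'(x_case - x_control), one value per pair
    return np.sum(np.log1p(np.exp(-d)))    # minus sum of log{1 / (1 + exp(-d))}

# Simulated exposures for 200 matched pairs, one covariate (assumed setup)
rng = np.random.default_rng(2)
x_case = rng.normal(0.5, 1.0, size=(200, 1))      # cases more exposed on average
x_control = rng.normal(0.0, 1.0, size=(200, 1))

fit = minimize(neg_conditional_loglik, np.zeros(1), args=(x_case, x_control))
print("estimated log relative risk:", fit.x)      # exp(fit.x) estimates the relative risk
```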

426 citations


Journal ArticleDOI
TL;DR: A model-dependent definition of a similarity matrix is proposed and estimates based on this matrix are justified in a decision-theoretic framework.
Abstract: A parametric model for partitioning individuals into mutually exclusive groups is given. A Bayesian analysis is applied and a loss structure imposed. A model-dependent definition of a similarity matrix is proposed and estimates based on this matrix are justified in a decision-theoretic framework. Some existing cluster analysis techniques are derived as special limiting cases. The results of the procedure applied to two data sets are compared with other analyses.

283 citations


Journal ArticleDOI
TL;DR: In this paper, an approximation to the conditional distribution of the maximum likelihood estimator of the changepoint, given the ancillary values of observations adjacent to the estimated changepoint, is derived and shown to be numerically equal to a Bayesian posterior distribution for the changepoint.
Abstract: SUMMARY Inference is considered for the point in a sequence of random variables at which the probability distribution changes. An approximation to the conditional distribution of the maximum likelihood estimator of the changepoint given the ancillary values of observations adjacent to the estimated changepoint is derived and shown to be numerically equal to a Bayesian posterior distribution for the changepoint. A hydrological example is given to show that inferences based on the conditional distribution of the maximum likelihood estimator can differ sharply from inferences based on the marginal distribution. We observe a sequence of random variables in which, at some unknown point, the process governing their distribution changes abruptly, and we consider the problem of inference about the unknown changepoint. Published research on this and related problems has provided changepoint estimators for a class of increasingly sophisticated models; for a recent example and references, see Ferreira (1975). The present paper turns back the clock to reconsider the simplest possible changepoint problem, one involving independent random variables whose distributions are completely specified apart from the unknown changepoint. A Bayesian solution to this problem is implicit in the work of Chernoff & Zacks (1964); a frequentist solution is given by Hinkley (1970). We consider here a third solution, based on a conditional frequentist approach. In a sense to be made precise, this third solution serves as a bridge linking the previous two. The conditional solution evolves from Hinkley's frequentist approach, which bases inferences on the asymptotic sampling distribution of the maximum likelihood estimator of the changepoint. The need for conditioning arises because the maximum likelihood estimator is not a sufficient statistic, and thus inferences based on its sampling distribution can be made more informative by conditioning on the values of appropriate ancillary statistics. It turns out that for the simple changepoint problem the resulting conditional inferences are nominally equivalent to certain Bayesian inferences in the sense that numerical differences can be made arbitrarily small. Nominal equivalence of the two solutions follows from an approximate version of a result obtained by Fisher (1934) in his conditional approach to estimating a translation parameter $\theta$: if $A$ is ancillary in that its distribution does not depend on $\theta$, and if the density $f(x \mid \theta)$ of the data $x$ can be factorized in the form
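
In the simplest version of the problem, with the pre- and post-change distributions completely specified, the maximum likelihood estimator simply maximizes the log likelihood over candidate changepoints. The sketch below traces that profile for two normal distributions with known means; the simulated sequence and variable names are illustrative assumptions, and the conditional and Bayesian refinements discussed in the paper are not implemented.

```python
import numpy as np
from scipy.stats import norm

# Simulated sequence: mean 0 before the changepoint, mean 1 after (assumed example)
rng = np.random.default_rng(3)
tau_true = 40
x = np.concatenate([rng.normal(0, 1, tau_true), rng.normal(1, 1, 60)])

# Both component distributions are completely specified; only the changepoint is unknown
logf0 = norm.logpdf(x, loc=0, scale=1)
logf1 = norm.logpdf(x, loc=1, scale=1)

n = len(x)
# log likelihood when the change occurs after observation tau (tau = 1, ..., n-1)
loglik = np.array([logf0[:tau].sum() + logf1[tau:].sum() for tau in range(1, n)])
tau_hat = int(np.argmax(loglik)) + 1
print("maximum likelihood changepoint estimate:", tau_hat)
```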

253 citations


Journal ArticleDOI
TL;DR: In this article, a nonparametric likelihood method was developed for the analysis of partially censored data arising from a multistate stochastic process, where the underlying process follows a semi-Markov model in which state changes form an embedded Markov chain and sojourn times are independent with distributions depending only on adjoining states.
Abstract: SUMMARY Nonparametric likelihood methods are developed for the analysis of partially censored data arising from a multistate stochastic process. It is assumed that the underlying process follows a semi-Markov model in which state changes form an embedded Markov chain and sojourn times are independent with distributions depending only on adjoining states. The general likelihood function for a set of partially censored observations is determined and maximized nonparametrically. The resulting nonparametric maximum likelihood estimators of the model unknowns are found to have several attractive properties. Approximate distributional results are derived. Some key words: Censored observation; Clinical trial; Nonparametric maximum likelihood estimation; Semi-Markov model.

190 citations


Journal ArticleDOI
TL;DR: In this article, a simple Bayesian formula for the posterior probability of one of several regression models is shown to be systematically misleading unless all models have the same number of parameters.
Abstract: SUMMARY A simple Bayesian formula for the posterior probability of one of several regression models is shown to be systematically misleading unless all models have the same number of parameters. Even in this case the use of improper priors leads to arbitrary inferences, as it does more generally. An alternative weighting for choosing a model is suggested and the relationship between significance tests and data-dependent priors mentioned. An alternative to the analysis of variance and related significance tests for choosing regression models is the calculation of the posterior probabilities of the competing models. These two methods can lead to sharply opposed inferences, both asymptotically and for a finite number of observations. It is the purpose of the present paper to examine in detail the potentially misleading behaviour of the apparent posterior probabilities of the models, particularly as a function of the number of observations. In passing, some errors in the literature are identified and some anomalies explained. The calculation of posterior probabilities after each observation arises in a natural way in the analysis of designed experiments for discriminating between regression models. In such experiments sequential procedures are desirable. A formula for updating the posterior probabilities of the models after each observation is given by Box & Hill (1967). In §2 the nonsequential form for these posterior probabilities is derived and related to posterior probabilities calculated in the customary way by integration over the parameter space of each model. This relationship makes clear the prior assumptions underlying the sequential procedure. The difficulties which arise from the use of improper priors are also mentioned in §2. The design of experiments is peripheral to the main discussion of the present paper. But since the numerical investigation of the behaviour of the posterior probabilities requires sequential experiments, design is discussed briefly in §3. The behaviour of the posterior probabilities is investigated in §4 for models with the same and with differing numbers of parameters when none, some or all of the models are true. Empirical weights for model plausibilities are derived in §5 and related to data-dependent prior probabilities for the models. The argument of the paper is almost entirely conducted in terms of two models with additive independent normal errors of constant known variance. The restriction to two models is purely for convenience, the extension to any number of models being straightforward as is the extension to linearized nonlinear models. If the variance of the errors is unknown, the expressions for the posterior probabilities of the models become more complicated while the properties of the probabilities remain similar to those described here.
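
The posterior model probabilities in question are obtained by integrating each model's likelihood over a prior on its parameters. For normal linear models with known error variance and a proper normal prior on the coefficients, that integral is available in closed form, which the sketch below uses to compare a straight-line and a quadratic model; the prior variance, simulated data and function names are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_marginal_likelihood(y, X, sigma2, prior_var):
    """Marginal likelihood of y = X b + e with e ~ N(0, sigma2 I) and prior
    b ~ N(0, prior_var I): marginally, y ~ N(0, sigma2 I + prior_var X X')."""
    n = len(y)
    cov = sigma2 * np.eye(n) + prior_var * X @ X.T
    return multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)

# Two competing models with different numbers of parameters (assumed example)
rng = np.random.default_rng(11)
n, sigma2 = 40, 1.0
t = np.linspace(0, 1, n)
X1 = np.column_stack([np.ones(n), t])              # straight line (2 parameters)
X2 = np.column_stack([np.ones(n), t, t ** 2])      # quadratic (3 parameters)
y = X1 @ np.array([1.0, 2.0]) + rng.normal(0, np.sqrt(sigma2), n)

lm = np.array([log_marginal_likelihood(y, X, sigma2, prior_var=10.0) for X in (X1, X2)])
post = np.exp(lm - lm.max())
post /= post.sum()                                 # equal prior model probabilities assumed
print("posterior model probabilities:", post)
```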

159 citations


Journal ArticleDOI
TL;DR: In this paper, a method for obtaining confidence intervals following sequential tests is described, where an order relation is defined among the points on the stopping boundary of the test and confidence limits are determined by finding those values of the unknown parameter for which the probabilities of more extreme deviations in the order relation than the one observed have prescribed values.
Abstract: SUMMARY A method is given for obtaining confidence intervals following sequential tests. It involves defining an order relation among points on the stopping boundary and computing the probability of a deviation more extreme in this order relation than the observed one. Particular attention is given to the case of a normal mean with known or unknown variance. A comparison with the customary fixed sample size interval based on the same data is given. The purpose of this paper is to describe a method for obtaining confidence intervals following sequential tests. An order relation is defined among the points on the stopping boundary of the test. The confidence limits are determined by finding those values of the unknown parameter for which the probabilities of more extreme deviations in the order relation than the one observed have prescribed values. To facilitate understanding the proposed procedures, most of the paper is restricted to estimating the mean of a normal population with known variance following the class of sequential tests recommended by Armitage (1975) for clinical trials. The case of unknown variance is discussed briefly in §4. It is easy to see that the proposed method is valid more generally, although the probability calculations required to implement it depend on the specific parent distribution and stopping rule. A closely related method was proposed by Armitage (1958), who studied the case of binomial data numerically by enumeration of sample paths. Let $x_1, x_2, \ldots$ be independent and normally distributed with unknown mean $\mu$ and known variance $\sigma^2$. Let $s_n = x_1 + \cdots + x_n$, and for given $b > 0$ consider the stopping rule

Journal ArticleDOI
TL;DR: The method developed below, while being subjective to some extent, goes a long way towards resolving, in certain cases, the difficulty of determining the window width appropriate to a given sample.
Abstract: The kernel density estimator is defined by $\hat{f}(t) = \{nh(n)\}^{-1} \sum_{i=1}^{n} K\{(t - X_i)/h(n)\}$, where $X_1, \ldots, X_n$ are independent identically distributed real observations, $K$ is a kernel function and $h(n)$ is a sequence of window widths, assumed to tend to zero as $n$ tends to infinity. The kernel estimator has been widely discussed; for a survey see Rosenblatt (1971). When applying the method in practice it is of course necessary to choose a kernel and a window width. The choice of kernel was considered by Epanechnikov (1969), who showed that there is in some sense an optimal kernel, which is part of a parabola, but that any reasonable kernel gives almost optimal results. Therefore the choice of kernel is not as important a problem in practice as might be supposed. It is quite satisfactory to choose a kernel for computational convenience, as below, or for any other attractive reason, such as, for example, the argument leading to the quadratic spline kernel used by Boneva, Kendall & Stefanov (1971) in their 'spline transform' technique. While the choice of kernel does not seem to lead to much difficulty, at least for reasonably large sample sizes, the choice of window width is quite a different matter. The results of Silverman (1978) show that the kernel estimate is uniformly consistent under quite mild conditions on the rate of convergence of the window width to zero, but that the rate of consistency can be very slow. The very interesting practical work of Boneva et al. (1971) shows that the estimates can change dramatically under quite small variations in window width. Thus there seems to be considerable need for objective methods of determining the window width appropriate to a given sample. The method developed below, while being subjective to some extent, goes a long way towards resolving this difficulty in certain cases. First the method is described and some applications to sets of data are considered. The application of the method to multivariate data is then discussed. Finally, the theoretical justification of the method is obtained.
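
The sensitivity to the window width that motivates the paper is easy to demonstrate with a direct implementation of the kernel estimator. The sketch below evaluates a Gaussian-kernel estimate at one point for several window widths; the simulated data, evaluation grid and function names are assumptions for illustration and do not reproduce the paper's window-width selection method.

```python
import numpy as np

def kernel_density(x_grid, data, h):
    """Kernel estimate f_hat(t) = (1 / (n h)) * sum_i K((t - X_i) / h), Gaussian kernel."""
    data = np.asarray(data)
    u = (x_grid[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 0.5, 100)])
grid = np.linspace(-5, 5, 201)
idx = np.argmin(np.abs(grid - 2.0))

# The estimate changes dramatically with the window width h
for h in (0.05, 0.3, 1.5):
    f_hat = kernel_density(grid, data, h)
    print(f"h = {h}: estimated density near x = 2 is {f_hat[idx]:.3f}")
```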

Journal ArticleDOI
TL;DR: In this paper, a parametric model is developed for the analysis of square contingency tables with ordered categories; unlike the quasisymmetry model, which is invariant under arbitrary permutations of the categories and hence suited to nominal data, the new model is invariant only under the reverse permutation and is therefore suited to ordinal data.
Abstract: SUMMARY A parametric model is developed for the analysis of square contingency tables with ordered categories. Order among the categories is a built-in feature of the new model and this means that it is unnecessary to assign arbitrary 'scores' to the row and column variables. Special cases of the proposed model include conditional symmetry and symmetry. The relationship with marginal homogeneity is also described. The model for quasisymmetry is considered and is shown to be invariant under general permutation transformations of the indices, which makes it suitable for analysing data on a nominal scale. On the other hand, the new model, called p symmetry, is invariant only under the special reverse permutation transformation. This restricted invariance property makes the latter model more suitable for analysing data on an ordinal scale.
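
Symmetry, mentioned above as a special case, is the simplest of these models and has a standard chi-squared test based on the off-diagonal cells. As a concrete point of reference, the sketch below implements that Bowker-type symmetry test; the example table is invented, and the paper's own models (conditional symmetry, p symmetry) are not implemented here.

```python
import numpy as np
from scipy import stats

def symmetry_test(table):
    """Bowker-type chi-squared test of the symmetry model m_ij = m_ji:
    statistic = sum over i < j of (n_ij - n_ji)^2 / (n_ij + n_ji)."""
    t = np.asarray(table, dtype=float)
    k = t.shape[0]
    iu = np.triu_indices(k, 1)
    stat = np.sum((t[iu] - t.T[iu]) ** 2 / (t[iu] + t.T[iu]))
    df = k * (k - 1) // 2
    return stat, stats.chi2.sf(stat, df)

# Example 4x4 square table with ordered categories (made-up counts)
table = [[50, 12,  4,  1],
         [20, 40, 10,  3],
         [ 8, 15, 30,  9],
         [ 2,  5, 14, 25]]
print(symmetry_test(table))
```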

Journal ArticleDOI
TL;DR: In this article, a review of previous work on testing multivariate normality is presented and the arguments for concentrating on tests of linearity of regression are indicated and such tests, both coordinate-dependent and invariant, are developed.
Abstract: SUMMARY Previous work on testing multivariate normality is reviewed. Coordinate-dependent and invariant procedures are distinguished. The arguments for concentrating on tests of linearity of regression are indicated and such tests, both coordinate-dependent and invariant, are developed.

Book ChapterDOI
TL;DR: In this article, an objective procedure for evaluating the prior distribution in a Bayesian model is developed, and the classical ignorance prior distribution is newly interpreted as the locally impartial prior distribution.
Abstract: In developing an estimate of the distribution of a future observation it becomes natural and necessary to consider a distribution over the space of parameters. This justifies the use of Bayes procedures in statistical inference. An objective procedure for evaluating the prior distribution in a Bayesian model is developed, and the classical ignorance prior distribution is newly interpreted as the locally impartial prior distribution.


Journal ArticleDOI
Nan M. Laird
TL;DR: In this paper, an empirical Bayes method for smoothing two-way tables is presented, which is based on the use of the log linear model and a normal prior distribution, and two approximations are considered, one which utilizes the EM algorithm.
Abstract: SUMMARY An empirical Bayes method for smoothing two-way tables is presented. It is based on the use of the log linear model and a normal prior distribution. Estimation of the variance component in the prior is discussed, and two approximations are considered, one which utilizes the EM algorithm. This paper proposes estimates of contingency table cell probabilities based on combining the log linear model with normal prior distributions. Attention is mainly confined to the two-way table, with possible extensions discussed in the last section. The approach taken is empirical Bayes, but there are obvious parallels with variance component models for contingency tables, which are considered in the last section. Our approach draws from many sources. Good (1956) proposed a normal prior model quite similar to ours for the two-way table used in our example. Good's approach is to use a prior distribution on the probabilities of the multinomial to give smoothed cell probabilities. Expanding upon Good's method, Fienberg & Holland (1970, 1973) proposed empirical Bayes estimates for two-way tables with a Dirichlet prior for the cell probabilities. They compared various methods for estimating the parameters of the Dirichlet distribution. We contrast their estimates with ours in our example. Our multinomial-normal model is a minor variant of that introduced by Leonard (1975). Our approach differs chiefly in the handling of the parameters of the normal distribution. Because our models are so similar, we defer discussion of Leonard's work to the next section, where we introduce our sampling model and prior distribution, and discuss estimation of the cell probabilities assuming the parameters in the prior are known. Then we discuss estimation of the prior parameters, give an example, and finally consider advantages, disadvantages and extensions of our approach.

Journal ArticleDOI
TL;DR: In this paper, the authors examined maximum likelihood techniques as applied to classification and clustering problems, and showed that the classification maximum likelihood technique, in which individual observations are assigned on an "all-or-nothing" basis to one of several classes as part of the maximization process, gives results which are asymptotically biased.
Abstract: SUMMARY This paper examines maximum likelihood techniques as applied to classification and clustering problems, and shows that the classification maximum likelihood technique, in which individual observations are assigned on an "all-or-nothing" basis to one of several classes as part of the maximization process, gives results which are asymptotically biased. This extends Marriott's (1975) work for normal component distributions. Numerical examples are presented for normal component distributions and for a problem in genetics. The results indicate that biases can be severe, though determining in simple form when the biases will and will not be severe seems difficult.

Journal ArticleDOI
TL;DR: In this paper, the first-order noncircular autoregressive model is considered and Edgeworth and saddlepoint approximations are given for the distribution of the least squares estimator of the autoregression coefficient.
Abstract: SUMMARY The first-order noncircular autoregressive model is considered and Edgeworth and saddlepoint approximations are given for the distribution of the least squares estimator of the autoregressive coefficient. Numerical calculations are used to compare the two approximations with the exact distribution, found by numerical integration. Both approximations are unsatisfactory when the autoregressive coefficient is moderately large and the sample size small, the saddlepoint because it is undefined in the tail area, and the Edgeworth because it badly distorts tail area probabilities. When the sample size is larger, the saddlepoint approximation performs well and is capable of three decimal place accuracy, although it is still not available in tail areas when the autoregressive coefficient is large.
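
The estimator whose distribution is being approximated is the least squares estimator $\hat\alpha = \sum_t x_t x_{t-1} / \sum_t x_{t-1}^2$. The sketch below simulates its sampling distribution for a moderately large coefficient and a small sample, the setting where the abstract notes both analytic approximations struggle; it is a plain simulation under assumed settings, not an implementation of the Edgeworth or saddlepoint approximations.

```python
import numpy as np

def ls_ar1(x):
    """Least squares estimator of alpha in x_t = alpha * x_{t-1} + e_t."""
    x = np.asarray(x)
    return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

def simulate_ar1(alpha, n, rng):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = alpha * x[t - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(5)
alpha, n = 0.9, 20
estimates = np.array([ls_ar1(simulate_ar1(alpha, n, rng)) for _ in range(5000)])
# The simulated distribution is noticeably skewed, which is why simple normal
# approximations need Edgeworth or saddlepoint corrections in this setting.
print("mean:", estimates.mean(), " P(alpha_hat < 0.6):", np.mean(estimates < 0.6))
```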


Journal ArticleDOI
TL;DR: In this article, the effect of Poisson mixtures, especially as represented by the negative binomial distribution, on statistical inference is examined and the main finding is that probabilities of rejection are increased, sometimes considerably.
Abstract: SUMMARY Data presented as contingency tables and classified by qualitative or quantitative methods are usually analysed on the basis of a Poisson log linear model. We examine the effect of Poisson mixtures, especially as represented by the negative binomial distribution, on statistical inference. The main finding is that probabilities of rejection are increased, sometimes considerably. In the case of heterogeneous binary data, attention is also given to the problems of analysis implied by Poisson mixtures.
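
The inflation of rejection probabilities under a Poisson mixture can be illustrated with something simpler than a full log linear analysis: the classical index-of-dispersion test for a common Poisson mean, applied to counts that are really negative binomial. The sketch below estimates the rejection rate in both situations; the cell counts, mixture parameters and function names are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def poisson_dispersion_pvalue(y):
    """Index-of-dispersion test: under a common-mean Poisson model,
    sum (y_i - ybar)^2 / ybar is approximately chi-squared with k - 1 df."""
    y = np.asarray(y, dtype=float)
    stat = np.sum((y - y.mean()) ** 2) / y.mean()
    return stats.chi2.sf(stat, len(y) - 1)

rng = np.random.default_rng(6)
k, mean, nominal = 20, 5.0, 0.05

def rejection_rate(draw, n_sim=2000):
    return np.mean([poisson_dispersion_pvalue(draw()) < nominal for _ in range(n_sim)])

# Counts analysed as Poisson: true Poisson versus a negative binomial mixture (same mean)
print("Poisson data      :", rejection_rate(lambda: rng.poisson(mean, k)))
print("negative binomial :", rejection_rate(lambda: rng.negative_binomial(5, 5 / (5 + mean), k)))
```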


Journal ArticleDOI
TL;DR: In this paper, the authors proposed one step iterations based on a second derivative approximation to the surface, which can be obtained quickly from initial estimates, and the analysis resulting from these estimates is asymptotically equivalent to the minimum dispersion analysis.
Abstract: Classical analysis of variance with least squares fitting is often used to discern structure in a linear model. McKean & Hettmansperger (1976) proposed a robust analysis based on ranks using the R-estimates proposed by Jaeckel (1972). These rank procedures depend on minimizing a dispersion surface and as a result are computationally restricted to small to moderately sized data sets. In this paper we propose one-step iterations based on a second derivative approximation to the surface. These estimates can be obtained quickly from initial estimates. Further, the analysis resulting from these estimates is asymptotically equivalent to the minimum dispersion analysis. Thus it can be recommended for large data sets.

Journal ArticleDOI
TL;DR: Several models for dependent competing risk data are considered for studying the effects of risk factors and techniques based on marginally sufficient statistics and partial likelihood are used for their estimation.
Abstract: SUMMARY Several models for dependent competing risk data are considered. It is assumed that the effects of certain risk factors or treatments are of primary interest and such effects are written as regression parameters in a proportional cause-specific hazards model. Techniques based on marginally sufficient statistics and partial likelihood are used for their estimation. Efficiency comparisons are made among estimates based on the suggested models. Special attention is given to the analysis of matched pair data and the methods are illustrated in a detailed analysis of data obtained from an ongoing Swedish twin study. It is well known that competing causes of death must be taken into account in the study of how risk factors affect a specific cause of death or disease. To cite an extreme example (Cornfield, 1957), if smoking not only increases the instantaneous chance of contracting lung cancer but also causes a large increase in the chance of dying from other causes, then the observed proportion of lung cancer deaths over a study period could be greater in nonsmokers than smokers because the early deaths of smokers from other causes would tend to preclude them from dying of lung cancer. In a study of surgically treated breast cancer patients Cutler, Axtell & Schottenfeld (1969) found that a reduction in survival in older patients relative to younger patients disappeared when the increased mortality at older ages due to other causes was taken into account. In studies of therapy for the treatment of leukaemia, deaths due to infection should be considered as a competing cause lest this potentially controllable cause of death obscure the potential benefit of therapy. Cornfield's paper presents a good elementary account of the competing risk problem and Cox (1959) provides a fundamental paper on models for competing risks. In this paper, models and analyses are considered for studying the effects of risk factors,

Journal ArticleDOI
TL;DR: In this paper, the authors present a method for testing for local regularity in the spacing of individuals, based on individual-to-individual nearest neighbour distances, which uses computer simulation to overcome the problems created by the presence of a boundary and by the lack of statistical independence of the nearest neighbour distances.
Abstract: SUMMARY Some nearest neighbour test procedures, assuming a null hypothesis of a Poisson process in an infinite plane, are shown to be inapplicable when a complete map of individuals is available. Two statistics, the squared coefficient of variation of squared nearest neighbour distances, and the ratio of the geometric mean to the arithmetic mean of the squared distances, are particularly appropriate to testing for local regularity in this situation. Two methods of carrying out the test, the first based on computer simulation and the second an approximation not requiring simulation, are presented. Additionally, indices of local regularity are suggested. Existing tests of randomness, based on distance methods, of individuals in a plane are primarily intended for use with large populations. The null hypothesis is that their observed distribution is a realization of a two-dimensional Poisson point process, infinite in extent. For convenience we shall refer to this starting point, and the results which can be derived from it, as infinite plane theory. Tests have been based on distances to nearest neighbours of randomly chosen individuals (Skellam, 1952), distances to first and second nearest neighbours of randomly chosen points (Holgate, 1965), and a combination of individual to nearest individual and point to nearest individual distances (Hopkins, 1954). More recently Besag & Gleaves (1973) have proposed tests based on the T-square method of sampling. In some studies, for example of territorial behaviour, a complete map of individual locations in a relatively small population within a given boundary is available. This situation is somewhat different from that outlined above: the null hypothesis of interest then is that individual positions are a realization of a process locating them independently and at random within the given boundary. This paper presents a method for testing this null hypothesis which is designed to detect regularity of spacing of individuals on a small scale, irrespective of the global pattern. Pielou (1974, p. 155) discussed a number of ecological mechanisms which might cause such local regularity and also gave a method, based on infinite plane theory, to test for it. The method presented in this paper utilizes individual to individual nearest neighbour distances, and is based on computer simulation to overcome the problems created by the presence of a boundary and by the lack of statistical independence of the nearest neighbour distances. Two statistics are suggested as being particularly appropriate to the detection of local regularity. Their moments under infinite plane theory are derived and methods of approximation to their sampling distribution discussed. An approximate method is presented which would provide a satisfactory test of randomness in itself for some
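
One of the two statistics proposed, the squared coefficient of variation of the squared nearest-neighbour distances, lends itself to a simple Monte Carlo version of the test: compare the observed value with values obtained by repeatedly placing the same number of points independently and uniformly within the boundary. The sketch below does this for a rectangular study region; the rectangular boundary, the direction of the one-sided test and the function names are assumptions for illustration.

```python
import numpy as np

def squared_cv_of_squared_nn(points):
    """Squared coefficient of variation of the squared nearest-neighbour distances."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    w = d2.min(axis=1)                   # squared distance to nearest neighbour
    return w.var() / w.mean() ** 2

def monte_carlo_p(points, width, height, n_sim=999, rng=None):
    """Monte Carlo test: small values of the statistic suggest local regularity."""
    rng = rng if rng is not None else np.random.default_rng()
    obs = squared_cv_of_squared_nn(points)
    sims = np.array([
        squared_cv_of_squared_nn(rng.uniform([0, 0], [width, height], size=points.shape))
        for _ in range(n_sim)
    ])
    return (1 + np.sum(sims <= obs)) / (n_sim + 1)

# Example: 50 points placed uniformly at random in the unit square (assumed data)
rng = np.random.default_rng(7)
pts = rng.uniform(0, 1, size=(50, 2))
print(monte_carlo_p(pts, 1.0, 1.0, rng=rng))
```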

Journal ArticleDOI
TL;DR: Second-order asymptotic properties of the jackknife procedure are discussed in this article, together with robust alternatives to the average pseudovalue and the use of jackknife pseudovalues in obtaining estimates less sensitive to extreme data points.
Abstract: SUMMARY Second-order asymptotic properties of the jackknife procedure are discussed, and the jackknifed estimator is shown to be a vulnerable estimator whose variation can be severely underestimated by the jackknife standard error. Simple robust alternatives to the average pseudovalue are discussed. Particular emphasis is placed on estimation of a correlation coefficient. Numerical examples are given. The jackknife is a method for distribution-free bias reduction and standard error estimation. For a wide class of problems it is known that the jackknife produces consistent results. An excellent review of applications and asymptotic theory is given by Miller (1974). Recently there have been several investigations of small-sample properties of the jackknife procedure (Hinkley, 1977a, b), which show that some adjustments are necessary in order to obtain accurate confidence intervals using the jackknife. In an unpublished paper, B. Efron has shown that the jackknife gives a rough linear approximation to another subsampling method for getting confidence intervals. In the present paper we examine two further aspects of the jackknife, namely the use of second-order asymptotics in assessing finite-sample properties, and the use of jackknife pseudovalues in obtaining estimates less sensitive to extreme data points. The discussion is illustrated throughout with results for the correlation estimate. A brief summary of the results is as follows: jackknifed estimators can have very large haphazard bias compared to the original estimators; the jackknife estimate of standard error can severely underestimate the standard error of the jackknifed estimator; and use of the jackknife pseudovalues in residual and trimmed-mean analyses can give considerably improved estimators. Section 2 summarizes the standard jackknife method and illustrates it on an artificial data set, where certain difficulties are apparent. Second-order properties of the jackknife are derived in §3, and numerical results are given for the correlation example. The same example is used in §4, where robust analysis via pseudovalues is discussed. Section 5 gives brief conclusions. Throughout the paper we assume that the basic estimate is obtained from independent, identically distributed random variables. Moreover we assume that the estimate is a regular differentiable functional of the empirical distribution function, with at least two derivatives.
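
The quantities under discussion, pseudovalues, the jackknifed estimator and the jackknife standard error, are standard and quick to compute. The sketch below applies them to the correlation coefficient, the paper's running example; the simulated data and helper names are illustrative assumptions, and none of the paper's second-order corrections are included.

```python
import numpy as np

def jackknife(estimator, data):
    """Jackknife pseudovalues, jackknifed (bias-corrected) estimate and standard error."""
    n = len(data)
    theta_full = estimator(data)
    theta_minus_i = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    pseudovalues = n * theta_full - (n - 1) * theta_minus_i
    theta_jack = pseudovalues.mean()
    se_jack = pseudovalues.std(ddof=1) / np.sqrt(n)
    return pseudovalues, theta_jack, se_jack

def correlation(xy):
    return np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]

rng = np.random.default_rng(8)
z = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=30)
pv, r_jack, se = jackknife(correlation, z)
print("jackknifed correlation:", r_jack, " jackknife standard error:", se)
```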

Journal ArticleDOI
TL;DR: The logistic distribution, which is known to be very close to the normal distribution, is shown to be much closer to Student's t-distribution with 9 degrees of freedom.
Abstract: The logistic distribution, which is known to be very close to the normal distribution, is shown to be much closer to Student's t-distribution with 9 degrees of freedom.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed to estimate the concentration parameter, K, in the d-dimensional von Mises-Fisher distribution on the basis of the marginal distribution of the resultant length only.
Abstract: SUMMARY Using sufficiency arguments it is proposed to estimate the concentration parameter, K, in the d-dimensional von Mises-Fisher distribution on the basis of the marginal distribution of the resultant length only. Distributional properties of the estimate are given. The analogue to the Neyman-Scott example of inconsistency is considered for the von Mises-Fisher distribution, and again an estimate based on the marginal distribution of the resultant lengths is proposed and discussed.
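
The resultant length is the key statistic here. A familiar concentration estimate, which the paper's marginal-likelihood approach is designed to improve on, solves the mean-resultant-length equation $A_d(\kappa) = I_{d/2}(\kappa)/I_{d/2-1}(\kappa) = \bar{R}$ numerically. The sketch below does this with exponentially scaled Bessel functions; it is not the paper's estimator, and the simulated directions and function names are assumptions.

```python
import numpy as np
from scipy.special import ive
from scipy.optimize import brentq

def estimate_kappa(unit_vectors):
    """Solve A_d(kappa) = I_{d/2}(kappa) / I_{d/2 - 1}(kappa) = Rbar for kappa,
    where Rbar is the mean resultant length of the unit vectors (d dimensions)."""
    x = np.asarray(unit_vectors)
    d = x.shape[1]
    rbar = np.linalg.norm(x.mean(axis=0))
    a_d = lambda k: ive(d / 2, k) / ive(d / 2 - 1, k)   # scaled Bessel ratio, overflow-safe
    return brentq(lambda k: a_d(k) - rbar, 1e-6, 1e4)

# Example: noisy unit vectors around a common direction on the sphere (d = 3, assumed data)
rng = np.random.default_rng(9)
v = rng.normal([3.0, 0.0, 0.0], 1.0, size=(100, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
print("estimated concentration parameter kappa:", estimate_kappa(v))
```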

Journal ArticleDOI
TL;DR: In this paper, the Schucany-Frawley test is shown to be misleading and does not provide a satisfactory answer to Kendall's question, and a new test for agreement between two groups of judges is proposed.
Abstract: The 'problem of m rankings', so named by Kendall and studied extensively by Kendall, Babington Smith and others, considers the relationship between the rankings that a group of m judges assigns to a set of k objects. Suppose there are two groups of judges ranking the objects. Given that there is agreement within each group of judges, how can we test for evidence of agreement between the two groups? This question, recently posed to us by Kendall, has been studied by Schucany, Frawley and Li. In this paper we show that the test of agreement proposed by Schucany and Frawley, and further advanced by Li and Schucany, is misleading and does not provide a satisfactory answer to Kendall's question. After pinpointing various defects of the Schucany-Frawley test, we adapt a procedure, proposed by Wald and Wolfowitz in a slightly different context, to furnish a new test for agreement between two groups of judges.