
Showing papers in "Australian & New Zealand Journal of Statistics in 2000"


Journal ArticleDOI
Abstract: This paper describes a technique for computing approximate maximum pseudolikelihood estimates of the parameters of a spatial point process. The method is an extension of Berman & Turner’s (1992) device for maximizing the likelihoods of inhomogeneous spatial Poisson processes. For a very wide class of spatial point process models the likelihood is intractable, while the pseudolikelihood is known explicitly, except for the computation of an integral over the sampling region. Approximation of this integral by a finite sum in a special way yields an approximate pseudolikelihood which is formally equivalent to the (weighted) likelihood of a loglinear model with Poisson responses. This can be maximized using standard statistical software for generalized linear or additive models, provided the conditional intensity of the process takes an ‘exponential family’ form. Using this approach a wide variety of spatial point process models of Gibbs type can be fitted rapidly, incorporating spatial trends, interaction between points, dependence on spatial covariates, and mark information.

358 citations
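
The device is easy to sketch in code. Below is a minimal illustration for the simplest case, an inhomogeneous Poisson process with log-linear intensity; the synthetic data, the regular grid of dummy points and the equal quadrature weights are illustrative choices, not the paper's.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)

    # Simulate an inhomogeneous Poisson process on [0,1]^2 by thinning,
    # with true log-intensity log lambda(u) = b0 + b1 * x(u).
    b0, b1 = 4.0, 2.0
    lam_max = np.exp(b0 + b1)
    n_prop = rng.poisson(lam_max)
    pts = rng.uniform(size=(n_prop, 2))
    keep = rng.uniform(size=n_prop) < np.exp(b0 + b1 * pts[:, 0]) / lam_max
    data = pts[keep]

    # Dummy points on a regular grid over the sampling region.
    g = np.linspace(0.025, 0.975, 20)
    gx, gy = np.meshgrid(g, g)
    dummy = np.column_stack([gx.ravel(), gy.ravel()])

    # Berman-Turner quadrature: weights w_j summing to the region's area,
    # responses y_j = z_j / w_j with z_j = 1 at data points, 0 at dummies.
    quad = np.vstack([data, dummy])
    w = np.full(len(quad), 1.0 / len(quad))
    z = np.r_[np.ones(len(data)), np.zeros(len(dummy))]
    y = z / w

    # The weighted Poisson likelihood of y is then formally the
    # approximate (pseudo)likelihood, so any GLM routine maximizes it.
    X = sm.add_constant(quad[:, 0])
    fit = sm.GLM(y, X, family=sm.families.Poisson(), var_weights=w).fit()
    print(fit.params)  # should be near (b0, b1) = (4, 2)

For genuine Gibbs models the same recipe applies with the conditional intensity in place of the intensity; the spatstat package for R implements this scheme.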


Journal ArticleDOI
TL;DR: In this article, a spatially adaptive roughness penalty for spline fitting is proposed: a large and fixed number of knots is used, and smoothing is achieved by putting a quadratic penalty on the jumps of the pth derivative at the knots.
Abstract: We study spline fitting with a roughness penalty that adapts to spatial heterogeneity in the regression function. Our estimates are pth degree piecewise polynomials with p−1 continuous derivatives. A large and fixed number of knots is used and smoothing is achieved by putting a quadratic penalty on the jumps of the pth derivative at the knots. To be spatially adaptive, the logarithm of the penalty is itself a linear spline but with relatively few knots and with values at the knots chosen to minimize GCV. This locally-adaptive spline estimator is compared with other spline estimators in the literature such as cubic smoothing splines and knot-selection techniques for least-squares regression. Our estimator can be interpreted as an empirical Bayes estimate for a prior allowing spatial heterogeneity. In cases of spatially heterogeneous regression functions, empirical Bayes confidence intervals using this prior achieve better pointwise coverage probabilities than confidence intervals based on a global smoothing parameter.

269 citations
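
A stripped-down version of the fixed-knot machinery can be written directly. The sketch below uses a degree-p truncated power basis, so the penalty on the jumps of the pth derivative is, up to a constant, a ridge penalty on the truncated-power coefficients; for brevity a single global penalty is chosen by GCV, whereas the paper lets the log-penalty vary over x as a linear spline.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, K = 200, 2, 30                        # sample size, degree, knots
    x = np.sort(rng.uniform(size=n))
    f = np.sin(8 * np.pi * x**2)                # spatially heterogeneous truth
    y = f + 0.3 * rng.standard_normal(n)

    knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])
    X = np.column_stack([x**j for j in range(p + 1)] +
                        [np.maximum(x - k, 0.0)**p for k in knots])
    D = np.diag([0.0] * (p + 1) + [1.0] * K)    # penalize only the jump terms

    def gcv(lam):
        H = X @ np.linalg.solve(X.T @ X + lam * D, X.T)
        resid = y - H @ y
        return n * (resid @ resid) / (n - np.trace(H))**2

    lams = 10.0**np.linspace(-4, 2, 40)
    lam = lams[np.argmin([gcv(l) for l in lams])]
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    print("chosen lambda:", lam)
    print("max |fit - truth|:", np.max(np.abs(X @ beta - f)))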


Journal ArticleDOI
TL;DR: The theoretical results can be used to develop diagnostics for deciding if a time series can be modelled by some linear autoregressive model, and for selecting among several candidate models.
Abstract: We give a general formulation of a non-Gaussian conditional linear AR(1) model subsuming most of the non-Gaussian AR(1) models that have appeared in the literature. We derive some general results giving properties for the stationary process mean, variance and correlation structure, and conditions for stationarity. These results highlight similarities and differences with the Gaussian AR(1) model, and unify many separate results appearing in the literature. Examples illustrate the wide range of properties that can appear under the conditional linear autoregressive assumption. These results are used in analysing three real data sets, illustrating general methods of estimation, model diagnostics and model selection. In particular, we show that the theoretical results can be used to develop diagnostics for deciding if a time series can be modelled by some linear autoregressive model, and for selecting among several candidate models.

142 citations
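
The flavour of these general results can be checked by simulation. The sketch below uses one member of the class, a Poisson INAR(1) built from binomial thinning, whose conditional mean is linear, E(X_t | X_{t-1}) = alpha + beta X_{t-1}; the stationary mean alpha/(1-beta) and the geometric autocorrelation beta^k then take the same form as in the Gaussian AR(1) model. Parameter values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    alpha, beta, T = 2.0, 0.6, 100_000
    x = np.empty(T, dtype=int)
    x[0] = rng.poisson(alpha / (1 - beta))    # start near the stationary mean
    for t in range(1, T):
        # binomial thinning of the previous count plus a Poisson innovation
        x[t] = rng.binomial(x[t - 1], beta) + rng.poisson(alpha)

    xc = x - x.mean()
    acf = [xc[k:] @ xc[:T - k] / (xc @ xc) for k in (1, 2, 3)]
    print("mean:", x.mean(), "theory:", alpha / (1 - beta))
    print("acf :", np.round(acf, 3), "theory:", [beta, beta**2, beta**3])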


Journal ArticleDOI
TL;DR: In this paper, a weighted least squares analysis of variance of the absolute values of both mean-based and median-based residuals is proposed to test homogeneity of variance for both balanced and unbalanced designs.
Abstract: In 1960 Levene suggested a potentially robust test of homogeneity of variance based on an ordinary least squares analysis of variance of the absolute values of mean-based residuals. Levene's test has since been shown to have inflated levels of significance when based on the F-distribution, and tests a hypothesis other than homogeneity of variance when treatments are unequally replicated, but the incorrect formulation is now standard output in several statistical packages. This paper develops a weighted least squares analysis of variance of the absolute values of both mean-based and median-based residuals. It shows how to adjust the residuals so that tests using the F-statistic focus on homogeneity of variance for both balanced and unbalanced designs. It shows how to modify the F-statistics currently produced by statistical packages so that the distribution of the resultant test statistic is closer to an F-distribution than is currently the case. The weighted least squares approach also produces component mean squares that are unbiased irrespective of which variable is used in Levene's test. To complete this aspect of the investigation the paper derives exact second-order moments of the component sums of squares used in the calculation of the mean-based test statistic. It shows that, for large samples, both ordinary and weighted least squares test statistics are equivalent; however they are over-dispersed compared to an F variable.

68 citations
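
For reference, the unweighted starting point of the paper is just a one-way ANOVA F-test applied to absolute residuals. A minimal sketch with median-based residuals (the Brown-Forsythe variant) on synthetic groups; the paper's weighted-least-squares refinements and residual adjustments are not reproduced here.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    groups = [rng.normal(0, s, n) for s, n in [(1.0, 12), (1.5, 20), (2.0, 8)]]

    # absolute median-based residuals
    z = [np.abs(g - np.median(g)) for g in groups]
    allz = np.concatenate(z)
    k, N = len(z), len(allz)

    # ordinary one-way ANOVA F statistic computed on the z values
    ssb = sum(len(zi) * (zi.mean() - allz.mean())**2 for zi in z)
    ssw = sum(((zi - zi.mean())**2).sum() for zi in z)
    F = (ssb / (k - 1)) / (ssw / (N - k))
    print("hand-rolled F:", F)
    print("scipy levene :", stats.levene(*groups, center='median'))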


Journal ArticleDOI
D.C. Blest
TL;DR: In this article, a new rank correlation coefficient is proposed, based on insights from the calculation of Kendall's coefficient; the measure stands outside the general theory but has greater power of discrimination amongst differing reorderings of the data whilst remaining strongly correlated with both Spearman's ρ and Kendall's τ.
Abstract: Within the bounds of a general theory of rank correlation two particular measures have been adopted widely: Spearman's rank correlation coefficient, ρ, in which ranks replace variates in Pearson's product-moment correlation calculation; and Kendall's τ, in which the disarray of x-ordered data due to a y-ordering is measured by counting the minimum number, s, of transpositions (interchanges between adjacent ranks) of the y-ordering sufficient to recover the x-ordering. Based on insights from the calculation of Kendall's coefficient, this paper develops a graphical approach which leads to a new rank correlation coefficient akin to that of Spearman. This measure appears to stand outside general theory but has greater power of discrimination amongst differing reorderings of the data whilst simultaneously being strongly correlated with both ρ and τ. The development is focused on situations where agreement over ordering is more important for top place getters than for those lower down the order as, for example, in subjectively judged Olympic events such as ice skating. The basic properties of the proposed coefficient are identified.

67 citations
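
The transposition count underlying Kendall's τ is straightforward to compute directly: counting inversions s of the y-ordering relative to the x-ordering recovers τ = 1 - 4s/(n(n-1)) for untied data. A short check (Blest's proposed coefficient itself is not reproduced here):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    x, y = rng.permutation(10), rng.permutation(10)

    # y-values taken in the order that sorts x; inversions = discordant pairs
    perm = y[np.argsort(x)]
    n = len(perm)
    s = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    print("inversions s:", s, " tau from s:", 1 - 4 * s / (n * (n - 1)))
    print("scipy kendalltau:", stats.kendalltau(x, y)[0])
    print("scipy spearmanr :", stats.spearmanr(x, y)[0])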


Journal ArticleDOI
TL;DR: The asymptotic properties of maximum likelihood estimators based on a single large tree are examined, for extended bifurcating autoregressive models that allow larger correlations between cousin cells and other cells in the same generation.
Abstract: Huggins & Basawa (1999) proposed several extensions of the bifurcating autoregressive model used to model cell lineage trees. These models overcame limitations in the original bifurcating autoregressive model by allowing larger correlations between cousin cells and other cells in the same generation. Huggins & Basawa only considered maximum likelihood inference based on independent trees. This paper examines the asymptotic properties of maximum likelihood estimators based on a single large tree.

43 citations


Journal ArticleDOI
TL;DR: In this article, a generalized additive model with first-order Markov structure and mixed transition density has been used to model the relationship between the Southern Oscillation Index and Melbourne's rainfall.
Abstract: The paper considers the modelling of time series using a generalized additive model with first-order Markov structure and mixed transition density having a discrete component at zero and a continuous component with positive sample space. Such models have application, for example, in modelling daily occurrence and intensity of rainfall, and in modelling numbers and sizes of insurance claims. The paper shows how these methods extend the usual sinusoidal seasonal assumption in standard chain-dependent models by assuming a general smooth pattern of occurrence and intensity over time. These models can be fitted using standard statistical software. The methods of Grunwald & Jones (2000) can be used to combine these separate occurrence and intensity models into a single model for amount. The models are used to investigate the relationship between the Southern Oscillation Index and Melbourne's rainfall, illustrated with 36 years of rainfall data from Melbourne, Australia.

39 citations
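
Both components can be fitted with standard GLM software, as the paper notes. A minimal sketch on synthetic daily data (not the Melbourne series), with a plain sinusoidal seasonal term standing in for the paper's general smooth pattern: a logistic regression with the lagged wet/dry indicator models occurrence, and a Gamma GLM models intensity on wet days.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    T = 3650
    day = np.arange(T) % 365
    s = np.sin(2 * np.pi * day / 365)
    c = np.cos(2 * np.pi * day / 365)

    # simulate Markov occurrence and gamma intensities (arbitrary values)
    wet = np.zeros(T, dtype=int)
    for t in range(1, T):
        p = 1 / (1 + np.exp(-(-1.0 + 1.2 * wet[t - 1] + 0.8 * s[t])))
        wet[t] = rng.uniform() < p
    amount = np.where(wet, rng.gamma(2.0, np.exp(0.5 + 0.4 * c) / 2.0), 0.0)

    # occurrence: logistic GLM with first-order Markov term and seasonality
    Xo = sm.add_constant(np.column_stack([wet[:-1], s[1:], c[1:]]))
    occ = sm.GLM(wet[1:], Xo, family=sm.families.Binomial()).fit()

    # intensity: Gamma GLM with log link, wet days only
    m = amount > 0
    Xi = sm.add_constant(np.column_stack([s[m], c[m]]))
    inten = sm.GLM(amount[m], Xi,
                   family=sm.families.Gamma(sm.families.links.Log())).fit()
    print(occ.params, inten.params, sep="\n")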


Journal ArticleDOI
TL;DR: In this paper, a generalized estimating equations approach was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988-96, while accounting for spatial and temporal correlations in the catch-effort data.
Abstract: The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988-96, while accounting for spatial and temporal correlations in the catch-effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, that may have had complex within-cluster correlation structures and that had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and utility of the model, and the potential for this modelling approach to apply in similar statistical situations.

37 citations
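
The proposed strategy, fixing the cluster definition first and the working correlation second, can be mimicked with any GEE implementation. A sketch on synthetic catch-effort-style data; all column names and effect sizes are hypothetical, and vessel-year clusters play the role of the paper's small clusters.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    n_vessel, n_year, n_tow = 20, 9, 15
    df = pd.DataFrame([(v, y) for v in range(n_vessel)
                       for y in range(n_year) for _ in range(n_tow)],
                      columns=["vessel", "year"])
    df["tech"] = (df["year"] >= 5).astype(float)      # technology indicator
    u = rng.normal(0, 0.3, (n_vessel, n_year))        # vessel-year effect
    df["logcatch"] = (1.0 + 0.25 * df["tech"]
                      + u[df["vessel"], df["year"]]
                      + rng.normal(0, 0.5, len(df)))
    df["cluster"] = df["vessel"] * 100 + df["year"]   # vessel-year clusters

    for cov in [sm.cov_struct.Independence(), sm.cov_struct.Exchangeable()]:
        fit = smf.gee("logcatch ~ tech", groups="cluster", data=df,
                      cov_struct=cov, family=sm.families.Gaussian()).fit()
        print(type(cov).__name__, round(fit.params["tech"], 3),
              "robust s.e.", round(fit.bse["tech"], 3))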


Journal ArticleDOI
TL;DR: In this paper, the authors consider families of statistics for testing the goodness-of-fit of various parametric models such as the normal, exponential or Poisson, and show that each family consists of weighted integrals over the squared modulus of some measure of deviation from the parametric model, expressed by means of an empirical transform of the data.
Abstract: This paper considers families of statistics for testing the goodness-of-fit of various parametric models such as the normal, exponential or Poisson. Each family consists of weighted integrals over the squared modulus of some measure of deviation from the parametric model, expressed by means of an empirical transform of the data. Letting the rate of decay of the weight function tend to infinity, each test statistic, after a suitable rescaling, approaches a limit that is closely connected to the first non-zero component of Neyman's smooth test for the parametric model.

34 citations
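
A concrete member of such a family, evaluated numerically rather than in closed form: a BHEP-type statistic for normality, T_a = n ∫ |phi_n(t) - exp(-t²/2)|² exp(-a t²) dt, where phi_n is the empirical characteristic function of the standardized data. Increasing the decay rate a concentrates the weight near t = 0, which is the regime the paper links to Neyman's smooth-test components. The weight function and grid are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(7)
    x = rng.standard_normal(100)                # a sample under H0
    y = (x - x.mean()) / x.std()

    t = np.linspace(-10, 10, 4001)
    dt = t[1] - t[0]
    phi_n = np.exp(1j * np.outer(t, y)).mean(axis=1)   # empirical c.f.
    dev2 = np.abs(phi_n - np.exp(-t**2 / 2))**2        # squared modulus

    for a in (0.5, 2.0, 10.0):                  # weight exp(-a t^2)
        T = len(y) * np.sum(dev2 * np.exp(-a * t**2)) * dt
        print(f"a = {a:5.1f}   T_a = {T:.5f}")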


Journal ArticleDOI
TL;DR: In 1995 the Government Statistician's Office in Queensland conducted a household survey to study population migration using computer-assisted telephone interviewing and random digit dialling; the survey involved a sample of 110 000 telephone numbers and yielded 38 000 responding households.
Abstract: Computer-assisted telephone interviewing and random digit dialling are increasingly being used to conduct household surveys in Australia. However, there is little published information concerning Australian experience with such surveys. In 1995 the Government Statistician's Office in Queensland conducted a household survey to study population migration using these techniques. The survey involved a sample of 110 000 telephone numbers resulting in 38 000 responding households. This article describes a computerized survey management system that was developed and which provided information concerning important operational and quality aspects of the survey.

30 citations


Journal ArticleDOI
TL;DR: In this paper, a non-homogeneous Poisson process with a periodic intensity function was used to model the annual cycle of hurricane arrival times and estimated wind speed and central pressure return periods and non-encounter probabilities.
Abstract: This paper studies the annual arrival cycle and return period properties of landfalling Atlantic Basin hurricanes. A non-homogeneous Poisson process with a periodic intensity function is used to model the annual cycle of hurricane arrival times. Wind speed and central pressure return periods and non-encounter probabilities are estimated by combining the Poisson arrival model with extreme value peaks-over-threshold methods. The data used in this study contain all Atlantic Basin hurricanes that have made landfall in the contiguous United States during the years 1935–98 inclusive.
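
The arrival model is simple to simulate by thinning, which also makes the periodic intensity concrete. The rates below are hypothetical, not the paper's fitted hurricane values.

    import numpy as np

    rng = np.random.default_rng(8)
    b0, b1, b2 = 0.3, 1.2, 0.4                  # hypothetical coefficients
    lam = lambda t: np.exp(b0 + b1 * np.cos(2 * np.pi * t)
                           + b2 * np.sin(2 * np.pi * t))
    lam_max = np.exp(b0 + np.hypot(b1, b2))     # exact upper bound on lam
    horizon = 64.0                              # years, cf. 1935-98

    # thinning: dominate with a homogeneous process at rate lam_max
    n = rng.poisson(lam_max * horizon)
    cand = np.sort(rng.uniform(0, horizon, n))
    arrivals = cand[rng.uniform(size=n) < lam(cand) / lam_max]

    # empirical annual cycle: arrival phases within the year
    counts, _ = np.histogram(arrivals % 1.0, bins=12, range=(0, 1))
    print("events:", len(arrivals), " monthly-bin counts:", counts)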

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a model which allows for the possible presence of carry-over effects up to k subsequent periods, together with all the interactions between treatments applied at k + 1 successive periods.
Abstract: In cross-over experiments, where different treatments are applied successively to the same experimental unit over a number of time periods, it is often expected that a treatment has a carry-over effect in one or more periods following its period of application. The effect of interaction between the treatments in the successive periods may also affect the response. However, it seems that all systematic studies of the optimality properties of cross-over designs have been done under models where carry-over effects are assumed to persist for only one subsequent period. This paper proposes a model which allows for the possible presence of carry-over effects up to k subsequent periods, together with all the interactions between treatments applied at k + 1 successive periods. This model allows the practitioner to choose k for any experiment according to the requirements of that particular experiment. Under this model, the cross-over designs are studied and the class of optimal designs is obtained. A method of constructing these optimal designs is also given.

Journal ArticleDOI
TL;DR: In this paper, the authors derived estimating equations for modelling circular data with longitudinal structure for a family of circular distributions with two parameters, and showed that the estimators that follow from these equations are consistent and asymptotically normal.
Abstract: This paper derives estimating equations for modelling circular data with longitudinal structure for a family of circular distributions with two parameters. Estimating equations for modelling the circular mean and the resultant length are given separately. Estimating equations are then derived for a mixed model. This paper shows that the estimators that follow from these equations are consistent and asymptotically normal. The results are illustrated by an example about the direction taken by homing pigeons.

Journal ArticleDOI
TL;DR: In this paper, the authors considered residuals for time series regression and showed that orthogonal and marginal residuals allow identification of outliers, model mis-specification and mean shifts.
Abstract: This paper considers residuals for time series regression. Despite much literature on visual diagnostics for uncorrelated data, there is little on the autocorrelated case. In order to examine various aspects of the fitted time series regression model, three residuals are considered. The fitted regression model can be checked using orthogonal residuals; the time series error model can be analysed using marginal residuals; and the white noise error component can be tested using conditional residuals. When used together, these residuals allow identification of outliers, model mis-specification and mean shifts. Due to the sensitivity of conditional residuals to model mis-specification, it is suggested that the orthogonal and marginal residuals be examined first.
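
A sketch of two of the three residual types for a straight-line regression with AR(1) errors, fitted with statsmodels' GLSAR. The constructions here are simplified guesses at the paper's definitions: 'marginal' means y - Xb, and 'conditional' means the AR(1)-filtered innovations; orthogonal residuals would come from fully whitening the marginal residuals with the Cholesky factor of the estimated error covariance.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(12)
    T, phi = 300, 0.6
    x = np.linspace(0, 1, T)
    e = np.zeros(T)
    for t in range(1, T):
        e[t] = phi * e[t - 1] + rng.standard_normal()
    y = 2.0 + 3.0 * x + e

    X = sm.add_constant(x)
    fit = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=20)
    phi_hat = fit.model.rho[0]

    marginal = y - X @ fit.params              # examines the regression fit
    conditional = marginal[1:] - phi_hat * marginal[:-1]  # examines white noise
    print("phi_hat:", round(phi_hat, 3),
          " lag-1 acf of conditional residuals:",
          round(float(np.corrcoef(conditional[1:], conditional[:-1])[0, 1]), 3))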

Journal ArticleDOI
Naomi Altman
TL;DR: This paper reviews methodology for non‐parametric regression with autocorrelated errors which is a natural compromise between the two methods and demonstrates the rather surprising result that for these data, ordinary kriging outperforms more computationally intensive models including both universal kriging and correlated splines for spatial prediction.
Abstract: Both kriging and non-parametric regression smoothing can model a non-stationary regression function with spatially correlated errors. However comparisons have mainly been based on ordinary kriging and smoothing with uncorrelated errors. Ordinary kriging attributes smoothness of the response to spatial autocorrelation whereas non-parametric regression attributes trends to a smooth regression function. For spatial processes it is reasonable to suppose that the response is due to both trend and autocorrelation. This paper reviews methodology for non-parametric regression with autocorrelated errors which is a natural compromise between the two methods. Re-analysis of the one-dimensional stationary spatial data of Laslett (1994) and a clearly non-stationary time series demonstrates the rather surprising result that for these data, ordinary kriging outperforms more computationally intensive models including both universal kriging and correlated splines for spatial prediction. For estimating the regression function, non-parametric regression provides adaptive estimation, but the autocorrelation must be accounted for in selecting the smoothing parameter.

Journal ArticleDOI
TL;DR: In this article, the conventional one-sided CUSUM procedure is extended for controlling autocorrelated data which are approximately AR(1), and the performances of these procedures are compared using simulation studies and the average run length as a criterion.
Abstract: The conventional one-sided CUSUM procedure is extended for controlling autocorrelated data which are approximately AR(1). The performances of these procedures are compared using simulation studies and the average run length as a criterion. These procedures are also compared to two versions of Shewhart individual charts. Two applications are considered.
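
A minimal sketch of the two competing charts on simulated AR(1) data: a one-sided CUSUM run on the raw observations versus one run on the AR(1) residuals (a standard way of handling the autocorrelation; the reference value k and decision limit h are textbook defaults, not the paper's tuned values). The raw chart may alarm well before the true shift, illustrating the inflated false-alarm rate under positive autocorrelation.

    import numpy as np

    rng = np.random.default_rng(9)
    phi, T, shift_at = 0.7, 2000, 1000
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    x[shift_at:] += 2.0                        # mean shift to be detected

    def cusum_alarm(z, k=0.5, h=5.0):
        # index of the first one-sided CUSUM alarm, or None
        s = 0.0
        for t, zt in enumerate(z):
            s = max(0.0, s + zt - k)
            if s > h:
                return t
        return None

    resid = x[1:] - phi * x[:-1]               # AR(1) residuals, phi known
    print("alarm, raw chart     :", cusum_alarm(x / x[:shift_at].std()))
    print("alarm, residual chart:", cusum_alarm(resid / resid[:shift_at].std()))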


Journal ArticleDOI
Abstract: This paper develops an empirical Bayesian analysis for the von Mises distribution, which is the most useful distribution for statistical inference of angular data. A two-stage informative prior is proposed, in which the hyperparameter is obtained from the data in one of the stages. This empirical or approximate Bayes inference is justified on the basis of maximum entropy, and it eliminates the modified Bessel functions. An example with real data and a realistic prior distribution for the regression coefficients is considered via a Metropolis-within-Gibbs algorithm. Keywords: angular data; link function; maximum entropy; Metropolis-within-Gibbs algorithm; regression models. 1. Introduction Bagchi & Guttman (1988) considered a Bayesian analysis of the multivariate von Mises distribution and developed theorems that are analogous to the theorems of Lindley & Smith (1972). They formulated a conjugate prior for the mean direction and a non-informative prior for the concentration parameter which maximize the entropy, subject to constraints on the first moments and assuming that the density integrates to one (see Bagchi, 1987, p. …)

Journal ArticleDOI
TL;DR: In this paper, two alternative assumptions about the underlying income distribution are considered, namely a lognormal distribution and the Singh-Maddala (1976) income distribution, and alternative posterior distributions of the Gini coefficient are calculated.
Abstract: When available data comprise a number of sampled households in each of a number of income classes, the likelihood function is obtained from a multinomial distribution with the income class population proportions as the unknown parameters. Two methods for going from this likelihood function to a posterior distribution on the Gini coefficient are investigated. In the first method, two alternative assumptions about the underlying income distribution are considered, namely a lognormal distribution and the Singh-Maddala (1976) income distribution. In these cases the likelihood function is reparameterized and the Gini coefficient is a nonlinear function of the income distribution parameters. The Metropolis algorithm is used to find the corresponding posterior distributions of the Gini coefficient from a sample of Bangkok households. The second method does not require an assumption about the nature of the income distribution, but uses (a) triangular prior distributions, and (b) beta prior distributions, on the location of mean income within each income class. By sampling from these distributions, and the Dirichlet posterior distribution of the income class proportions, alternative posterior distributions of the Gini coefficient are calculated.
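
The first method is a few lines of code. A sketch under the lognormal assumption with synthetic class bounds and counts (and a flat prior on the lognormal parameters): the grouped-data multinomial likelihood is sampled by random-walk Metropolis, and each draw is mapped to a Gini value through the lognormal identity G = 2Φ(σ/√2) - 1.

    import numpy as np
    from scipy import stats

    bounds = np.array([0.0, 5e3, 1e4, 2e4, 4e4, 8e4, np.inf])  # class limits
    counts = np.array([120, 310, 420, 380, 190, 60])           # households

    def loglik(theta):
        # multinomial log-likelihood of the grouped data under a lognormal
        # income distribution; implicit flat prior on (mu, sigma), sigma > 0
        mu, sigma = theta
        if sigma <= 0:
            return -np.inf
        with np.errstate(divide="ignore"):
            z = (np.log(bounds) - mu) / sigma
        p = np.clip(np.diff(stats.norm.cdf(z)), 1e-300, 1.0)
        return counts @ np.log(p)

    rng = np.random.default_rng(10)
    theta = np.array([np.log(2e4), 1.0])
    sig = []
    for _ in range(20_000):
        prop = theta + rng.normal(0, 0.05, 2)       # random-walk proposal
        if np.log(rng.uniform()) < loglik(prop) - loglik(theta):
            theta = prop
        sig.append(theta[1])
    sigma = np.array(sig[5000:])                    # drop burn-in
    gini = 2 * stats.norm.cdf(sigma / np.sqrt(2)) - 1
    print("posterior mean Gini:", gini.mean().round(3),
          " 95% interval:", np.quantile(gini, [0.025, 0.975]).round(3))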

Journal ArticleDOI
TL;DR: In this article, the authors proposed modifying tight upper limits by an initial replacement of the unknown nuisance parameter vector by its profile maximum likelihood estimator, which is shown to be close to optimal.
Abstract: When the data are discrete, standard approximate confidence limits often have coverage well below nominal for some parameter values. While ad hoc adjustments may largely solve this problem for particular cases, Kabaila & Lloyd (1997) gave a more systematic method of adjustment which leads to tight upper limits, whose coverage is never below nominal and which are as small as possible within a particular class. However, their computation for all but the simplest models is infeasible. This paper suggests modifying tight upper limits by an initial replacement of the unknown nuisance parameter vector by its profile maximum likelihood estimator. While the resulting limits no longer possess the optimal properties of tight limits exactly, the paper presents both numerical and theoretical evidence that the resulting coverage function is close to optimal. Moreover these profile upper limits are much (possibly many orders of magnitude) easier to compute than tight upper limits.

Journal ArticleDOI
TL;DR: In this paper, the conditional properties of the mean and total estimators of a finite population when auxiliary information is available are reviewed and a sample statistic capable of indicating the presence of substantial conditional biases is proposed.
Abstract: This paper reviews conditional properties of the mean and total estimators of a finite population when auxiliary information is available. An exact design-based conditional analysis for complex sampling designs is intractable, but an asymptotic conditional framework can be developed. Within such a framework the paper establishes sufficient conditions for conditional unbiasedness and explores conditional properties of various types of regression estimators. A sample statistic capable of indicating the presence of substantial conditional biases is proposed, and illustrated by a simulation study.

Journal ArticleDOI
TL;DR: In this article, a characterization of the joint probability of a random vector (X, Y) such that the two variables X and Y on Rd belong to the multidimensional Meixner class and fulfil a bi-orthogonality condition involving orthogonal polynomials is presented.
Abstract: The well-known Meixner class (Meixner, 1934) of probabilities on R has been extended recently to Rd (Pommeret, 1996). This generalized Meixner class corresponds to the simple quadratic natural exponential families characterized by Casalis (1996). Following Lancaster (1975), the present paper offers a characterization of the joint probability of a random vector (X, Y) such that the two variables X and Y on Rd belong to the multidimensional Meixner class and fulfil a bi-orthogonality condition involving orthogonal polynomials. The joint probabilities, called Lancaster probabilities, are characterized by two sequences of orthogonal polynomials with respect to the margins and a sequence of expectations of products. Some multivariate probabilities are studied, namely the Poisson-Gaussian and the gamma-Gaussian.

Journal ArticleDOI
TL;DR: In this article, a family of models for analysing diagnostic test data is described, where the underlying ROC curves are specified by a shift parameter, a shape parameter and a link function.
Abstract: The performance of a diagnostic test is summarized by its receiver operating characteristic (ROC) curve. Empirical data on a test’s performance often come in the form of observed true positive and false positive relative frequencies, under varying conditions. This paper describes a family of models for analysing such data. The underlying ROC curves are specified by a shift parameter, a shape parameter and a link function. Both the position along the ROC curve and the shift parameter are modelled linearly. The shape parameter enters the model non-linearly but in a very simple manner. One simple application is to the meta-analysis of independent studies of the same diagnostic test, illustrated on some data of Moses, Shapiro & Littenberg (1993). A second application to so-called vigilance data is given, where ROC curves differ across subjects, and modelling of the position along the ROC curve is of primary interest.
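
The family has a compact form: with link g, a curve can be written TPR = g^{-1}(a + b*g(FPR)), with shift a and shape b. Taking g as the probit gives the classical binormal ROC, whose area under the curve is Φ(a/√(1+b²)). A sketch with illustrative parameter values (the probit is one choice of link; the paper's family is more general):

    import numpy as np
    from scipy import stats

    fpr = np.linspace(0.001, 0.999, 999)
    for a, b in [(0.8, 1.0), (1.2, 0.7)]:      # shift, shape (illustrative)
        tpr = stats.norm.cdf(a + b * stats.norm.ppf(fpr))  # probit link
        auc = stats.norm.cdf(a / np.sqrt(1 + b**2))        # binormal AUC
        print(f"a={a}, b={b}:  AUC={auc:.3f},"
              f"  TPR at FPR=0.10: {np.interp(0.10, fpr, tpr):.3f}")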

Journal ArticleDOI
TL;DR: In this paper, a partial likelihood method is proposed for estimating vaccine efficacy for a general epidemic model, which only requires information on the sequence in which individuals are infected and not the exact infection times.
Abstract: A partial likelihood method is proposed for estimating vaccine efficacy for a general epidemic model. In contrast to the maximum likelihood estimator (MLE) which requires complete observation of the epidemic, the suggested method only requires information on the sequence in which individuals are infected and not the exact infection times. A simulation study shows that the method performs almost as well as the MLE. The method is applied to data on the infectious disease mumps.

Journal ArticleDOI
TL;DR: In this paper, an iterative algorithm is proposed for finding maximum likelihood estimates for two-component parallel systems, where the data consist of Type-II censored data X(i), i = 1, n, from one component, and their concomitants Y [i] randomly censored at X(r), the stopping time of the experiment.
Abstract: When two-component parallel systems are tested, the data consist of Type-II censored data X(i), i= 1, n, from one component, and their concomitants Y [i] randomly censored at X(r), the stopping time of the experiment. Marshall & Olkin's (1967) bivariate exponential distribution is used to illustrate statistical inference procedures developed for this data type. Although this data type is motivated practically, the likelihood is complicated, and maximum likelihood estimation is difficult, especially in the case where the parameter space is a non-open set. An iterative algorithm is proposed for finding maximum likelihood estimates. This article derives several properties of the maximum likelihood estimator (MLE) including existence, uniqueness, strong consistency and asymptotic distribution. It also develops an alternative estimation method with closed-form expressions based on marginal distributions, and derives its asymptotic properties. Compared with variances of the MLEs in the finite and large sample situations, the alternative estimator performs very well, especially when the correlation between X and Y is small.

Journal ArticleDOI
TL;DR: In this article, the probabilities of failure or success under mixed acceptance sampling schemes and under a variety of conditions were derived for the probability of failure and success under a wide range of conditions.
Abstract: Mixed acceptance sampling schemes are commonly used for consumer protection. In a typical application, a sample is taken from a product lot and tested to check that the average value of the sample is not less than the labelled net content, and that there is no ‘unreasonable’ deficiency in any individual item. Exact and approximate expressions are obtained for the probabilities of failure or success under such schemes and under a variety of conditions.
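
The pass probabilities are easy to compute under an assumed fill distribution. A sketch with a normal model and hypothetical numbers: a lot labelled L passes if the sample mean is at least L and no individual item falls below L - d; Monte Carlo estimates of the joint rule are checked against the exact values for each criterion alone.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    L, d, n = 500.0, 15.0, 12        # label, allowed deficiency, sample size
    mu, sd = 505.0, 8.0              # process mean and s.d. (hypothetical)

    sims = rng.normal(mu, sd, size=(200_000, n))
    pass_mean = sims.mean(axis=1) >= L
    pass_items = sims.min(axis=1) >= L - d
    print("P(pass both):", (pass_mean & pass_items).mean())
    print("P(mean rule):", pass_mean.mean(),
          " exact:", 1 - stats.norm.cdf((L - mu) / (sd / np.sqrt(n))))
    print("P(item rule):", pass_items.mean(),
          " exact:", (1 - stats.norm.cdf((L - d - mu) / sd))**n)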

Journal ArticleDOI
TL;DR: In this article, the robustness of the β-trimmed mean estimator has been investigated in terms of relative efficiency and weak continuity of the estimator in neighbourhoods of the exponential distribution.
Abstract: Qualitative robustness of the β-trimmed mean has already been observed in terms of relative efficiency and weak continuity of that estimator in neighbourhoods of the exponential distribution. Two more robustness considerations are given here in favour of the β-trimmed mean: the statistical functional representing this estimator is Fréchet differentiable; and it is a special case of the trimmed likelihood estimator. Further, simulations suggest that a fixed proportion of trimming is preferable to adaptive estimation in this case.

Journal ArticleDOI
TL;DR: A review of a theoretical research monograph covering generalized linear models, statistical curvature and second-order inference; the book is recommended to researchers in these areas rather than as a guide for practitioners or a textbook.
Abstract: As the author says in the introduction, this is a theoretical book. Although the book is enhanced by some simulation studies and data examples, they are cursory. The data examples tend to illustrate the theoretical computations rather than being serious analyses of the data at hand: so the book is not, in its current form, a guide for practitioners. Nor is it designed to be a textbook. However, as a research monograph it would be of considerable interest and can be recommended to researchers in generalized linear models, statistical curvature and second order inference.

Journal ArticleDOI
Lesley Hunt
TL;DR: This paper is an attempt to develop a preliminary understanding of the personal backgrounds that have encouraged present practising statisticians to move into this field, and to see how they experience their work.
Abstract: There has been no work done on why statisticians have chosen their particular profession. With the increasing emphasis on the development of biotechnology, it seems important to encourage people to take up statistics and to offer the perspective that the study of statistics brings. This paper is an attempt to develop a preliminary understanding, by open-ended in-depth interviewing, of the personal backgrounds that have encouraged present practising statisticians to move into this field, and to see how they experience their work.

Journal ArticleDOI
TL;DR: Tighter lower bounds on the relative efficiency of the upper trimmed mean relative to the mean are obtained under a sufficient condition, in non-parametric neighbourhoods of an exponential scale parametric family.
Abstract: This paper investigates two estimators under the non-parametric neighbourhoods of an exponential scale parametric family. It uses the relative efficiency approach and shows that tighter lower bounds on the relative efficiency of the upper trimmed mean relative to the mean can be obtained under a sufficient condition. This condition gives the relationship between the possible positive lower bound and the degree of asymmetry of some related distributions. Similar arguments can be applied to the comparison of dispersion estimators under the neighbourhoods of a normal distribution.
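
The relative-efficiency comparison is easy to reproduce by Monte Carlo. A sketch with an illustrative trimming proportion: the upper trimmed mean is rescaled to be consistent for the scale under the clean exponential model, then its mean squared error is compared with the sample mean's under both the clean model and a contaminated neighbour.

    import numpy as np

    rng = np.random.default_rng(13)

    def utm(x, beta=0.1):
        # upper beta-trimmed mean: drop the largest beta fraction, average the rest
        x = np.sort(x)
        return x[: len(x) - int(beta * len(x))].mean()

    n, reps = 50, 20_000
    # consistency factor so the trimmed mean targets scale 1 under Exp(1)
    c = np.mean([utm(rng.exponential(1.0, n)) for _ in range(reps)])

    for name, eps in [("clean exponential", 0.0), ("5% contaminated  ", 0.05)]:
        sq = np.zeros(2)
        for _ in range(reps):
            s = np.where(rng.uniform(size=n) < eps,
                         rng.exponential(10.0, n), rng.exponential(1.0, n))
            sq += (np.array([s.mean(), utm(s) / c]) - 1.0) ** 2
        print(name, " RE(trimmed vs mean):", round(sq[0] / sq[1], 2))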