
Showing papers in "Australian & New Zealand Journal of Statistics in 1999"


Journal ArticleDOI
TL;DR: In this article, a three-parameter generalized exponential distribution (GED), a particular case of the exponentiated Weibull distribution originally proposed by Mudholkar et al. when the location parameter is absent, is used for analysing lifetime data.
Abstract: Summary The three-parameter gamma and three-parameter Weibull distributions are commonly used for analysing lifetime or skewed data. Both distributions have several desirable properties and nice physical interpretations. Because of the scale and shape parameters, both have quite a bit of flexibility for analysing different types of lifetime data. They have increasing as well as decreasing hazard rates, depending on the shape parameter. Unfortunately, both distributions also have certain drawbacks. This paper considers a three-parameter distribution which is a particular case of the exponentiated Weibull distribution originally proposed by Mudholkar, Srivastava & Freimer (1995) when the location parameter is not present. The study examines different properties of this model and observes that this family has some interesting features which are quite similar to those of the gamma family and the Weibull family, as well as certain distinct properties of its own. It appears that this model can be used as an alternative to the gamma model or the Weibull model in many situations. One dataset is provided where the three-parameter generalized exponential distribution fits better than the three-parameter Weibull distribution or the three-parameter gamma distribution.

1,084 citations
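
The generalized exponential family referred to here has distribution function F(x) = (1 − exp(−(x − μ)/σ))^α. As a hedged illustration of working with this family (not code from the paper), the sketch below simulates from the two-parameter version with location μ = 0 and recovers the shape α and scale σ by numerical maximum likelihood; the parameter values and optimizer choice are illustrative.

```python
# Sketch: maximum-likelihood fit of the generalized exponential (GE)
# distribution, F(x) = (1 - exp(-x/sigma))**alpha for x > 0
# (location parameter set to zero for simplicity).
import numpy as np
from scipy.optimize import minimize

def ge_neg_loglik(params, x):
    """Negative log-likelihood of GE(alpha, sigma)."""
    log_alpha, log_sigma = params          # unconstrained parameterization
    alpha, sigma = np.exp(log_alpha), np.exp(log_sigma)
    z = x / sigma
    # log f(x) = log(alpha/sigma) + (alpha - 1)*log(1 - exp(-z)) - z
    return -np.sum(np.log(alpha / sigma)
                   + (alpha - 1.0) * np.log1p(-np.exp(-z)) - z)

rng = np.random.default_rng(0)
# Simulate GE data by inverting the cdf: X = -sigma * log(1 - U**(1/alpha))
alpha_true, sigma_true = 2.0, 1.5
u = rng.uniform(size=500)
x = -sigma_true * np.log1p(-u ** (1.0 / alpha_true))

fit = minimize(ge_neg_loglik, x0=[0.0, 0.0], args=(x,), method="Nelder-Mead")
alpha_hat, sigma_hat = np.exp(fit.x)
print(f"alpha_hat = {alpha_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```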


Journal ArticleDOI
TL;DR: This paper describes the approach taken in designing MULTIMIX and how some of the statistical problems were dealt with; as an example, the program is used to cluster a large medical dataset.
Abstract: Hunt (1996) implemented the finite mixture model approach to clustering in a program called MULTIMIX. The program is designed to cluster multivariate data that have categorical and continuous variables and that possibly contain missing values. This paper describes the approach taken to design MULTIMIX and how some of the statistical problems were dealt with. As an example, the program is used to cluster a large medical dataset.

78 citations


Journal ArticleDOI
TL;DR: In this article, a development of the 'starship' method, a computer-intensive estimation method, is presented for two forms of generalized λ distributions (gλd); it can be used over the full parameter space and is flexible, allowing choice both of the form of the generalized λ distribution and of the nature of fit required.
Abstract: A development of the ‘starship’ method (Owen, 1988), a computer-intensive estimation method, is presented for two forms of generalized λ distributions (gλd). The method can be used for the full parameter space and is flexible, allowing choice both of the form of the generalized λ distribution and of the nature of fit required. Some examples of its use in fitting data and approximating distributions are given. Some simulation studies explore the sampling distribution of the parameter estimates produced by this method for selected values of the parameters and consider comparisons with two other methods, for one of the gλd distributional forms, not previously so investigated. In the forms and parameter regions available to the other methods, it is demonstrated that the starship compares favourably. Although the differences between the methods, where available, tend to disappear with larger samples, the parameter coverage, flexibility and adaptability of the starship method make it attractive. However, the paper also demonstrates that care is needed when fitting and using such quantile-defined distributional families, which are rich in shape but have complex properties.

70 citations
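
In the RS parameterization assumed below, the gλd is defined by its quantile function Q(u) = λ1 + (u^λ3 − (1 − u)^λ4)/λ2. The sketch is a simplified starship-style fit: each candidate parameter vector maps the data to (0, 1) through the implied distribution function (numerical inversion of Q) and is scored by a Kolmogorov–Smirnov uniformity statistic. The parameter values, starting point and choice of goodness-of-fit measure are illustrative, not those of the paper.

```python
# Sketch: a simplified 'starship'-style fit of the RS generalized lambda
# distribution, Q(u) = l1 + (u**l3 - (1-u)**l4) / l2.
import numpy as np
from scipy.optimize import brentq, minimize

def gld_quantile(u, l1, l2, l3, l4):
    return l1 + (u**l3 - (1.0 - u)**l4) / l2

def gld_cdf(x, theta, eps=1e-10):
    """Numerically invert the quantile function (monotone region assumed)."""
    l1, l2, l3, l4 = theta
    return np.array([brentq(lambda u: gld_quantile(u, l1, l2, l3, l4) - xi,
                            eps, 1.0 - eps) for xi in x])

def ks_uniform(theta, x):
    try:
        u = np.sort(gld_cdf(x, theta))
    except ValueError:           # data outside the candidate support
        return np.inf
    n = len(u)
    i = np.arange(1, n + 1)
    return np.max(np.maximum(i / n - u, u - (i - 1) / n))

rng = np.random.default_rng(1)
theta_true = (0.0, 0.2, 0.13, 0.13)        # symmetric, near-normal shape
x = gld_quantile(rng.uniform(size=300), *theta_true)

fit = minimize(ks_uniform, x0=np.array([0.1, 0.25, 0.1, 0.1]),
               args=(x,), method="Nelder-Mead")
print("fitted (l1, l2, l3, l4):", np.round(fit.x, 3))
```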


Journal ArticleDOI
TL;DR: The authors argue that testing interval hypotheses via equivalence test procedures offers a way of testing meaningful hypotheses and of giving effect to the precautionary principle; an example concerning mining impacts on stream benthic invertebrate communities is given, including calculation of Bayesian posterior probabilities.
Abstract: Tests of point hypotheses are common in observational environmental studies, but there is concern about their appropriateness. Exhortations to restrict sample sizes on the basis of a power analysis for such tests may fail to satisfy the environmental professional, because sample sizes are not always easily controlled. Testing of interval hypotheses via equivalence test procedures offers a way of testing meaningful hypotheses and of giving effect to the precautionary principle. An example is given which concerns mining impacts on stream benthic invertebrate communities, and includes calculation of Bayesian posterior probabilities.

56 citations
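
A common implementation of interval hypotheses is the two one-sided tests (TOST) procedure: equivalence is declared only if the difference is shown to be both above −δ and below +δ. The sketch below applies TOST to simulated upstream/downstream data; the margin δ, sample sizes and pooled-degrees-of-freedom approximation are illustrative analyst choices, not values from the paper.

```python
# Sketch: interval (equivalence) hypotheses via two one-sided tests (TOST).
# H0: |mu_impact - mu_control| >= delta, against equivalence.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
control = rng.normal(10.0, 2.0, size=30)   # e.g. upstream invertebrate counts
impact = rng.normal(10.5, 2.0, size=30)    # e.g. downstream of the mine
delta = 2.0                                # equivalence margin (domain choice)

diff = impact.mean() - control.mean()
se = np.sqrt(impact.var(ddof=1) / len(impact)
             + control.var(ddof=1) / len(control))
df = len(impact) + len(control) - 2        # simple pooled-df approximation

# Reject both H0a: diff <= -delta and H0b: diff >= +delta
p_lower = stats.t.sf((diff + delta) / se, df)    # evidence that diff > -delta
p_upper = stats.t.cdf((diff - delta) / se, df)   # evidence that diff < +delta
p_tost = max(p_lower, p_upper)
print(f"diff = {diff:.2f}, TOST p-value = {p_tost:.4f}")
```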


Journal ArticleDOI
TL;DR: It is concluded that statistical thinking and design of analysis, as exemplified by achievements in clinical epidemiology, may fit well with the emerging activities of data mining and 'knowledge discovery in databases' (DM&KDD).
Abstract: Data mining seeks to extract useful, but previously unknown, information from typically massive collections of non-experimental, sometimes non-traditional data. From the perspective of statisticians, this paper surveys techniques used and contributions from fields such as data warehousing, machine learning from artificial intelligence, and visualization as well as statistics. It concludes that statistical thinking and design of analysis, as exemplified by achievements in clinical epidemiology, may fit well with the emerging activities of data mining and 'knowledge discovery in databases' (DM&KDD).

56 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that Dimensional Analysis can make a contribution to model formation when some of the measurements in the problem are of physical factors, and they conclude that it can be used in the preliminary stages of regression analysis when developing formulations involving continuous variables with several dimensions.
Abstract: Summary Dimensional Analysis can make a contribution to model formation when some of the measurements in the problem are of physical factors. The analysis constructs a set of independent dimensionless factors that should be used as the variables of the regression in place of the original measurements. There are fewer of these than the originals and they often have a more appropriate interpretation. The technique is described briefly and its proposed role in regression discussed and illustrated with examples. We conclude that Dimensional Analysis can be effective in the preliminary stages of regression analysis when developing formulations involving continuous variables with several dimensions.

41 citations
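
The dimensionless factors are exponents in the null space of the dimension matrix (the Buckingham pi construction). A minimal sketch with a textbook pendulum example, not an example from the paper:

```python
# Sketch: constructing dimensionless regressors via the null space of the
# dimension matrix (Buckingham pi), as a pre-processing step for regression.
# Rows are base dimensions (M, L, T); columns are the measured variables.
import sympy as sp

# Illustrative variables: period t, length l, gravity g, mass m
variables = ["t", "l", "g", "m"]
dim_matrix = sp.Matrix([
    [0, 0, 0, 1],    # mass dimension of each variable
    [0, 1, 1, 0],    # length dimension
    [1, 0, -2, 0],   # time dimension
])

# Each null-space vector gives the exponents of one dimensionless group
for vec in dim_matrix.nullspace():
    group = " * ".join(f"{v}**{sp.nsimplify(e)}"
                       for v, e in zip(variables, vec) if e != 0)
    print("dimensionless group:", group)
# Output: a single group equivalent to g*t**2/l, so a regression would use
# this one group in place of the three raw physical measurements.
```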


Journal ArticleDOI
TL;DR: In this paper, the authors propose new techniques to calibrate estimators of variance; estimators of the variance of the simple mean, ratio and regression estimators under different sampling schemes are shown to be special cases, and an empirical study addresses the properties of the proposed strategies.
Abstract: This investigation suggests new techniques to calibrate estimators of variance. Estimators of the variance of the simple mean, ratio and regression estimators under different sampling schemes are shown to be special cases of the proposed calibration techniques. The approach has become more practical owing to recent advances in programming techniques and computational speed. An empirical study has been carried out to address the properties of the proposed strategies.

40 citations
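
Calibration perturbs the design weights as little as possible subject to reproducing a known auxiliary total; ratio and regression (GREG) estimators arise as special cases. The sketch below shows the point-estimation step under a chi-square distance on simulated data (the paper's extension of this idea to variance estimators is not reproduced); all population values are illustrative.

```python
# Sketch: chi-square-distance calibration of design weights to a known
# auxiliary total, the classical route to the GREG/ratio estimators.
import numpy as np

rng = np.random.default_rng(3)
N, n = 10_000, 200
x_pop = rng.gamma(4.0, 5.0, size=N)              # auxiliary, known for all units
y_pop = 2.0 * x_pop + rng.normal(0, 10, size=N)  # survey variable

sample = rng.choice(N, size=n, replace=False)    # simple random sample (WOR)
d = np.full(n, N / n)                            # design weights
x, y = x_pop[sample], y_pop[sample]

# Calibrate: minimize sum (w - d)^2 / d  subject to  sum w*x = X_total
X_total = x_pop.sum()
lam = (X_total - np.sum(d * x)) / np.sum(d * x * x)
w = d * (1.0 + lam * x)

print(f"HT estimate : {np.sum(d * y):,.0f}")
print(f"calibrated  : {np.sum(w * y):,.0f} (check sum w*x = {np.sum(w * x):,.0f})")
print(f"true total  : {y_pop.sum():,.0f}")
```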


Journal ArticleDOI
Alan Lee
TL;DR: This article used a new bivariate negative binomial distribution to model scores in the 1996 Australian Rugby League competition and simulated the 1996 season using the latter model to determine whether or not Manly did indeed deserve to win the competition.
Abstract: This paper uses a new bivariate negative binomial distribution to model scores in the 1996 Australian Rugby League competition. First, scores are modelled using the home ground advantage but ignoring the actual teams playing. Then a bivariate negative binomial regression model is introduced that takes into account the offensive and defensive capacities of each team. Finally, the 1996 season is simulated using the latter model to determine whether or not Manly did indeed deserve to win the competition.

36 citations
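
One standard construction of a bivariate negative binomial with positively correlated margins is a pair of Poisson counts sharing a gamma frailty. The sketch below simulates match scores this way; the scoring rates, frailty shape and home-win summary are illustrative and are not the paper's fitted model for the 1996 competition.

```python
# Sketch: simulating match scores from a bivariate negative binomial built
# as two Poisson counts sharing a gamma frailty (one standard construction).
import numpy as np

rng = np.random.default_rng(4)

def simulate_match(lam_home, lam_away, shape, n_sims, rng):
    """Return simulated (home, away) scores with NB marginals and
    positive correlation induced by a shared gamma frailty."""
    g = rng.gamma(shape, 1.0 / shape, size=n_sims)   # frailty, E[g] = 1
    home = rng.poisson(lam_home * g)
    away = rng.poisson(lam_away * g)
    return home, away

# Hypothetical attack/defence-adjusted scoring rates, points per match
home, away = simulate_match(lam_home=22.0, lam_away=16.0, shape=5.0,
                            n_sims=100_000, rng=rng)
print(f"P(home win) = {np.mean(home > away):.3f}")
print(f"score corr  = {np.corrcoef(home, away)[0, 1]:.3f}")
```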


Journal ArticleDOI
TL;DR: In this article, the authors used angling from small recreational fishing boats to quantify the relative density of snapper (Pagrus auratus) in six areas within the Cape Rodney-Okakari Point Marine Reserve (New Zealand) and four areas adjacent to the reserve.
Abstract: Summary Angling from small recreational fishing boats was used as a sampling method to quantify the relative density of snapper (Pagrus auratus) in six areas within the Cape Rodney-Okakari Point Marine Reserve (New Zealand) and four areas adjacent to the reserve. Penalized quasi-likelihood was used to fit a log-linear mixed-effects model having area and date as fixed effects and boat as a random effect. Simulation and first-order bias correction formulae were employed to assess the validity of the estimates of the area effects. The bias correction is known to be unsuitable for general use because it typically over-estimates bias, and this was observed here. However, it was qualitatively useful for indicating the direction of bias and for indicating when estimators were approximately unbiased. The parameter of primary interest was the ratio of snapper density in the marine reserve versus snapper density outside the reserve, and the estimator of this parameter was first-order asymptotically unbiased. This ratio of snapper densities was estimated to be approximately 11.

35 citations


Journal ArticleDOI
TL;DR: In this paper, an EM algorithm is proposed for computing estimates of parameters of the negative binomial distribution; the algorithm does not involve further iterations in the M-step, in contrast with the one given in Schader & Schmid (1985).
Abstract: An EM algorithm is proposed for computing estimates of parameters of the negative binomial distribution; the algorithm does not involve further iterations in the M-step, in contrast with the one given in Schader & Schmid (1985). The approach can be applied to the corresponding problem in the logarithmic series distribution. The convergence of the proposed scheme is investigated by simulation, the observed Fisher information is derived and numerical examples based on real data are presented.

29 citations
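
For orientation, the sketch below fits the negative binomial by direct numerical maximization of the likelihood; this is a generic baseline comparator, not the paper's EM algorithm, and the unconstrained reparameterization is an implementation convenience.

```python
# Sketch: direct maximum-likelihood fit of the negative binomial, as a
# baseline against which an EM scheme could be compared.
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def nb_neg_loglik(params, x):
    log_r, logit_p = params                  # unconstrained parameterization
    r = np.exp(log_r)
    p = 1.0 / (1.0 + np.exp(-logit_p))
    return -np.sum(stats.nbinom.logpmf(x, r, p))

rng = np.random.default_rng(5)
x = rng.negative_binomial(n=3.0, p=0.4, size=1000)

fit = minimize(nb_neg_loglik, x0=[0.0, 0.0], args=(x,), method="Nelder-Mead")
r_hat = np.exp(fit.x[0])
p_hat = 1.0 / (1.0 + np.exp(-fit.x[1]))
print(f"r_hat = {r_hat:.3f}, p_hat = {p_hat:.3f}")   # truth: r = 3, p = 0.4
```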


Journal ArticleDOI
TL;DR: The authors discuss several possible methods for adjusting the randomization procedure to allow for this type of problem, including generalizations of methods that have been proposed for comparing the means of several samples when there is unequal variance but no factor structure.
Abstract: If there are significant factor and interaction effects with analysis of variance using randomization inference, they can be detected by tests that compare the F-statistics for the real data with the distributions of these statistics obtained by randomly allocating either the original observations or the residuals to the various factor combinations. Such tests involve the assumption that the effect of factors or interactions is to shift the observations for a factor combination by a fixed amount, without changing the amount of variation at that combination. In reality the expected amount of variation at each factor combination, as measured by the variance, may not be constant, which may upset the properties of the tests for the effects of factors and interactions. This paper discusses several possible methods for adjusting the randomization procedure to allow for this type of problem, including generalizations of methods that have been proposed for comparing the means of several samples when there is unequal variance but no factor structure. A simulation study shows that the best of the methods examined is one for which the randomized sets of data are designed to approximate the distributions of F-statistics when unequal variance is present.
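
The baseline scheme being adjusted is easy to sketch: compare the observed F-statistic with its distribution under random re-allocation of the observations to factor combinations. A minimal one-way illustration (the paper's variance-adjusted variants are not reproduced):

```python
# Sketch: a randomization test for a one-way factor effect, re-allocating
# observations to levels and comparing F-statistics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
groups = [rng.normal(0.0, 1.0, 12), rng.normal(0.8, 1.0, 12),
          rng.normal(0.3, 1.0, 12)]

f_obs = stats.f_oneway(*groups).statistic
pooled = np.concatenate(groups)
sizes = np.cumsum([len(g) for g in groups])[:-1]   # split points

n_perm, count = 5000, 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    f_perm = stats.f_oneway(*np.split(perm, sizes)).statistic
    count += f_perm >= f_obs
print(f"randomization p-value = {(count + 1) / (n_perm + 1):.4f}")
```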

Journal ArticleDOI
TL;DR: In this article, the asymptotic power of the Jonckheere-type tests is computed by using results of Hájek (1968) which may be considered as extensions of the theorem of Chernoff & Savage (1958).
Abstract: For the c-sample location problem with ordered alternatives, the test proposed by Barlow et al. (1972 p. 184) is an appropriate one under the model of normality. For non-normal data, however, there are rank tests which have higher power than the test of Barlow et al., e.g. the Jonckheere test or so-called Jonckheere-type tests recently introduced and studied by Büning & Kössler (1996). In this paper the asymptotic power of the Jonckheere-type tests is computed by using results of Hájek (1968) which may be considered as extensions of the theorem of Chernoff & Savage (1958). Power studies via Monte Carlo simulation show that the asymptotic power values provide a good approximation to the finite-sample ones even for moderate sample sizes.
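
The Jonckheere statistic itself is a sum of pairwise Mann–Whitney counts taken in the hypothesised order of the groups. A minimal sketch with a permutation p-value, as the finite-sample counterpart of the asymptotic power calculations; the group means and sizes are illustrative:

```python
# Sketch: the Jonckheere-Terpstra statistic for ordered alternatives,
# with a permutation p-value.
import numpy as np

def jonckheere(groups):
    """Sum of pairwise Mann-Whitney counts for groups in hypothesised order."""
    stat = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            xi, xj = groups[i], groups[j]
            # count pairs with xi < xj (ties counted 1/2)
            stat += (np.sum(xi[:, None] < xj[None, :])
                     + 0.5 * np.sum(xi[:, None] == xj[None, :]))
    return stat

rng = np.random.default_rng(7)
groups = [rng.normal(m, 1.0, size=10) for m in (0.0, 0.4, 0.8)]  # rising trend

j_obs = jonckheere(groups)
pooled = np.concatenate(groups)
sizes = np.cumsum([len(g) for g in groups])[:-1]
perms = np.array([jonckheere(np.split(rng.permutation(pooled), sizes))
                  for _ in range(2000)])
print(f"J = {j_obs:.1f}, "
      f"permutation p = {(np.sum(perms >= j_obs) + 1) / 2001:.4f}")
```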


Journal ArticleDOI
TL;DR: In this article, the authors discuss issues related to the improvement of maximum likelihood estimates in von Mises regression models and present general matrix expressions for the second-order biases of the estimates of the mean parameters and concentration parameters.
Abstract: Summary This paper discusses issues related to the improvement of maximum likelihood estimates in von Mises regression models. It obtains general matrix expressions for the second-order biases of maximum likelihood estimates of the mean parameters and concentration parameters. The formulae are simple to compute, and give the biases by means of weighted linear regressions. Simulation results are presented assessing the performance of corrected maximum likelihood estimates in these models.

Journal ArticleDOI
TL;DR: In this article, the authors considered the properties of the maximum likelihood estimator of the correlation coefficient, principally regarding precision, in various types of bivariate model which are popular in the applied literature.
Abstract: From a theoretical perspective, the paper considers the properties of the maximum likelihood estimator of the correlation coefficient, principally regarding precision, in various types of bivariate model which are popular in the applied literature. The models are: 'Full-Full', in which both variables are fully observed; 'Censored-Censored', in which both of the variables are censored at zero; and finally, 'Binary-Binary', in which both variables are observed only in sign. For analytical convenience, the underlying bivariate distribution which is assumed in each of these cases is the bivariate logistic. A central issue is the extent to which censoring reduces the level of Fisher's information pertaining to the correlation coefficient, and therefore reduces the precision with which this important parameter can be estimated.

Journal ArticleDOI
TL;DR: In this article, a nonparametric testing procedure for the parallelism of two first-order autoregressive processes is presented; the Mann-Whitney statistic, its natural competitor the two-sample t-test, and the bootstrap method are compared.
Abstract: A nonparametric testing procedure for the parallelism of two first-order autoregressive processes is presented. This paper discusses the Mann–Whitney statistic, its natural competitor the two-sample t-test, and the bootstrap method. It studies the asymptotic efficacies of the studentized Mann–Whitney statistic and the t-test statistic, together with their relative efficiency. Simulation results comparing the powers of these test statistics are also presented.

Journal ArticleDOI
TL;DR: In this paper, it was found that two-phase adaptive sampling has a lower MSE than adaptive cluster sampling for most populations, and there appears to be little gain in using a stratified design with adaptive cluster sampling.
Abstract: Stratified sampling is a technique commonly used for ecological surveys. In this study there appears to be little gain in using a stratified design with adaptive cluster sampling. Two-phase adaptive sampling is preferable to adaptive cluster sampling: even though two-phase adaptive sampling can give biased estimates, it is found to have a lower MSE than adaptive cluster sampling for most populations.

Journal ArticleDOI
TL;DR: Experimental designs which use extensive blocking and which are particularly useful in plant and tree breeding trials are discussed in this article; they can be constructed either to accommodate field restrictions or to take advantage of favourable plot layouts.
Abstract: Experimental designs which use extensive blocking and which are particularly useful in plant and tree breeding trials are discussed. They can be constructed either to accommodate field restrictions or to take advantage of favourable plot layouts. Computer software is available to generate these design types for use in practice. Examples cover latinized row-column designs, t-latinized and partially-latinized designs and designs with unequal block sizes.

Journal ArticleDOI
Paul Kabaila
TL;DR: In this paper, the author presents an efficient simulation-based algorithm for estimating E{g(X,Y) | h(X,Y) = r}, applicable to time series problems in which X = (X1, ..., Xn-1) and Y = Xn, where {Xt} is a discrete-time stochastic process for which (X1, ..., Xn) is a continuous random vector.
Abstract: Summary Suppose that the random vector X and the random variable Y are jointly continuous. Also suppose that an observation x of X can be easily simulated and that the probability density function of Y conditional on X = x is known. The paper presents an efficient simulation-based algorithm for estimating E{g(X,Y) | h(X,Y) = r} where g and h are real-valued functions. This algorithm is applicable to time series problems in which X = (X1, ..., Xn-1) and Y = Xn, where {Xt} is a discrete-time stochastic process for which (X1, ..., Xn) is a continuous random vector. A numerical example from time series analysis illustrates the algorithm, for prediction for an ARCH(1) process.
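
A minimal sketch in the spirit of this algorithm (not the paper's exact estimator): simulate X, solve h(x, y) = r for y, and weight each draw by the known conditional density f(y | x) divided by |dh/dy|. The toy model below, X ~ N(0,1), Y | X ~ N(X, 1), h = X + Y, g = Y, has the closed-form answer E{Y | X + Y = r} = 3r/5 for checking.

```python
# Sketch: density-weighted Monte Carlo estimate of E{g(X,Y) | h(X,Y) = r}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, r = 200_000, 1.5
x = rng.normal(size=n)

# h(x, y) = x + y = r  =>  y = r - x ;  |dh/dy| = 1
y = r - x
weights = stats.norm.pdf(y, loc=x, scale=1.0)   # known f(y | x)
g = y                                           # estimate E{Y | X + Y = r}

estimate = np.sum(weights * g) / np.sum(weights)
print(f"Monte Carlo: {estimate:.4f}, exact: {3 * r / 5:.4f}")
```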

Journal ArticleDOI
TL;DR: In this paper, a local linear kernel regression method was used to test whether the mean function of a sequence of long-range dependent processes has discontinuities or change-points.
Abstract: This paper considers the use of a local linear kernel regression method to test whether the mean function of a sequence of long-range dependent processes has discontinuities or change-points. It proposes a non-parametric estimation procedure and then establishes an asymptotic theory for the estimation procedure. Examples, simulated and real, illustrate the estimation procedure.
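
One concrete way to turn local linear smoothing into a change-point diagnostic is to compare one-sided fits approaching each candidate point from the left and from the right. The sketch below does this for independent noise; it omits the long-range dependence machinery of the paper, and the kernel, bandwidth and jump location are illustrative.

```python
# Sketch: locating a jump in a mean function by comparing one-sided
# local linear fits on either side of each candidate point.
import numpy as np

def local_linear(t0, t, y, h):
    """Local linear estimate of the mean at t0 (Epanechnikov kernel)."""
    u = (t - t0) / h
    w = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)
    if w.sum() == 0:
        return np.nan
    X = np.column_stack([np.ones_like(t), t - t0])
    beta, *_ = np.linalg.lstsq(X * np.sqrt(w)[:, None],
                               y * np.sqrt(w), rcond=None)
    return beta[0]

rng = np.random.default_rng(9)
n, h = 500, 0.08
t = np.linspace(0, 1, n)
y = np.sin(2 * t) + 0.8 * (t > 0.6) + rng.normal(0, 0.2, n)  # jump at 0.6

grid = np.linspace(2 * h, 1 - 2 * h, 200)
jumps = np.array([local_linear(g, t[t < g], y[t < g], h)
                  - local_linear(g, t[t >= g], y[t >= g], h)
                  for g in grid])
print(f"estimated change-point: {grid[np.nanargmax(np.abs(jumps))]:.3f}")
```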

Journal ArticleDOI
TL;DR: In this paper, the authors present a general class of estimators for the finite population total when the emphasis is laid on the use of two auxiliary variables in a two-stage sampling.
Abstract: This paper presents a general class of estimators for the finite population total when the emphasis is laid on the use of two auxiliary variables in two-stage sampling.

Journal ArticleDOI
TL;DR: In this paper, the derivatives of the perturbation-formed surface of the Pearson goodness-of-fit statistic were evaluated to identify outliers in contingency tables, and the resulting diagnostics were shown to be less susceptible to masking and swamping problems than residual-based measures.
Abstract: In order to identify outliers in contingency tables, we evaluate the derivatives of the perturbation-formed surface of the Pearson goodness-of-fit statistic. The resulting diagnostics are shown to be less susceptible to masking and swamping problems than residual-based measures. A Monte Carlo study further confirms the effectiveness of the proposed diagnostics.

Journal ArticleDOI
TL;DR: A personal view of the challenges ahead for statisticians, though a literature search indicates that a number of the points to be raised have been considered or alluded to by others in recent years.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate the optimum form of experimental design for time-dependent data, finding that the time dependence in the recovery data for an experiment at Bougainville Copper Limited (BCL) (Napier-Munn, 1995) can be described by a first-order autoregressive process.
Abstract: Time dependence is an important characteristic of mineral processing plant data. This paper finds that the time dependence in the recovery data for an experiment at Bougainville Copper Limited (BCL) (Napier-Munn, 1995) can be described by a first-order autoregressive (AR(1)) process. The paper investigates the optimum form of experimental design for such data. Two intuitive approaches for the design of experiments involving time-dependent data have been disproved recently. Cheng & Steinberg (1991) showed that in some circumstances systematic experiments are preferable to replicated randomized block designs, and Saunders & Eccleston (1992) showed that rather than sampling at a frequency which ensures independent data, in some circumstances sampling intervals should be as small as possible. A third issue, raised in this paper, concerns the use of standard statistical tests when the data are serially correlated. It is shown that the simple paired t-test, suitably modified for time dependence, is appropriate and easily adapted to allow for a covariate and a sequential analysis. The results are illustrated using the BCL data and are already being used to design major experiments involving another mineral process.
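
One simple way to modify the paired t-test for AR(1) dependence is to inflate the variance of the mean difference by the factor (1 + ρ)/(1 − ρ), equivalently shrinking the effective sample size. The sketch below applies that generic adjustment to simulated differences; it is a stand-in under stated assumptions, not the exact modification used for the BCL experiment.

```python
# Sketch: paired t-test with the standard error inflated for AR(1)
# autocorrelation in the paired differences.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n, rho = 200, 0.5
# Simulated AR(1) differences between treatment and control recoveries
e = rng.normal(0, 1.0, n)
d = np.empty(n)
d[0] = e[0]
for t in range(1, n):
    d[t] = 0.3 + rho * (d[t - 1] - 0.3) + e[t]   # true mean difference 0.3

rho_hat = np.corrcoef(d[:-1], d[1:])[0, 1]
inflation = (1 + rho_hat) / (1 - rho_hat)        # variance inflation factor
se = np.sqrt(np.var(d, ddof=1) / n * inflation)
n_eff = n / inflation                            # effective sample size
t_stat = d.mean() / se
p = 2 * stats.t.sf(abs(t_stat), df=n_eff - 1)
print(f"rho_hat = {rho_hat:.2f}, t = {t_stat:.2f}, p = {p:.4f}")
```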

Journal ArticleDOI
TL;DR: In this article, the authors examined the use of derived block designs derived from α-designs and compared them with designs generated directly using an interchange algorithm and concluded that the derived designs should be used when v is large, but that for small v they can be relatively inefficient.
Abstract: Resolvable block designs for v varieties in blocks of size k require v to be a multiple of k so that all blocks are of the same size. If a factorization of v is not possible then a resolvable design with blocks of unequal size is necessary. Patterson & Williams (1976) suggested the use of designs derived from α-designs and conjectured that such designs are likely to be very efficient in the class of resolvable designs with block sizes k and k − 1. This paper examines these derived designs and compares them with designs generated directly using an interchange algorithm. It concludes that the derived designs should be used when v is large, but that for small v they can be relatively inefficient.

Journal ArticleDOI
TL;DR: This article developed asymptotic expressions for the bias and efficiency both of the regression coefficient estimates and of the information sandwich estimate, and used them to study the behaviour of the estimates and also examined the effect of different cluster sizes and different degrees of correlation between the covariates.
Abstract: The Generalized Estimating Equation (GEE) method popularized by Liang and Zeger provides a very general method for fitting regression models to observations that occur in clusters. Features of the method are the specification of a ‘working correlation’ (a guess at the true correlation structure of the data) which is used to improve efficiency in estimating the regression coefficients, and the ‘information sandwich’ which provides a way of consistently estimating the standard errors of the estimated regression coefficients even if (as we might expect) the working correlation is wrong. This paper develops asymptotic expressions for the bias and efficiency both of the regression coefficient estimates and of the sandwich estimate, and uses them to study the behaviour of the estimates. It looks at the effect of the choice of the working correlation on the estimate and also examines the effect of different cluster sizes and different degrees of correlation between the covariates. The performance of these methods is found to be excellent, particularly when the degree of correlation in the responses and covariates is small to moderate.
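
A minimal sketch of the method on simulated clustered data, using the GEE implementation in statsmodels with an exchangeable working correlation; robust ('sandwich') standard errors are the default in its fit. The data-generating values are illustrative.

```python
# Sketch: GEE fit with an exchangeable working correlation and sandwich
# standard errors, on simulated clustered Gaussian data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n_clusters, size = 100, 5
groups = np.repeat(np.arange(n_clusters), size)
b = np.repeat(rng.normal(0, 0.7, n_clusters), size)   # cluster effect
x = rng.normal(size=n_clusters * size)
y = 1.0 + 2.0 * x + b + rng.normal(0, 1.0, n_clusters * size)

X = sm.add_constant(x)
model = sm.GEE(y, X, groups=groups,
               family=sm.families.Gaussian(),
               cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()          # robust (sandwich) covariance by default
print(result.summary())
print(model.cov_struct.summary())   # estimated working correlation
```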

Journal ArticleDOI
TL;DR: In this paper, the authors derived the joint generating function of a collection of pattern statistics associated with binary sequences, including independent and some dependent Bernoulli trials, including Markov dependent ones.
Abstract: The paper derives the joint generating function of a collection of pattern statistics associated with binary sequences. The models discussed cover independent and some dependent Bernoulli trials, including Markov dependent ones. The results cover, in particular, the moment generating function of the random search time for certain general binary patterns in the popular Knuth-Morris-Pratt algorithm, and hence shed more light on its performance.
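
For the simplest case the paper covers, i.i.d. Bernoulli(1/2) trials, expected pattern search times have classical closed forms: for the pattern 101 the expectation is 2^3 + 2 = 10 (one term per border of the pattern, including the pattern itself). A Monte Carlo sketch checking this, with a naive scan standing in for Knuth-Morris-Pratt:

```python
# Sketch: Monte Carlo check of the expected search time for a binary
# pattern in i.i.d. Bernoulli(1/2) trials.
import numpy as np

def waiting_time(pattern, rng, max_len=10_000):
    """Trials until the pattern first appears (naive scan, illustration only)."""
    m = len(pattern)
    bits = rng.integers(0, 2, size=max_len)
    for end in range(m, max_len + 1):
        if np.array_equal(bits[end - m:end], pattern):
            return end
    return max_len  # truncation, negligible for short patterns

rng = np.random.default_rng(12)
pattern = np.array([1, 0, 1])
times = [waiting_time(pattern, rng) for _ in range(20_000)]
print(f"Monte Carlo mean waiting time: {np.mean(times):.2f} (exact: 10)")
```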

Journal ArticleDOI
TL;DR: In this paper, the authors make a detailed analysis of a family of discrete failure time distributions which meets both requirements, and present a goodness-of-fit test for this model which is used to select an appropriate model for datasets of frequencies of the duration of atmospheric circulation patterns.
Abstract: Summary In recent years, a large number of new discrete distributions have appeared in the literature. However, flexible discrete models which, at the same time, allow for easy statistical inference, are still an exception. This paper makes a detailed analysis of a family of discrete failure time distributions which meets both requirements. It examines the maximum likelihood estimation of the unknown parameters and presents a goodness-of-fit test for this model. The test is used for the selection of an appropriate model for datasets of frequencies of the duration of atmospheric circulation patterns.


Journal ArticleDOI
TL;DR: In this paper, the authors developed simple approximations for the minimum capture effort required to achieve (i) no more than a certain probability of breakdown, (ii) a given relative standard error.
Abstract: One of the main aims of a recapture experiment is to estimate the unknown size, N, of a closed population. Under the so-called behavioural model, individual capture probabilities change after the first capture. Unfortunately, the maximum likelihood estimator given by Zippin (1956) may give an infinite result and often has poor precision. Chaiyapong & Lloyd (1997) have given formulae for the asymptotic bias and variance as well as for the probability that the estimate is infinite. The purpose of this article is to tabulate the inversions of the above-cited formulae so that practitioners can plan the required capture effort. This paper develops simple approximations for the minimum capture effort required to achieve (i) no more than a certain probability of breakdown, and (ii) a given relative standard error.
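
In the two-occasion removal case the estimator has the familiar closed form N̂ = u1²/(u1 − u2), which is infinite (breaks down) whenever u2 ≥ u1. The sketch below simulates the breakdown probability and the relative spread of the finite estimates; the population size and capture probability are illustrative, and this two-occasion case is a simplification of the settings tabulated in the paper.

```python
# Sketch: two-occasion removal (Zippin-type) estimation under the
# behavioural model, with simulated breakdown probability.
import numpy as np

rng = np.random.default_rng(13)
N_true, p, n_sims = 200, 0.3, 50_000

u1 = rng.binomial(N_true, p, size=n_sims)   # first-occasion captures
u2 = rng.binomial(N_true - u1, p)           # second occasion, unmarked only
finite = u2 < u1                            # MLE infinite when u2 >= u1
N_hat = u1[finite].astype(float) ** 2 / (u1[finite] - u2[finite])

print(f"P(breakdown)       = {np.mean(~finite):.4f}")
print(f"mean N_hat         = {N_hat.mean():.1f} (true N = {N_true})")
print(f"relative std error = {N_hat.std() / N_hat.mean():.3f}")
```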