
Showing papers in "Statistical Methods in Medical Research in 1996"


Journal ArticleDOI
KF Rust, JNK Rao
TL;DR: The use of the jackknife, balanced repeated replication and the bootstrap for estimating sampling variances, and the use of such variance estimates in drawing inferences from survey data, are discussed.
Abstract: The analysis of survey data requires the application of special methods to deal appropriately with the effects of the sample design on the properties of estimators and test statistics. The class of replication techniques represents one approach to handling this problem. This paper discusses the use of these techniques for estimating sampling variances, and the use of such variance estimates in drawing inferences from survey data. The techniques of the jackknife, balanced repeated replication (balanced half-samples), and the bootstrap are described, and the properties of these methods are summarized. Several examples from the literature of the use of replication in analysing large complex surveys are outlined.
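
To make the delete-one jackknife concrete, here is a minimal sketch (not from the paper; all data are simulated, and a real survey application would delete whole primary sampling units within strata rather than single units) estimating the variance of a ratio estimator:

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(50, 10, size=40)    # hypothetical response values
    x = rng.normal(100, 15, size=40)   # hypothetical auxiliary values

    theta_hat = y.sum() / x.sum()      # full-sample ratio estimate

    n = len(y)
    # Recompute the estimate with each unit deleted in turn
    theta_i = np.array([(y.sum() - y[i]) / (x.sum() - x[i]) for i in range(n)])
    # Jackknife variance: (n-1)/n times squared deviations from their mean
    var_jack = (n - 1) / n * np.sum((theta_i - theta_i.mean()) ** 2)
    print(theta_hat, var_jack)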

687 citations


Journal ArticleDOI
TL;DR: Weighting adjustments that compensate for noncoverage and total nonresponse, and imputation methods that assign values for missing responses to compensate for item nonresponse, are described together with their benefits and limitations.
Abstract: Missing data occur in survey research because an element in the target population is not included on the survey's sampling frame (noncoverage), because a sampled element does not participate in the survey (total nonresponse) and because a responding sampled element fails to provide acceptable responses to one or more of the survey items (item nonresponse). A variety of methods have been developed to attempt to compensate for missing survey data in a general purpose way that enables the survey's data file to be analysed without regard for the missing data. Weighting adjustments are often used to compensate for noncoverage and total nonresponse. Imputation methods that assign values for missing responses are used to compensate for item nonresponses. This paper describes the various weighting and imputation methods that have been developed, and discusses their benefits and limitations.
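
As an illustration only (hypothetical data and class labels, not taken from the paper), the sketch below applies a weighting-class nonresponse adjustment and a random within-class hot-deck imputation:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "w": np.ones(10),                        # base sampling weights
        "cls": list("AAAABBBBBB"),               # adjustment/imputation classes
        "resp": [1, 1, 0, 1, 1, 0, 1, 1, 0, 1],  # 1 = responded
        "y": [3.0, 4.0, np.nan, 5.0, 7.0, np.nan, 6.0, 8.0, np.nan, 7.0],
    })

    # Weighting-class adjustment: inflate respondent weights so that each
    # class keeps its original weight total
    tot = df.groupby("cls")["w"].transform("sum")
    rtot = (df["w"] * df["resp"]).groupby(df["cls"]).transform("sum")
    df["w_adj"] = np.where(df["resp"] == 1, df["w"] * tot / rtot, 0.0)

    # Random hot-deck imputation: fill missing y from a donor in the same class
    for c, g in df.groupby("cls"):
        donors = g.loc[g["y"].notna(), "y"].to_numpy()
        miss = g.index[g["y"].isna()]
        df.loc[miss, "y"] = rng.choice(donors, size=len(miss))
    print(df)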

558 citations


Journal ArticleDOI
TL;DR: Concerns with implementation should not deter the biostatistician from using MCMC methods, but rather help to ensure wise use of these powerful techniques.
Abstract: Appropriate models in biostatistics are often quite complicated. Such models are typically most easily fit using Bayesian methods, which can often be implemented using simulation techniques. Markov chain Monte Carlo (MCMC) methods are an important set of tools for such simulations. We give an overview and references of this rapidly emerging technology along with a relatively simple example. MCMC techniques can be viewed as extensions of iterative maximization techniques, but with random jumps rather than maximizations at each step. Special care is needed when implementing iterative maximization procedures rather than closed-form methods, and even more care is needed with iterative simulation procedures: it is substantially more difficult to monitor convergence to a distribution than to a point. The most reliable implementations of MCMC build upon results from simpler models fit using combinations of maximization algorithms and noniterative simulations, so that the user has a rough idea of the location and scale of the posterior distribution of the quantities of interest under the more complicated model. These concerns with implementation, however, should not deter the biostatistician from using MCMC methods, but rather help to ensure wise use of these powerful techniques.
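
A minimal random-walk Metropolis sketch, assuming a Normal-mean model with simulated data, illustrates the 'random jumps' the abstract contrasts with iterative maximization:

    import numpy as np

    rng = np.random.default_rng(2)
    data = rng.normal(1.5, 1.0, size=30)   # hypothetical observations

    def log_post(mu):
        # N(0, 10^2) prior for mu; Normal likelihood with known unit variance
        return -mu ** 2 / 200.0 - 0.5 * np.sum((data - mu) ** 2)

    mu, chain = 0.0, []
    for _ in range(5000):
        prop = mu + rng.normal(0, 0.5)     # random jump, not a maximization
        if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
            mu = prop                      # accept the proposal
        chain.append(mu)

    draws = np.array(chain[1000:])         # discard burn-in
    print(draws.mean(), draws.std())       # posterior location and scale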

208 citations


Journal ArticleDOI
TL;DR: Basic concepts of Box-Jenkins (ARIMA) modelling, which allows the stochastic dependence of consecutive observations to be modelled and which has become well established in fields such as economics and environmental medicine, are reviewed.
Abstract: Notifications of diseases, hospital admissions, injuries due to accidents, etc., are frequently collected at fixed, equally spaced intervals. Such observations are likely to be dependent. In environmental medicine, where series such as daily concentrations of pollutants are collected and analysed, it is evident that dependence of consecutive measurements may be important. A high concentration of a pollutant today has a certain 'inertia', i.e. a tendency to be high tomorrow as well. Dependence of consecutive observations may be equally important when data such as blood glucose are recorded within a single patient. ARIMA models (autoregressive integrated moving average models, Box-Jenkins models), which allow the stochastic dependence of consecutive data to be modelled, have become well established in such fields as economics. This article reviews basic concepts of Box-Jenkins modelling. The methods are illustrated by applications. In particular, the following topics are presented: the ARIMA model, transfer function models (assessment of relations between time series) and intervention analysis (assessment of changes in time series).
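
As a hedged illustration (a simulated AR(1)-like series; the article itself uses real applications), an ARIMA model can be fitted with statsmodels:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(3)
    n, phi = 200, 0.7
    y = np.zeros(n)
    for t in range(1, n):          # AR(1): today's level carries 'inertia'
        y[t] = phi * y[t - 1] + rng.normal()

    res = ARIMA(y, order=(1, 0, 0)).fit()   # (p, d, q)
    print(res.params)                       # constant, AR coefficient, variance
    print(res.forecast(steps=3))            # short-horizon forecasts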

176 citations


Journal ArticleDOI
TL;DR: The use of the sampling weights when fitting models to complex survey data is considered and it is shown that when the sample is selected with unequal selection probabilities, ignoring the sample selection scheme in the inference process can yield misleading results.
Abstract: The use of the sampling weights when fitting models to complex survey data is considered. It is shown that when the sample is selected with unequal selection probabilities that are related to the values of the response variables, even after conditioning on all the available design information, ignoring the sample selection scheme in the inference process can yield misleading results. Probability weighting of the sample observations yields consistent estimators of the model parameters and protects against model misspecification, although in a limited sense. Other methods of incorporating the sampling weights in the inference process are discussed and compared to the use of probability weighting.
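
The following sketch (simulated data; informative Poisson sampling is assumed for simplicity) shows the misleading unweighted estimate and the probability-weighted (Hájek) correction:

    import numpy as np

    rng = np.random.default_rng(4)
    N = 100_000
    y = rng.gamma(2.0, 2.0, size=N)          # hypothetical population values
    p = np.clip(y / y.sum() * 5000, 0, 1)    # selection prob. increases with y
    sel = rng.uniform(size=N) < p            # Poisson sampling
    ys, ps = y[sel], p[sel]

    print("population mean: ", y.mean())
    print("unweighted mean: ", ys.mean())                       # biased upward
    print("weighted (Hajek):", np.sum(ys / ps) / np.sum(1 / ps))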

159 citations


Journal ArticleDOI
TL;DR: Different variants of latent class analysis (LCA) for dichotomous data are described in the following: the basic (unconstrained) model, models with parameters fixed to given values and with equality constraints on parameters, multigroup LCA including mixed-group validation, and linear logistic LCA including its relationship to the Rasch model and to the measurement of change in latent subgroups.
Abstract: In the introduction we give a brief characterization of the usual measures for indicating the quality of diagnostic procedures (sensitivity, specificity and predictive value) and we refer to their relationship to parameters of the latent class model. Different variants of latent class analysis (LCA) for dichotomous data are described in the following: the basic (unconstrained) model, models with parameters fixed to given values and with equality constraints on parameters, multigroup LCA including mixed-group validation, and linear logistic LCA including its relationship to the Rasch model and to the measurement of change in latent subgroups. The problem with the identifiability of latent class models and the possibilities for statistically testing their fit are outlined. The second part refers to latent class models for polytomous data. Special attention is paid to simple variants having fixed and/or equated parameters and to the log-linear extension of LCA with its possibility for including effects on the latent level. Several examples are presented to illustrate typical applications of the model. The paper ends with some warnings that should be taken into consideration by potential users of LCA.
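
A minimal EM sketch of the basic (unconstrained) two-class model for dichotomous items, fitted to simulated data (this is an illustration, not the authors' software; the class labelling is arbitrary):

    import numpy as np

    rng = np.random.default_rng(5)
    # Simulate two latent classes with different item-endorsement probabilities
    true_pi = 0.4
    true_p = np.array([[0.9, 0.8, 0.7], [0.2, 0.3, 0.1]])
    z = rng.uniform(size=500) < true_pi
    X = (rng.uniform(size=(500, 3)) < true_p[np.where(z, 0, 1)]).astype(float)

    pi, p = 0.5, np.array([[0.6, 0.6, 0.6], [0.4, 0.4, 0.4]])
    for _ in range(200):
        # E-step: posterior probability that each subject belongs to class 1
        l1 = pi * np.prod(p[0] ** X * (1 - p[0]) ** (1 - X), axis=1)
        l2 = (1 - pi) * np.prod(p[1] ** X * (1 - p[1]) ** (1 - X), axis=1)
        w = l1 / (l1 + l2)
        # M-step: update class size and conditional response probabilities
        pi = w.mean()
        p[0] = (w[:, None] * X).sum(axis=0) / w.sum()
        p[1] = ((1 - w)[:, None] * X).sum(axis=0) / (1 - w).sum()

    print(round(pi, 2), p.round(2))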

137 citations


Journal ArticleDOI
TL;DR: The possibility of modelling the sampling design using fixed and random effects to redefine target parameters, improve estimators of standard target parameters and improve standard variance estimators is investigated.
Abstract: Health surveys typically have stratified multistage clustered designs in which individuals are sampled with differing probabilities. The sampling design is taken into account in a classical survey analysis by using sample-weighted estimators and variance estimators calculated at the primary-sampling-unit level. In this paper we investigate the possibility of modelling the sampling design using fixed and random effects to redefine target parameters, improve estimators of standard target parameters and improve standard variance estimators. References in which this type of additional modelling was used in health surveys are given. The problem of estimating population variance components is discussed in some detail, with an application involving estimation of between- and within-family variance components in the Hispanic Health and Nutrition Examination Survey.
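
As a simplified illustration (a balanced one-way random-effects layout with simulated families, not the HHANES design), the classical ANOVA moment estimators of the between- and within-family components are:

    import numpy as np

    rng = np.random.default_rng(6)
    k, m = 200, 4                                # families, members per family
    fam = rng.normal(0, np.sqrt(2.0), size=k)    # between-family effects
    y = fam[:, None] + rng.normal(0, 1.0, size=(k, m))

    msb = m * np.var(y.mean(axis=1), ddof=1)     # between-family mean square
    msw = np.mean(np.var(y, axis=1, ddof=1))     # within-family mean square
    sigma2_within = msw
    sigma2_between = (msb - msw) / m             # ANOVA moment estimator
    print(sigma2_between, sigma2_within)         # roughly 2.0 and 1.0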

100 citations


Journal ArticleDOI
TL;DR: This paper gives a general introduction to the topic of finite mixture densities, which should help when considering the other more specialized papers in this issue.
Abstract: Finite mixture densities can be used to model data from populations known or suspected to contain a number of separate subpopulations. Most commonly used are mixture densities with Gaussian (univariate or multivariate) components, but mixtures with other types of component are also increasingly used to model, for example, survival times. This paper gives a general introduction to the topic which should help when considering the other more specialized papers in this issue.
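
A minimal sketch of fitting a two-component Gaussian mixture to simulated bimodal data, using scikit-learn (an illustration only, not part of the paper):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(7)
    x = np.concatenate([rng.normal(0, 1.0, 300), rng.normal(4, 1.5, 200)])

    gm = GaussianMixture(n_components=2, random_state=0).fit(x.reshape(-1, 1))
    print(gm.weights_)          # estimated mixing proportions (~0.6, ~0.4)
    print(gm.means_.ravel())    # estimated component means (~0, ~4)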

91 citations


Journal ArticleDOI
TL;DR: An overview of analysis strategies for survey data is presented and examples for analyses are provided through data from two large US health surveys, the National Health Interview Survey and the Longitudinal Study of Aging, involving logistic regression, time-to- event analysis, and repeated measures analysis.
Abstract: Large-scale health surveys provide a wealth of information for addressing problems in health sciences research. Designed for multiple purposes, these surveys frequently have large sample sizes and extensive measurements of demographic and socioeconomic characteristics, risk factors, disease outcomes and health care service use and costs. Complex features of the sampling design typically employed to select the survey sample, coupled with the vast amount of information available from the survey database, underlie issues that must be addressed during data processing and analysis. Numerous articles in the literature have focused on the debate of whether or not, and how, to control for features of the sample design during data analysis. Traditional statistical methods for simple random samples and the software that accompanies them have historically not had the capacity to account for the survey design. Recent advancements in statistical methodology for survey data analysis have greatly expanded the analytical tools available to the survey analyst. Commercial software packages that incorporate these methods offer the analyst convenient ways for applying such tools to large survey databases in an easy and efficient manner. We present an overview of analysis strategies for survey data and illustrate their application via the SUDAAN software system. Examples for analyses are provided through data from two large US health surveys, the National Health Interview Survey and the Longitudinal Study of Aging. Questions of both a cross-sectional and longitudinal nature are addressed. The examples involve logistic regression, time-to-event analysis, and repeated measures analysis.
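
SUDAAN is a commercial system; as a rough, hypothetical stand-in, the sketch below fits a weight-adjusted logistic regression with scikit-learn on simulated data. Note that this yields weighted point estimates only, not the design-based (PSU-level) standard errors that dedicated survey software provides:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(8)
    n = 500
    x = rng.normal(size=n)
    w = rng.uniform(0.5, 3.0, size=n)           # hypothetical survey weights
    prob = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))   # true model for the outcome
    y = (rng.uniform(size=n) < prob).astype(int)

    clf = LogisticRegression(C=1e6)             # large C: essentially no penalty
    clf.fit(x.reshape(-1, 1), y, sample_weight=w)
    print(clf.intercept_, clf.coef_)            # weighted point estimates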

90 citations


Journal ArticleDOI
TL;DR: This review attempts to provide insight into the role that mixture distributions play in contemporary human genetics research, with examples from the literature describing applications of mixture models.
Abstract: The use of mixture distributions in genetics research dates back to at least the late 1800s when Karl Pearson applied them in an analysis of crab morphometry. Pearson's use of normal mixture distributions to model the mixing of different species of crab (or 'families' of crab as he referred to them) within a defined geographic area motivated further use of mixture distributions in genetics research settings, and ultimately led to their development and recognition as intuitive modelling devices for the effects of underlying genes on quantitative phenotypic (i.e. trait) expression. In addition, mixture distributions are now used routinely to model or accommodate the genetic heterogeneity thought to underlie many human diseases. Specific applications of mixture distribution models in contemporary human genetics research are, in fact, too numerous to count. Despite this long, consistent and arguably illustrious history of use, little mention of mixture distributions in genetics research is made in many recent...

48 citations


Journal ArticleDOI
TL;DR: This paper tries to give an overview of a fascinating area of electroencephalogram analysis, treating in more detail problems with some statistical appeal, which leads inevitably to some overlap with the authors' own work.
Abstract: The quantitative analysis of the electroencephalogram (EEG) relies heavily on methods of time series analysis. A quantitative approach seems indispensable for research (be it clinical or basic neurophysiological research), but it can also provide useful information for purely clinical purposes. Apart from the ongoing spontaneous EEG, evoked potentials (EPs) also play an important role. They can be elicited by simple sensory stimuli or more complex stimuli. Their analysis requires methods which are different from those for the spontaneous EEG. Those methods usually operate in the time domain and offer many challenging problems to statisticians. Methods for analysing the spontaneous EEG usually work in the frequency domain, in terms of spectra and coherences. Biomedical engineers who take care of the equipment are usually also trained in time series analysis; thus they have contributed much more than statisticians to methodological progress in analysing EEGs and EPs. However, the availability of a sample of subjects, and the associated problems in modelling followed by an inferential analysis, could make a larger influence from the statistical side quite profitable. This paper tries to give an overview of a fascinating area. In doing so we treat in more detail problems with some statistical appeal. This leads inevitably to some overlap with our own work.
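
To make the frequency-domain quantities concrete, this hedged sketch computes a Welch spectral estimate and the coherence between two simulated channels sharing a 10 Hz 'alpha' rhythm (not data from the paper):

    import numpy as np
    from scipy.signal import welch, coherence

    rng = np.random.default_rng(9)
    fs = 250                                   # sampling rate (Hz)
    t = np.arange(0, 20, 1 / fs)               # 20 seconds of signal
    alpha = np.sin(2 * np.pi * 10 * t)         # shared 10 Hz rhythm
    ch1 = alpha + rng.normal(0, 1, t.size)
    ch2 = 0.8 * alpha + rng.normal(0, 1, t.size)

    f, pxx = welch(ch1, fs=fs, nperseg=512)            # spectral estimate
    f, cxy = coherence(ch1, ch2, fs=fs, nperseg=512)
    print(f[np.argmax(pxx)])                   # spectral peak near 10 Hz
    print(cxy[np.argmin(np.abs(f - 10))])      # high coherence at 10 Hz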

Journal ArticleDOI
TL;DR: This article describes structural time series models and gives examples of how they can be applied in medicine, and considers univariate models first, and then extended to include explanatory variables and interventions.
Abstract: Structural time series models are formulated in terms of components, such as trends, seasonals and cycles, which have a direct interpretation. This article describes such models and gives examples of how they can be applied in medicine. Univariate models are considered first, and then extended to include explanatory variables and interventions. Multivariate models are then shown to provide a framework for modelling longitudinal data and for carrying out intervention analysis with control groups. The final sections deal with data irregularities and non-Gaussian observations.
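
As an assumed illustration (a simulated series; the article's own examples are medical), a local-linear-trend structural model can be fitted with the unobserved-components implementation in statsmodels:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(10)
    n = 120
    trend = np.cumsum(0.1 + rng.normal(0, 0.05, n))   # slowly drifting trend
    y = trend + rng.normal(0, 0.5, n)                 # trend plus irregular

    mod = sm.tsa.UnobservedComponents(y, level="local linear trend")
    res = mod.fit(disp=False)
    print(res.params)     # variances of the irregular, level and slope terms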

Journal ArticleDOI
TL;DR: This article discusses some robust techniques that have been suggested in the literature and aims to make apparent the relevance of some of these techniques to biostatistical work.
Abstract: All statistical analyses demand uncertain inputs or assumptions. This is especially true of Bayesian analyses. In addition to the usual concerns about the agreement of the data and model, a Bayesian must contemplate the effect of an uncertain prior specification. The degree to which inferences are robust to changes in the prior is of primary interest. This article discusses some robust techniques that have been suggested in the literature. One goal is to make apparent the relevance of some of these techniques to biostatistical work.
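
A minimal sketch of the idea, assuming a Normal mean with known variance and hypothetical data: compare the conjugate posterior mean across a range of priors to see how robust the inference is.

    import numpy as np

    data = np.array([4.1, 5.3, 4.8, 5.9, 5.1])   # hypothetical observations
    n, ybar, sigma2 = len(data), data.mean(), 1.0

    # Conjugate Normal prior N(m0, tau2); vary it to probe robustness
    for m0, tau2 in [(0.0, 1.0), (0.0, 100.0), (5.0, 1.0)]:
        post_var = 1 / (1 / tau2 + n / sigma2)
        post_mean = post_var * (m0 / tau2 + n * ybar / sigma2)
        print(f"prior N({m0}, {tau2}): posterior mean {post_mean:.3f}")

If the posterior summaries barely move across this range of priors, the inference is robust in the sense discussed above; large swings flag sensitivity to the prior specification.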

Journal ArticleDOI
TL;DR: Methods developed for analysis of red blood cell volume distributions have now been adopted by the International Council for Standardization in Haematology, and analysis of population transferrin saturation distributions from the white population of the USA has led to an independent estimate of the prevalence of homozygotes for haemochromatosis.
Abstract: Two specific applications of finite mixture distributions in haematology include (1) the analysis of the distribution of red blood cell volumes to characterize and quantify alterations in erythrocy...

Journal ArticleDOI
TL;DR: The problems of aliasing and estimation of the spectrum using windowing and autoregressive techniques are covered, and the methods are shown to apply in the analysis of signals from heart rate, blood pressure, EEG, other electrical signals and hormone levels.
Abstract: This paper reviews the current use of spectral analysis in clinical medicine. We cover the problems of aliasing and estimation of the spectrum using windowing and autoregressive techniques. These t...

Journal ArticleDOI
TL;DR: Hastie and Tibshirani provide a very clear and interesting description of generalized additive models, but their article was flawed by their claim, without any evidence, that such models are useful.
Abstract: Hastie and Tibshirani provide a very clear and interesting description of generalized additive models (Vol. 4, No. 3), and so it is a pity that their article was flawed by their claim, without any evidence, that such models are useful. Hastie and Tibshirani’s article contains a description of the theory of generalized additive models and two examples. They give two theoretical benefits for the models. The first of these is that

Journal ArticleDOI
TL;DR: In this paper, the authors discuss several analytical techniques, including comparisons of proportions, comparisons of means, regression/correlation methods, and exploratory data analysis, sample size calculation and nonparametric techniques.
Abstract: Chapters 3, 4 and 5 cover the topics of probability, estimation and hypothesis testing, respectively. Chapter 6 then builds on this foundation by illustrating a number of analytical techniques, including comparisons of proportions, comparisons of means and regression/correlation methods. Finally, Chapter 7 offers the miscellaneous topics of exploratory data analysis, sample size calculation and nonparametric techniques. Although this book does communicate many of its points well, there are a few noticeable shortcomings. As a minor example, the authors state that relative risks are not measurable in case-control studies, but they do not explain why. A more worrying example is that they discourage the use of the pooled estimate of variance in the context of a confidence interval on the difference between

Journal ArticleDOI
TL;DR: The results indicate that the HME showed good prediction performance, with the additional benefit of allowing the degree of certainty of the model in its predictions to be assessed.
Abstract: This paper studies the problems of inference and prediction in a class of models known as hierarchical mixtures-of-experts (HME). The statistical model underlying an HME is a mixture model in which both the mixture coefficients and the mixture components are generalized linear models. Bayesian inference regarding an HME's parameters is presented in the contexts of regression and classification using Markov chain Monte Carlo methods. A benefit of this Bayesian approach is the ability to obtain a sample from the posterior distribution of any functional of the parameters of the given model. In this way, more information is obtained than provided by a point estimate. The methods are illustrated on a nonlinear regression problem and on a breast cancer classification problem. The results indicate that the HME showed good prediction performance, and also gave the additional benefit of providing for the opportunity to assess the degree of certainty of the model in its predictions.
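
The paper's inference is Bayesian via MCMC; as a simpler hedged illustration of the underlying model only, the sketch below fits a two-expert mixture-of-experts for regression by (generalized) EM on simulated two-regime data, with a logistic gating model and linear experts:

    import numpy as np

    rng = np.random.default_rng(11)
    n = 400
    x = rng.uniform(-3, 3, n)
    y = np.where(x < 0, 1 + 2 * x, 4 - x) + rng.normal(0, 0.3, n)
    X = np.column_stack([np.ones(n), x])

    g = np.zeros(2)                      # gating GLM (logistic) coefficients
    B = np.array([[0.0, 1.0], [0.0, -1.0]])  # expert GLM (linear) coefficients
    s = 1.0                              # common residual standard deviation

    for _ in range(300):
        gate1 = 1 / (1 + np.exp(-X @ g))             # P(expert 1 | x)
        mu = X @ B.T                                 # expert means, shape (n, 2)
        dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * s ** 2))
        num = np.column_stack([gate1, 1 - gate1]) * dens
        r = num / num.sum(axis=1, keepdims=True)     # E-step responsibilities
        for k in range(2):                           # M-step: weighted least squares
            w = r[:, k]
            B[k] = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        s = np.sqrt(np.sum(r * (y[:, None] - X @ B.T) ** 2) / n)
        g += 0.5 * X.T @ (r[:, 0] - gate1) / n       # partial M-step for the gate

    print(B.round(2), g.round(2))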

Journal ArticleDOI
TL;DR: The use of large-scale surveys for analytical purposes is increasing as data from such surveys are made more publicly available and as the research community becomes more aware of their potential.
Abstract: The use of large-scale surveys for analytical purposes is increasing as data from such surveys are made more publicly available and as the research community has become more aware of the potential of such surveys. However, the application of appropriate analysis methods has lagged behind the use of survey data, primarily because such methods are not part of the main statistical analysis packages and because the literature on survey design and analysis tends to be separate from the mainstream statistical literature. The main difficulty in analysing data generated from complex surveys

Journal ArticleDOI
TL;DR: This review of Volume I of a rearranged and updated successor to Design and Analysis of Experiments (published in 1952) finds that the authors have succeeded quite well, with 13 chapters ranging from 'The Process of Science' to 'Split-Plot Designs'.
Abstract: The authors wanted to rearrange and update the material of Design and Analysis of Experiments, published in 1952, as the subject has developed considerably. My first impression, based on a review of Volume I (a second volume will be published in 1996), is that the authors have succeeded quite well in this task. This volume has 13 chapters devoted to: 'The Process of Science', 'Principles of Experimental Designs', 'Survey of Experimental Designs', 'Linear Model Theory', 'Randomization', 'The Completely Randomized Designs', 'Comparison of Treatments', 'Use of Supplementary Information', 'Randomized Block Designs', 'Latin Square Type Designs' and 'Split-Plot Designs'. The authors felt it 'absolutely necessary' to add the first chapter on the process of science, dealing with the role of experiments, the role of data analysis and the ideas of probability in a population of repetitions. This is a very scholarly and philosophical contribution, rich in content, but how many students are going to read it, and how many instructors are going to teach this subject matter, is a debatable question. The second chapter, on the principles of experimental designs, is very well written, summarizing and providing an overview of many aspects of the subject. The schematic description of four experimental situations is novel and illuminating. The third chapter gives an account of, and distinguishes between, error-control designs, treatment designs and sampling designs. Chapter 4 is an expository account of the whole linear model theory, which is the backbone of the analysis of experimental data. It is a complete account, with almost all details such as linear mod...