
Showing papers in "Statistics in Medicine in 1987"


Journal ArticleDOI
TL;DR: The age-period-cohort model is described and it is shown how the ambiguities surrounding regular trends intensify; methods are recommended for presenting the results of analyses based upon this model which minimize the serious risk of misleading implications, and previous suggestions are critically reviewed.
Abstract: Our first paper reviewed methods for modelling variation in cancer incidence and mortality rates in terms of either period effects or cohort effects in the general multiplicative risk model. There we drew attention to the difficulty of attributing regular trends to either period or cohort influences. In this paper we turn to the more realistic problem in which neither period nor cohort effects alone lead to an adequate description of the data. We describe the age-period-cohort model and show how its ambiguities surrounding regular trends 'intensify'. We recommend methods for presenting the results of analyses based upon this model which minimize the serious risk of misleading implications and critically review previous suggestions. The discussion is illustrated by an analysis of breast cancer mortality in Japan with special reference to the phenomenon of 'Clemmesen's hook'.

865 citations


Journal ArticleDOI
TL;DR: The modern approach to the analysis of data which justifies traditional methods of age standardization in terms of the multiplicative risk model is reviewed and the serious difficulties which attend the interpretation of regular trends are demonstrated.
Abstract: A main concern of descriptive epidemiologists is the presentation and interpretation of temporal variations in cancer rates. In its simplest form, this problem is that of the analysis of a set of rates arranged in a two-way table by age group and calendar period. We review the modern approach to the analysis of such data which justifies traditional methods of age standardization in terms of the multiplicative risk model. We discuss the use of this model when the temporal variations are due to purely secular (period) influences and when they are attributable to generational (cohort) influences. Finally we demonstrate the serious difficulties which attend the interpretation of regular trends. The methods described are illustrated by examples for incidence rates of bladder cancer in Birmingham, U.K., mortality from bladder cancer in Italy, and mortality from lung cancer in Belgium.

803 citations
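
As a concrete illustration of the multiplicative risk framework that the two papers above work in, here is a minimal sketch (not the authors' code) of fitting such a model by Poisson regression with a log person-years offset; the rate table, parameter values and use of statsmodels are illustrative assumptions. The full age-period-cohort model adds a period term, but because cohort = period - age the three linear trends are aliased and one extra constraint is needed, which is the ambiguity surrounding regular trends discussed above.

```python
# Illustrative sketch only (hypothetical data): a multiplicative rate model fitted as a
# Poisson regression with a log person-years offset.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# hypothetical rate table: deaths and person-years by age group and calendar period
grid = [(age, period) for age in (40, 50, 60, 70) for period in (1955, 1965, 1975, 1985)]
df = pd.DataFrame(grid, columns=["age", "period"])
df["cohort"] = df["period"] - df["age"]        # birth cohort = period minus age
df["pyears"] = 1e5
rng = np.random.default_rng(0)
df["deaths"] = rng.poisson(df["pyears"] * 1e-4 * np.exp(0.04 * (df["age"] - 40)))

# age-cohort model; swapping C(cohort) for C(period) gives the age-period model.
# Fitting age, period and cohort together requires one additional identifying constraint.
fit = smf.glm("deaths ~ C(age) + C(cohort)", data=df,
              family=sm.families.Poisson(),
              offset=np.log(df["pyears"])).fit()
print(fit.params.round(3))
```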


Journal ArticleDOI
TL;DR: Much can be learned from combining or pooling data but it must be done cautiously, and substantial scientific input is required as to what criteria must be met by each potential study.
Abstract: Methods for combining data from several studies exist and appear to be quite useful. None satisfactorily addresses the question of what studies should be combined. This issue is the most serious methodological limitation. Even studies with statistically significant interaction might still be combined if the effect were in the same direction. Thus, substantial scientific input is required as to what criteria must be met by each potential study. Much can be learned from combining or pooling data but it must be done cautiously. Pooling exercises do not replace well designed prospective clinical trials. Efforts for establishing basic design criteria to allow for multicentre and multicountry trials to be more easily combined might be useful.

772 citations


Journal ArticleDOI
TL;DR: This paper provides exact power contours to guide the planning of reliability studies, where the parameter of interest is the coefficient of intraclass correlation rho derived from a one-way analysis of variance model.
Abstract: This paper provides exact power contours to guide the planning of reliability studies, where the parameter of interest is the coefficient of intraclass correlation rho derived from a one-way analysis of variance model. The contours display the required number of subjects k and number of repeated measurements n that provide 80 per cent power for testing H0: rho less than or equal to rho0 versus H1: rho greater than rho0 at the 5 per cent level of significance for selected values of rho0. We discuss the design considerations suggested by these results.

642 citations
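
A hedged sketch of this kind of exact power calculation, assuming the standard one-way ANOVA result that the between-to-within mean-square ratio, scaled by 1 + n*rho/(1 - rho), follows an F distribution with k - 1 and k(n - 1) degrees of freedom; the numbers and the scipy-based implementation are illustrative, not taken from the paper.

```python
# Minimal sketch (standard F-distribution result assumed; not the paper's code) of exact
# power for the one-sided test H0: rho <= rho0 vs H1: rho > rho0, where rho is the
# one-way ANOVA intraclass correlation from k subjects with n repeated measurements.
from scipy.stats import f

def icc_power(k, n, rho0, rho1, alpha=0.05):
    df1, df2 = k - 1, k * (n - 1)
    lam0 = 1 + n * rho0 / (1 - rho0)     # scaling of the F ratio under rho = rho0
    lam1 = 1 + n * rho1 / (1 - rho1)     # scaling of the F ratio under rho = rho1
    f_crit = f.ppf(1 - alpha, df1, df2)  # reject H0 when MSB/MSW > lam0 * f_crit
    return f.sf(lam0 * f_crit / lam1, df1, df2)

# hypothetical design: 30 subjects, 3 repeated measurements each
print(round(icc_power(k=30, n=3, rho0=0.6, rho1=0.8), 3))
```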


Journal ArticleDOI
Colin B. Begg
TL;DR: The various potential problems are described with reference to examples from the diagnostic literature; these difficulties have implications for the design of diagnostic test evaluations and for the choice of suitable measures of test efficacy.
Abstract: Diagnostic tests are traditionally characterized by simple measures of efficacy such as the sensitivity and the specificity. These measures, though widely recognized and easy to understand, are subject to definitional arbitrariness. Moreover, studies constructed to estimate the sensitivity and specificity are susceptible to a variety of biases. In this paper the various potential problems are described with reference to examples from the diagnostic literature. These difficulties have implications for the design of diagnostic test evaluations, and the choice of suitable measures of test efficacy.

493 citations
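
One concrete example of the biases alluded to is verification (work-up) bias: if subjects who test positive are more likely to receive the gold-standard diagnosis, the naive sensitivity estimated among verified subjects is inflated. The simulation below is a hedged illustration with entirely hypothetical prevalence, accuracy and verification probabilities.

```python
# Hedged illustration of verification bias in estimating sensitivity; all parameters
# are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
disease = rng.random(n) < 0.10                                  # 10% true prevalence
test_pos = np.where(disease, rng.random(n) < 0.80,              # sensitivity 0.80
                    rng.random(n) < 0.10)                       # 1 - specificity 0.10

# verification bias: test-positives are verified far more often than test-negatives
verified = np.where(test_pos, rng.random(n) < 0.95, rng.random(n) < 0.20)

true_sens = test_pos[disease].mean()                            # P(test+ | diseased)
naive_sens = test_pos[disease & verified].mean()                # restricted to verified
print(f"true sensitivity {true_sens:.3f}; naive estimate among verified {naive_sens:.3f}")
```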


Journal ArticleDOI
TL;DR: Meta-analysis is an important method of bridging the gap between undersized randomized control trials and the treatment of patients, but the opportunities for bias to distort the results are widespread and attempts must be made to introduce the controls found in prospective studies.
Abstract: Meta-analysis is an important method of bridging the gap between undersized randomized control trials and the treatment of patients. However, as in any retrospective study, the opportunities for bias to distort the results are widespread. Attempts must be made to introduce the controls found in prospective studies by blinding the selection of papers and extraction of data and making blinded duplicate determinations. Informal and personalized methods of obtaining data are probably more liable to error and bias than employing only published data. Publication bias is a serious problem requiring further research. There also need to be more comparisons of meta-analysed small studies with large co-operative trials.

233 citations


Journal ArticleDOI
TL;DR: A model is developed for pooling the results of clinical trials which is free from publication bias and is illustrated using the International Cancer Research Data Bank (ICRDB) registry of cancer clinical trials to evaluate the effect of chemotherapy on survival in advanced ovarian cancer.
Abstract: In evaluating therapies, clinical investigators often need to rely on the published clinical trial literature which may be biased in favour of studies with positive or 'encouraging' results and this may lead to erroneous conclusions of therapeutic effectiveness. The problem of publication bias can be magnified when the evaluation is based on a pooled analysis of clinical trial results, since in this case even small differences between treatment groups may reach statistical significance. In this paper a model is developed for pooling the results of clinical trials which is free from publication bias. It is proposed that an international registry of all clinical trials be established with the objectives and endpoints of each trial clearly defined in the register. In this way for each therapeutic issue researchers can select a cohort of clinical trials independently from the trial results. The approach is illustrated using the International Cancer Research Data Bank (ICRDB) registry of cancer clinical trials to evaluate the effect of chemotherapy on survival in advanced ovarian cancer. In this example, the conclusions based on a pooled analysis of registered trials have important differences from a more traditional review of the published trials. Implications of the results and problems in implementing the model are discussed.

209 citations


Journal ArticleDOI
TL;DR: A key feature of this model, compared with others, is that the components correspond with known features of the endocrinological regulation of growth and can be considered in isolation from one another.
Abstract: A new approach to modelling human linear growth from birth to maturity is presented. The model splits growth into three additive and partly superimposed components, appropriately named infancy, childhood and puberty; we refer to it as the ICP-model for obvious reasons. A key feature of this model, compared with others, is that the components correspond with known features of the endocrinological regulation of growth and can be considered in isolation from one another.

194 citations


Journal ArticleDOI
TL;DR: This paper demonstrates under general conditions the robustness of the t-test in that the maximum actual level of significance is close to the declared level.
Abstract: One may encounter the application of the two independent samples t-test to ordinal scaled data (for example, data that assume only the values 0, 1, 2, 3) from small samples. This situation clearly violates the underlying normality assumption for the t-test and one cannot appeal to large sample theory for validity. In this paper we report the results of an investigation of the t-test's robustness when applied to data of this form for samples of sizes 5 to 20. Our approach consists of complete enumeration of the sampling distributions and comparison of actual levels of significance with the significance level expected if the data followed a normal distribution. We demonstrate under general conditions the robustness of the t-test in that the maximum actual level of significance is close to the declared level.

178 citations
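
A hedged sketch of the enumeration idea: under an assumed (hypothetical) common null distribution on the scores 0 to 3, every possible pair of samples is enumerated, the pooled-variance t statistic is computed, and the rejection probabilities are summed to give the actual significance level for comparison with the nominal level.

```python
# Illustrative sketch (not the paper's code): complete enumeration of the null sampling
# distribution of the two-sample t statistic for ordinal scores {0, 1, 2, 3}.
from itertools import product
from math import sqrt, prod
from scipy.stats import t as t_dist

scores = (0, 1, 2, 3)
probs = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # assumed common null distribution (hypothetical)
n1 = n2 = 5                                 # small samples, as studied in the paper
alpha = 0.05
t_crit = t_dist.ppf(1 - alpha / 2, n1 + n2 - 2)

def t_stat(x, y):
    mx, my = sum(x) / n1, sum(y) / n2
    sp2 = (sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)) / (n1 + n2 - 2)
    if sp2 == 0:
        return 0.0                          # all values identical: never reject
    return (mx - my) / sqrt(sp2 * (1 / n1 + 1 / n2))

actual_level = 0.0
for x in product(scores, repeat=n1):
    px = prod(probs[v] for v in x)
    for y in product(scores, repeat=n2):
        if abs(t_stat(x, y)) > t_crit:
            actual_level += px * prod(probs[v] for v in y)

print(f"actual two-sided level {actual_level:.4f} versus nominal {alpha}")
```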


Journal ArticleDOI
TL;DR: Future trials should take into account the results of any relevant overviews in their design, and should plan to obtain sufficient numbers of events to contribute substantially to such overviews, which implies the need for randomized trials that are much larger than is currently standard.
Abstract: In order to avoid selective biases and to minimize random errors, inference about the effects of treatment on serious endpoints needs to be based not on one, or a few, of the available trial results, but on a systematic overview of the totality of the evidence from all the relevant unconfounded randomized trials. But, only where coverage of all, or nearly all, randomized patients in all relevant trials (or a reasonably unbiased sample of such trials) can be assured, is a systematic overview of trials reasonably trustworthy, for then any selective biases are likely to be small in comparison with any moderate effects of treatment. Checks for the existence of such biases can best be conducted if reasonably detailed data are available from each trial. Future trials should take into account the results of any relevant overviews in their design, and should plan to obtain sufficient numbers of events to contribute substantially to such overviews. In many cases, this implies the need for randomized trials that are much larger than is currently standard.

Journal ArticleDOI
TL;DR: Methods of measuring the extent of correlation (or clustering) are described and methods of adjusting the Mantel-Haenszel chi-square test statistic and the variance of the Mantel-Haenszel estimate of a common odds ratio in sets of 2 × 2 contingency tables are developed.
Abstract: Dependence between observations on a dichotomous variable renders invalid the usual chi-square tests of independence and inflates the variances of parameter estimates. Such a situation occurs, for example, when subjects consist of members of the same family or with repeated observations on the same person. In this paper we describe methods of measuring the extent of correlation (or clustering). We also develop methods of adjusting the Mantel–Haenszel chi-square test statistic and the variance of the Mantel–Haenszel estimate of a common odds ratio in sets of 2 × 2 contingency tables.
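
For orientation, the sketch below shows the standard Mantel-Haenszel common odds ratio over a set of 2 × 2 tables together with the familiar design-effect factor 1 + (m - 1)rho for observations clustered in groups of size m with intracluster correlation rho; it conveys the general idea of correcting for clustering rather than reproducing the paper's exact adjustments.

```python
# Hedged sketch: standard Mantel-Haenszel odds ratio plus a generic design-effect
# correction; not the paper's specific adjusted statistics.

def mantel_haenszel_or(tables):
    """tables: list of 2x2 strata ((a, b), (c, d)) laid out as
    ((exposed cases, exposed controls), (unexposed cases, unexposed controls))."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / den

def design_effect(m, rho):
    """Variance inflation factor for clusters of (average) size m and correlation rho."""
    return 1 + (m - 1) * rho

tables = [((10, 20), (5, 30)), ((8, 12), (6, 25))]   # hypothetical strata
print("MH odds ratio:", round(mantel_haenszel_or(tables), 2))
print("design effect for m = 4, rho = 0.05:", design_effect(4, 0.05))
```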

Journal ArticleDOI
TL;DR: It is concluded that the statistical knowledge of most doctors is so limited that they cannot be expected to draw the right conclusions from those statistical analyses which are found in papers in medical journals.
Abstract: A multiple choice test with nine statistical questions was sent to a random sample of Danish doctors to assess their knowledge of elementary statistical expressions (SD, SE, p less than 0.05, p greater than 0.05 and r). One hundred and forty eight (59 per cent) of 250 doctors answered the questions. The test was also completed by 97 participants in postgraduate courses in research methods, mainly junior hospital doctors. The median number of correct answers was 2.4 in the random sample and 4.0 in the other sample of doctors. It is concluded that the statistical knowledge of most doctors is so limited that they cannot be expected to draw the right conclusions from those statistical analyses which are found in papers in medical journals. Sixty-five per cent of the doctors in the random sample stated that it is very important that this problem is raised.

Journal ArticleDOI
TL;DR: A weighted paired t-test based on the empirical logistic transform is proposed for assessing the statistical significance of the intervention effect over all strata in designs that randomize large aggregate clusters within each of several strata.
Abstract: This paper discusses statistical techniques for the analysis of dichotomous data arising from a design in which the investigator randomly assigns each of two clusters of possibly varying size to interventions within strata. The problem addressed is that of assessing the statistical significance of the intervention effect over all strata. We propose a weighted paired t-test based on the empirical logistic transform for designs that randomize large aggregate clusters in each of several strata.
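
A hedged sketch of the analysis structure described here: within each stratum one cluster receives the intervention and one the control, the stratum-level contrast is the difference of empirical logits, and a weighted paired t-test is applied across strata. The weights (harmonic mean of the two cluster sizes) and the variance formula are simple illustrative choices, not necessarily the authors'.

```python
# Illustrative sketch only: weighted paired t-test on empirical logit differences.
import numpy as np
from scipy.stats import t as t_dist

def empirical_logit(x, n):
    return np.log((x + 0.5) / (n - x + 0.5))

# hypothetical data: per stratum, (events, cluster size) for intervention and control
strata = [((30, 200), (22, 180)), ((45, 260), (40, 270)), ((12, 90), (15, 110))]

d, w = [], []
for (x1, n1), (x0, n0) in strata:
    d.append(empirical_logit(x1, n1) - empirical_logit(x0, n0))
    w.append(2 * n1 * n0 / (n1 + n0))            # assumed weight: harmonic mean of sizes

d, w = np.array(d), np.array(w)
dbar = np.sum(w * d) / np.sum(w)                 # weighted mean logit difference
var_dbar = np.sum(w * (d - dbar) ** 2) / (np.sum(w) * (len(d) - 1))
t_stat = dbar / np.sqrt(var_dbar)
p = 2 * t_dist.sf(abs(t_stat), df=len(d) - 1)
print(f"weighted paired t = {t_stat:.2f}, p = {p:.3f}")
```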

Journal ArticleDOI
TL;DR: The replicate variability of meta-analyses of controlled clinical trials has been assessed as a measure of scientific precision; further work in this area should include multivariate analyses to explore possible interactions among the factors accounting for the variability found in replicate meta-analyses.
Abstract: The replicate variability of meta-analyses of controlled clinical trials has been assessed as a measure of scientific precision. Forty-six of 91 known meta-analysis papers were divided into 20 cohorts of studies of the same therapies. Ten cohorts contained meta-analyses with different statistical conclusions; 14 contained differing clinical conclusions, with a wider spread than among the statistically differing studies. Possible causes of variability, such as different trials included, different policies regarding the inclusion of non-randomized and unpublished trials, and different statistical methodologies, were not obvious causes of differing conclusions. Further work in this area should include multivariate analyses in order to explore possible interactions in the factors accounting for the variability found in replicate meta-analyses.

Journal ArticleDOI
TL;DR: Mantel-Haenszel estimation is employed to derive variance estimators for attributable fractions that are dually consistent, that is, consistent in both sparse data and large strata, and extensions to situations involving effect modification and preventive exposures are derived.
Abstract: A number of variance formulae for the attributable fraction have been presented, but none is consistent in sparse data, such as found in individually matched case-control studies. This paper employs Mantel-Haenszel estimation to derive variance estimators for attributable fractions that are dually consistent, that is, consistent in both sparse data and large strata. The method may also be applied using conditional maximum likelihood. Extensions of these estimators to situations involving effect modification and preventive exposures are also derived. Examples of applications to individually matched case-control studies are given.

Journal ArticleDOI
Kent R. Bailey
TL;DR: In determining the role inter-study variation should play in an overview analysis, it is important to consider three factors: which question one is trying to answer; the degree of similarity or dissimilarity of design; and the degree to which heterogeneity of outcomes can be explained.
Abstract: In determining the role inter-study variation should play in an overview analysis, it is important to consider three factors: (1) which question one is trying to answer; (2) the degree of similarity or dissimilarity of design; and (3) the degree to which heterogeneity of outcomes can be explained. Three questions one might be interested in are: (1) whether treatment can be effective in some circumstances; (2) whether treatment is effective on average; and (3) whether treatment was effective on average in the trials at hand. Under the assumption of no qualitative interaction, the answers to these questions coincide. The O - E analysis most directly answers the third question. Other analyses are suggested when the first question is of interest, using the aspirin post-MI studies as an example.
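
The O - E analysis referred to above is the classical Peto-style overview calculation; the sketch below implements it on hypothetical trial counts, treating O as the observed events in the treated arm, with its null expectation E and hypergeometric variance V, and pooling via sum(O - E)/sum(V).

```python
# Minimal sketch of the classical O - E (Peto) overview calculation on hypothetical data.
import numpy as np
from scipy.stats import norm

def peto_overview(trials):
    """trials: list of (events_treated, n_treated, events_control, n_control)."""
    o_minus_e, var = 0.0, 0.0
    for et, nt, ec, nc in trials:
        e, n = et + ec, nt + nc
        expected = e * nt / n                               # expected events, treated arm
        v = e * (n - e) * nt * nc / (n ** 2 * (n - 1))      # hypergeometric variance
        o_minus_e += et - expected
        var += v
    log_or = o_minus_e / var                                # pooled log odds ratio
    z = o_minus_e / np.sqrt(var)
    return log_or, 2 * norm.sf(abs(z))

trials = [(15, 120, 24, 118), (40, 310, 55, 305), (8, 60, 12, 64)]  # hypothetical counts
log_or, p = peto_overview(trials)
print(f"pooled odds ratio {np.exp(log_or):.2f}, two-sided p = {p:.3f}")
```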

Journal ArticleDOI
TL;DR: The power and required sample size are studied for Gaussian and log-Gaussian distributions of diagnostic test values and the results may be useful for the planning phase of studies to evaluate quantitative diagnostic tests.
Abstract: For a quantitative laboratory test the 0.975 fractile of the distribution of reference values is commonly used as a discrimination limit, and the sensitivity of the test is the proportion of diseased subjects with values exceeding this limit. A comparison of the estimates of sensitivity between two tests without taking into account the sampling variation of the discrimination limits can increase the type I error to about seven times the nominal value of 0.05. Correct statistical procedures are considered, and the power and required sample size are studied for Gaussian and log-Gaussian distributions of diagnostic test values. The results may be useful for the planning phase of studies to evaluate quantitative diagnostic tests.
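
A hedged simulation of the phenomenon described: estimating each test's discrimination limit as the 0.975 fractile of a reference sample and then comparing sensitivities as if those limits were fixed understates the variance and inflates the type I error. The distributions, sample sizes and naive two-proportion test below are assumptions chosen only to illustrate the mechanism.

```python
# Hedged simulation sketch; all distributions and sample sizes are hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_ref, n_dis, reps = 100, 50, 2000
rejections = 0
for _ in range(reps):
    # the two "tests" are identical, so any detected difference is a false positive
    limit_a = np.quantile(rng.normal(0, 1, n_ref), 0.975)   # estimated limit, test A
    limit_b = np.quantile(rng.normal(0, 1, n_ref), 0.975)   # estimated limit, test B
    sens_a = np.mean(rng.normal(1.5, 1, n_dis) > limit_a)   # estimated sensitivity, test A
    sens_b = np.mean(rng.normal(1.5, 1, n_dis) > limit_b)   # estimated sensitivity, test B
    # naive two-proportion z-test treating the discrimination limits as fixed
    p_pool = (sens_a + sens_b) / 2
    se = np.sqrt(p_pool * (1 - p_pool) * 2 / n_dis)
    if se > 0 and abs(sens_a - sens_b) / se > norm.ppf(0.975):
        rejections += 1
print(f"empirical type I error {rejections / reps:.3f} versus nominal 0.05")
```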

Journal ArticleDOI
TL;DR: An approximation is given that is an asymptotic upper bound, easy to compute, and, for the purposes of hypothesis testing, more accurate than other approximations presented in the literature.
Abstract: The scan statistic evaluates whether an apparent cluster of disease in time is due to chance. The statistic employs a 'moving window' of length w and finds the maximum number of cases revealed through the window as it scans or slides over the entire time period T. Computation of the probability of observing a certain size cluster, under the hypothesis of a uniform distribution, is infeasible when N, the total number of events, is large, and w is of moderate or small size relative to T. We give an approximation that is an asymptotic upper bound, easy to compute, and, for the purposes of hypothesis testing, more accurate than other approximations presented in the literature. The approximation applies both when N is fixed, and when N has a Poisson distribution. We illustrate the procedure on a data set of trisomic spontaneous abortions observed in a two year period in New York City.
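
The closed-form approximation derived in the paper is not reproduced here, but the sketch below shows the scan statistic itself and a Monte Carlo p-value under the uniform null, one way to check a cluster when N is small enough to simulate; the window length, N and T are hypothetical.

```python
# Hedged sketch: scan statistic (maximum event count in any window of length w) and a
# Monte Carlo p-value under a uniform distribution of event times on (0, T).
import numpy as np

def scan_statistic(times, w):
    times = np.sort(times)
    # for each event time t, count events in [t, t + w); take the maximum over t
    return max(np.searchsorted(times, t + w, side="left") - i
               for i, t in enumerate(times))

def scan_pvalue(observed_k, n_events, w, T, reps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    hits = sum(scan_statistic(rng.uniform(0, T, n_events), w) >= observed_k
               for _ in range(reps))
    return hits / reps

# hypothetical example: are 8 cases within some 60-day window unusual for 50 cases in 2 years?
print(scan_pvalue(observed_k=8, n_events=50, w=60, T=730))
```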

Journal ArticleDOI
TL;DR: It is shown that heterogeneity can seriously affect the treatment comparison and has to be considered during the planning stage as well as at the analysis of a clinical trial.
Abstract: We consider several sources of heterogeneity in a clinical trial with patients' survival time as the main response criterion: differences in prognosis which can be attributed to a latent or ignored prognostic factor; differences in treatment efficacy in subgroups of patients, and differences in treatment combinations received by the patients. The impact of these types of heterogeneity on the treatment comparison is studied assuming a proportional hazards model. It is measured by the size and power of the logrank and proportional hazards score tests and by the bias of the estimated treatment effect. We show that heterogeneity can seriously affect the treatment comparison and has to be considered during the planning stage as well as at the analysis of a clinical trial.
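
As a hedged illustration of the first source of heterogeneity (a latent prognostic factor), the simulation below fits a proportional hazards model with and without the latent covariate; omitting it attenuates the estimated treatment effect even in a randomized comparison. The sample size, effect sizes and use of statsmodels' PHReg are illustrative assumptions.

```python
# Hedged simulation: attenuation of a treatment log-hazard ratio when a balanced latent
# prognostic factor is omitted from the proportional hazards model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 4000
trt = rng.integers(0, 2, n)                     # randomized treatment indicator
z = rng.integers(0, 2, n)                       # latent binary prognostic factor
beta_trt, beta_z = -0.5, 1.5
times = rng.exponential(1.0 / np.exp(beta_trt * trt + beta_z * z))
status = np.ones(n)                             # no censoring, for simplicity

adjusted = sm.PHReg(times, np.column_stack([trt, z]), status=status).fit()
unadjusted = sm.PHReg(times, trt.reshape(-1, 1), status=status).fit()
print("treatment log-hazard ratio, adjusted for z:", round(adjusted.params[0], 3))
print("treatment log-hazard ratio, z omitted     :", round(unadjusted.params[0], 3))
```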

Journal ArticleDOI
TL;DR: Because censored survival data are by nature asymmetric, there is no a priori reason to treat upper and lower confidence intervals in a symmetric fashion; the study also illustrates the need for caution in applying simulation studies to real problems.
Abstract: We examine various methods to estimate the effective sample size for construction of confidence intervals for survival probabilities. We compare the effective sample sizes of Cutler and Ederer and Peto et al., as well as a modified Cutler-Ederer effective sample size. We investigate the use of these effective sample sizes in the common situation of many censored observations that intervene between the time point of interest and the last death before this time. We note that there is no a priori reason to treat upper and lower confidence intervals in a symmetric fashion since censored survival data are by nature asymmetric. We recommend the use of the Cutler-Ederer effective sample size in construction of upper confidence intervals and the Peto effective sample size in construction of lower confidence intervals. Two examples with real data demonstrate the differences between confidence intervals formed with different effective sample sizes. This study also illustrates the need for caution in the application of simulation studies to real problems.
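
For orientation, the sketch below shows how an effective sample size enters a simple Wald-type confidence interval for a survival probability, alongside the commonly quoted Peto-style standard error S(t) * sqrt((1 - S(t)) / n_at_risk); the specific Cutler-Ederer, Peto and modified definitions compared in the paper are not reproduced, and the numbers are hypothetical.

```python
# Hedged sketch of the role of an effective sample size in a survival-probability CI.
from math import sqrt

def wald_ci(s, n_eff, z=1.96):
    """CI treating the survival estimate s as a binomial proportion from n_eff subjects."""
    half = z * sqrt(s * (1 - s) / n_eff)
    return max(0.0, s - half), min(1.0, s + half)

def peto_se(s, n_at_risk):
    """Commonly quoted Peto-style standard error for a Kaplan-Meier estimate."""
    return s * sqrt((1 - s) / n_at_risk)

s_t, n_at_risk = 0.70, 25      # hypothetical survival estimate and number still at risk
print("Wald CI using n at risk as the effective sample size:", wald_ci(s_t, n_at_risk))
print("Peto-style standard error:", round(peto_se(s_t, n_at_risk), 3))
```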

Journal ArticleDOI
TL;DR: The decision process to terminate a trial early is complex and necessitates an accounting for many factors, and the Beta-Blocker Heart Attack Trial provides an excellent example of many of these issues.
Abstract: Monitoring interim accumulating data in a clinical trial for evidence of therapeutic benefit or toxicity is a frequent policy, usually carried out by an independent scientific committee. Repeated testing at conventional critical values can substantially inflate the type I error rate. To maintain acceptable levels, group sequential and stochastic curtailment have been developed for clinical trials. One should not view such methods as absolute rules, but as useful guides. The decision process to terminate a trial early is complex and necessitates an accounting for many factors. The Beta-Blocker Heart Attack Trial provides an excellent example of many of these issues.
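
A hedged simulation of the inflation mentioned above: testing accumulating data at the conventional two-sided 5 per cent critical value (1.96) at each of several interim looks pushes the overall type I error well above 5 per cent. The number of looks and group sizes are arbitrary; group sequential boundaries are designed to restore the nominal level.

```python
# Hedged simulation: repeated significance testing at a fixed 1.96 critical value
# inflates the overall type I error under the null of no treatment effect.
import numpy as np

rng = np.random.default_rng(1)
looks, n_per_look, reps = 5, 50, 5000
rejected = 0
for _ in range(reps):
    data = rng.normal(0.0, 1.0, looks * n_per_look)     # no true effect
    for j in range(1, looks + 1):
        x = data[: j * n_per_look]                      # data accumulated up to look j
        z = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
        if abs(z) > 1.96:                               # naive repeated testing
            rejected += 1
            break
print(f"overall type I error with {looks} looks: {rejected / reps:.3f}")
```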

Journal ArticleDOI
TL;DR: It is shown how the use of generalized least-squares estimators is equivalent to the use of covariance adjustment and the results suggest that previously proposed simple covariance structures are unlikely to be appropriate in general.
Abstract: An account is given of the analysis of data from 2 x 2 cross-over trials which include baseline measurements. We show that most of the previously proposed methods can be incorporated into a general framework of least-squares estimation with a simple linear model. A simple analysis based on ordinary least-squares estimators is described which can be used with either two-sample t-tests and confidence intervals or with the corresponding non-parametric procedures. It is shown how the use of generalized least-squares estimators is equivalent to the use of covariance adjustment. These methods require no assumptions about the covariance structure of the measurements from each subject. The results of assessing the covariance structure present in examples of data from a number of trials are summarized. These results suggest that previously proposed simple covariance structures are unlikely to be appropriate in general.
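
For context, the sketch below shows the simplest analysis contained in the more general framework: in a 2 x 2 cross-over without baselines, the treatment effect can be estimated from within-subject period differences compared between the two sequence groups with a two-sample t-test. The data are hypothetical, and the paper's covariance-adjusted (generalized least-squares) analyses with baselines are not shown.

```python
# Minimal sketch (hypothetical data): period-difference analysis of a 2x2 cross-over.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 12                                     # subjects per sequence group
treat_effect, period_effect = 0.8, 0.3

# sequence AB receives treatment in period 1, sequence BA in period 2
s_ab = rng.normal(0, 1, n)                 # subject effects, AB group
ab_p1 = s_ab + treat_effect + rng.normal(0, 0.5, n)
ab_p2 = s_ab + period_effect + rng.normal(0, 0.5, n)
s_ba = rng.normal(0, 1, n)                 # subject effects, BA group
ba_p1 = s_ba + rng.normal(0, 0.5, n)
ba_p2 = s_ba + period_effect + treat_effect + rng.normal(0, 0.5, n)

d_ab, d_ba = ab_p1 - ab_p2, ba_p1 - ba_p2  # within-subject period differences
estimate = (d_ab.mean() - d_ba.mean()) / 2 # half the between-group difference of means
t, p = ttest_ind(d_ab, d_ba)
print(f"estimated treatment effect {estimate:.2f}, two-sample t-test p = {p:.3f}")
```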

Journal ArticleDOI
TL;DR: This paper describes a model for the survival of screen-detected cases, with a hazard function that depends on an individual's lead time, the duration of preclinical disease, and the time since diagnosis, that is, after the unknown date when clinical diagnosis would have occurred in the absence of screening.
Abstract: Early detection of cancer by screening advances the date of diagnosis, but may or may not affect survival. To assess the survival benefit associated with early detection, one must estimate the distribution of time survived post lead-time, that is, after the unknown date when clinical diagnosis would have occurred in the absence of screening. One can then compare the adjusted survival of screen-detected cancer cases to other groups of cases not diagnosed by screening. This paper describes a model for the survival of screen-detected cases, with a hazard function that depends on an individual's lead time, the duration of preclinical disease, and the time since diagnosis. The model is fitted to the ten year survival data from the 132 screen-detected cases of breast cancer in the well-known HIP (Health Insurance Plan of Greater New York) study. Comparison with the survival of several groups of cancer cases not detected by screening (interval cases arising clinically in persons previously screened, cases among persons who refuse screening, and cases among randomized controls not offered screening) yields various estimates of benefit. Use of the interval cases for comparison gives an estimate of about 21 breast cancer deaths prevented among 20,166 women screened in the HIP study; use of the data from the randomized controls gives an estimate of about 25 prevented deaths. The former estimate derives from the screened group of women only, and so the same method of evaluation may also be applied to community screening programmes and other situations that do not entail randomization.

Journal ArticleDOI
TL;DR: Some of the pitfalls encountered during overviews of randomized clinical trials are discussed, some solutions are outlined and the need for a conservative interpretation of the results is emphasized.
Abstract: Answers that have medical value can often be obtained from overviews of randomized clinical trials if care is taken in formulating a biologically sensible question and unbiased and careful methods are used in collecting, extracting and analysing the results. This article discusses some of the pitfalls that are encountered during this process, outlines some solutions and emphasizes the need for a conservative interpretation of the results.

Journal ArticleDOI
TL;DR: The findings are consistent with the 'cohort hypothesis' for the recent peculiar trend in Japanese male mortality, as the peculiarity of the cohort born in the early Showa Era was clearly detected by the curvature components of cohort effects for these major diseases.
Abstract: Reasons for the recent increase in mortality among Japanese men born between 1925 and 1940 are explored. "To elucidate which factors are responsible for these trends we analysed the mortality data quantitatively applying an age-period-cohort model modified so that period effects remain constant within certain age groups but may vary from one age group to the next." The results indicate that the increase in mortality from selected causes is due to cohort rather than period effects.

Journal ArticleDOI
TL;DR: The results indicate that valid information over a range of events can be obtained using postal surveys of mothers, and good agreement between the mothers and hospitals was obtained.
Abstract: The validity of information obtained from women about the medical aspects of their childbearing experiences has been assessed by comparing the responses of 223 mothers on a postal questionnaire with information extracted from their medical records. When discrepancies between the two sources were found, it was not assumed that the hospital records were correct. Instead, attempts were made to ascertain why the data might be inconsistent by re-checking the records and by contacting the mothers again. There appeared to be five main reasons for discrepancies: mothers' limited knowledge or understanding of certain procedures; problems of interpretation and definition; occasionally inaccurate or missing information in the medical records; under-reporting of sensitive information by mothers and, finally, questions which were misunderstood or misinterpreted by the mothers. For most items, however, good agreement between the mothers and hospitals was obtained, and the results indicate that valid information over a range of events can be obtained using postal surveys of mothers.

Journal ArticleDOI
TL;DR: A new method of analysing binary data from a three-treatment, three-period cross-over trial is described, based on a log-linear model and mirrors the analysis of continuous data.
Abstract: A new method of analysing binary data from a three-treatment, three-period cross-over trial is described. This method is based on a log-linear model and mirrors the analysis of continuous data. It is an extension of the method we introduced recently for the analysis of binary data from a two-treatment, two-period cross-over trial. We illustrate our method using data from a trial which compared two analgesics and a placebo for the relief of primary dysmenorrhea.

Journal ArticleDOI
TL;DR: In this discussion, some questions that can only be answered by examining a group of independent studies are outlined, along with some pitfalls that can swamp the benefits to be gained from synthesis.
Abstract: When asking ‘what is known’ about a drug or therapy or program at any time, both researchers and practitioners often confront more than a single study. Facing a variety of findings, where conflicts may outweigh agreement, how can a reviewer constructively approach the task? In this discussion, I will outline some questions that can only be answered by examining a group of independent studies. I will also discuss some pitfalls that sometimes swamp the benefits we can gain from synthesis. Most of these pitfalls are avoidable if anticipated early in a review. The benefits of a quantitative review include information about how to match a treatment with the most promising recipients; increasing the statistical power to detect a significant new treatment; telling us when ‘contextual effects’ are important; helping us to assess the stability and robustness of treatment effectiveness; and informing us when research findings are especially sensitive to investigators' research design. The pitfalls include aggregating data from studies on different populations; aggregating when there is more than one underlying measure of central tendency; and emphasizing an average outcome when partitioning variance gives far more useful information.

Journal ArticleDOI
TL;DR: Overviews of clinical trials in the cardiovascular field are critically reviewed and recommendations based on lessons learned are given; these include the avoidance of three types of biases: publication bias, overviewer bias and investigator bias.
Abstract: Overviews of clinical trials in the cardiovascular field have been critically reviewed. Six reasons for the overviews were identified. An impression, at least from a scientific viewpoint, is that the pooled analyses have been valuable. Six potential problems are discussed and recommendations given based on lessons learned. These include the avoidance of three types of biases: publication bias, overviewer bias and investigator bias. The role of time-dependent treatment effects, the complex issue of 'mixing of apples and oranges' and the problem of errors are also addressed.