Journal ArticleDOI

A flexible and coherent test/estimation procedure based on restricted mean survival times for censored time-to-event data in randomized clinical trials.

10 Jul 2018-Statistics in Medicine (John Wiley & Sons, Ltd)-Vol. 37, Iss: 15, pp 2307-2320
TL;DR: The proposed procedure pairs a prespecified test with an estimate of a corresponding robust, interpretable quantitative treatment effect; for the patterns of difference seen in these cancer clinical trials, the test is dramatically more powerful than the logrank test, the Wilcoxon test, and the restricted mean survival time-based test with a fixed τ.
Abstract: In randomized clinical trials where time-to-event is the primary outcome, almost routinely, the logrank test is prespecified as the primary test and the hazard ratio is used to quantify treatment effect. If the ratio of 2 hazard functions is not constant, the logrank test is not optimal and the interpretation of the hazard ratio is not obvious. When such a nonproportional hazards case is expected at the design stage, the conventional practice is to prespecify another member of the weighted logrank tests, e.g., the Peto-Prentice-Wilcoxon test. Alternatively, one may specify a robust test as the primary test, which can capture various patterns of difference between 2 event time distributions. However, most of those tests do not have companion procedures to quantify the treatment difference, and investigators have fallen back on reporting treatment effect estimates not associated with the primary test. Such incoherence in the "test/estimation" procedure may potentially mislead clinicians/patients who have to balance risk-benefit for treatment decisions. To address this, we propose a flexible and coherent test/estimation procedure based on restricted mean survival time, where the truncation time τ is selected data dependently. The proposed procedure is composed of a prespecified test and an estimation of the corresponding robust and interpretable quantitative treatment effect. The utility of the new procedure is demonstrated by numerical studies based on 2 randomized cancer clinical trials; the test is dramatically more powerful than the logrank and Wilcoxon tests and the restricted mean survival time-based test with a fixed τ, for the patterns of difference seen in these cancer clinical trials.
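The central quantity in the procedure, the restricted mean survival time (RMST), is simply the area under the Kaplan-Meier curve up to a truncation time τ. The sketch below (illustrative only, with function names of our own; the paper's actual procedure additionally standardizes the difference and selects τ data-dependently with a resampling-based cutoff) shows the basic computation of the RMST difference between two arms across candidate truncation times:

```python
# Minimal sketch of an RMST comparison: estimate each arm's Kaplan-Meier
# curve, integrate it up to a truncation time tau, and compare the areas.
# Illustrative only -- not the authors' implementation.

def km_curve(times, events):
    """Kaplan-Meier estimate as a list of (time, S(time)) step points."""
    data = sorted(zip(times, events))
    n = len(data)
    surv, curve, i = 1.0, [], 0
    while i < n:
        t, deaths, start = data[i][0], 0, i
        while i < n and data[i][0] == t:
            deaths += data[i][1]              # event indicator: 1 = event, 0 = censored
            i += 1
        if deaths:
            surv *= 1.0 - deaths / (n - start)  # n - start subjects still at risk
        curve.append((t, surv))
    return curve

def rmst(curve, tau):
    """Area under the step function S(t) from 0 to tau (S = 1 before first point)."""
    area, last_t, last_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += last_s * (t - last_t)
        last_t, last_s = t, s
    return area + last_s * (tau - last_t)

# Toy data: treatment arm lives longer; no censoring for simplicity.
control = km_curve([1, 2, 3, 4], [1, 1, 1, 1])
treated = km_curve([2, 4, 5, 6], [1, 1, 1, 1])
for tau in (2, 4, 6):                         # candidate truncation times
    diff = rmst(treated, tau) - rmst(control, tau)
    print(f"tau={tau}: RMST difference = {diff:.3f}")
```

The RMST difference reads directly as "extra event-free time gained by treatment over the first τ years," which is the interpretable summary the authors pair with the test.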
Citations
Journal ArticleDOI
TL;DR: A caution is offered about some of these methods’ limitations in translating statistical evidence into clinical evidence, both for formal treatment-effect hypothesis testing and for estimation, when used for the primary analysis.
Abstract: Evaluation of new anticancer therapies in randomized clinical trials (RCTs) is typically based on comparing a new treatment with a standard one, using a time-to-event end point such as overall survival or progression-free survival (PFS). Although the statistical framework underlying the design of these RCTs is centered on formal testing of a treatment effect, methods for estimation (quantification) of the treatment benefit are also specified. Currently, log-rank statistical tests and/or proportional hazards models are commonly used for the trial design and primary analysis. These methods are optimized for treatment effects that do not change substantially over time (the proportional hazards assumption). Introduction of immunotherapeutic agents with potentially delayed treatment effects has renewed interest in statistical methods that can better accommodate general departures from proportional hazards and, particularly, a delayed treatment effect. This has led to considerable attention to, and some controversy about, appropriate statistical methodology for comparing survival curves, as demonstrated by the comments and replies on trial reports(1-24) and at a Duke–US Food and Drug Administration workshop(25) that offered alternatives to the standard log-rank/hazard-ratio methodology. While these new methods could be useful, as outlined in comprehensive reviews,(26-30) we offer a caution about some of these methods' limitations in translating statistical evidence into clinical evidence, both for formal treatment-effect hypothesis testing and for estimation (when used for the primary analysis).

58 citations

Journal ArticleDOI
TL;DR: Deviations from proportional hazards (DPHs) were more common in immunotherapy trials and for non-OS endpoints; alternative statistical methods without proportional hazards assumptions should be considered in the design and analysis of clinical trials when the likelihood of DPHs is high.
Abstract: Purpose: Deviations from proportional hazards (DPHs), which may be more prevalent in the era of precision medicine and immunotherapy, can lead to underpowered trials or misleading conclusions. We used a meta-analytic approach to estimate DPHs across cancer trials, investigate associated factors, and evaluate data-analysis approaches for future trials. Experimental Design: We searched PubMed for phase III trials in breast, lung, prostate, and colorectal cancer published in a preselected list of journals between 2014 and 2016 and extracted individual patient-level data (IPLD) from Kaplan–Meier curves. We re-analyzed IPLD to identify DPHs. Potential efficiency gains, when DPHs were present, of alternative statistical methods relative to standard log-rank based analysis were expressed as sample-size requirements for a fixed power level. Results: From 152 trials, we obtained IPLD on 129,401 patients. Among 304 Kaplan–Meier figures, 75 (24.7%) exhibited evidence of DPHs, including eight of 14 (57%) KM pairs from immunotherapy trials. Trial type [immunotherapy, odds ratio (OR), 4.29; 95% confidence interval (CI), 1.11–16.6], metastatic patient population (OR, 3.18; 95% CI, 1.26–8.05), and non-OS endpoints (OR, 3.23; 95% CI, 1.79–5.88) were associated with DPHs. In immunotherapy trials, alternative statistical approaches allowed for more efficient clinical trials with fewer patients (up to 74% reduction) relative to log-rank testing. Conclusions: DPHs were found in a notable proportion of time-to-event outcomes in published clinical trials in oncology and were more common in immunotherapy trials and for non-OS endpoints. Alternative statistical methods, without proportional hazards assumptions, should be considered in the design and analysis of clinical trials when the likelihood of DPHs is high.
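A delayed treatment effect of the kind this study flags can be seen with a very crude diagnostic: compare the event rate per unit of person-time in an early versus a late follow-up window in each arm; under proportional hazards the two hazard ratios should roughly agree. The toy simulation below is ours (not the study's re-analysis methodology) and assumes a treatment whose hazard drops from 1.0 to 0.5 only after t = 0.5:

```python
import random

# Toy illustration of a delayed treatment effect (our own sketch, not the
# paper's analysis): the treatment hazard equals the control hazard (1.0)
# before t = 0.5 and drops to 0.5 afterwards, so the early-window and
# late-window hazard ratios disagree -- a deviation from proportional hazards.
random.seed(1)

def control_time():
    return random.expovariate(1.0)

def treated_time():
    e = random.expovariate(1.0)            # total cumulative hazard at the event
    # Invert H(t): H(t) = t for t <= 0.5, then H(t) = 0.5 + 0.5 * (t - 0.5).
    return e if e <= 0.5 else 0.5 + 2.0 * (e - 0.5)

def window_hazard(times, a, b):
    """Events per unit of person-time inside the window [a, b)."""
    events = sum(1 for t in times if a <= t < b)
    persontime = sum(max(0.0, min(t, b) - a) for t in times)
    return events / persontime

n = 20000
ctrl = [control_time() for _ in range(n)]
trt = [treated_time() for _ in range(n)]

early_hr = window_hazard(trt, 0.0, 0.5) / window_hazard(ctrl, 0.0, 0.5)
late_hr = window_hazard(trt, 0.5, 5.0) / window_hazard(ctrl, 0.5, 5.0)
print(f"early HR ~ {early_hr:.2f}, late HR ~ {late_hr:.2f}")
```

A single overall hazard ratio would average these two regimes together, which is exactly the interpretability problem the alternative methods aim to avoid.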

47 citations


Cites background from "A flexible and coherent test/estima..."

  • ...Horiguchi and colleagues (27) developed an extension of RMST that adaptively selects the truncation time τ....

    [...]

Journal ArticleDOI
TL;DR: This work describes alternative measures in two data settings, the overall survival setting and the relative survival setting, and illustrates their use analyzing England population-based registry data of men 15–80 years old diagnosed with colon cancer in 2001–2003, aiming to describe the deprivation disparities in survival.
Abstract: Survival data analysis results are usually communicated through the overall survival probability. Alternative measures provide additional insights and may help in communicating the results to a wider audience. We describe these alternative measures in two data settings, the overall survival setting and the relative survival setting, the latter corresponding to the particular competing risk setting in which the cause of death is unavailable or unreliable. In the overall survival setting, we describe the overall survival probability, the conditional survival probability and the restricted mean survival time (restricted to a prespecified time window). In the relative survival setting, we describe the net survival probability, the conditional net survival probability, the restricted mean net survival time, the crude probability of death due to each cause and the number of life years lost due to each cause over a prespecified time window. These measures describe survival data either on a probability scale or on a timescale. The clinical or population health purpose of each measure is detailed, and their advantages and drawbacks are discussed. We then illustrate their use analyzing England population-based registry data of men 15-80 years old diagnosed with colon cancer in 2001-2003, aiming to describe the deprivation disparities in survival. We believe that both the provision of a detailed example of the interpretation of each measure and the software implementation will help in generalizing their use.
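The probability-scale and time-scale measures described in this paper all derive from a single survival curve S(t). A small sketch (helper names are our own, illustrative only) computing three of the overall-survival-setting measures from a step-function estimate:

```python
# Compute three of the measures discussed above from a step-function survival
# curve S(t), given as (time, S(time)) points with S = 1 before the first
# point. Helper names are ours; illustrative only.

def surv_at(curve, t):
    """S(t) for a right-continuous step function."""
    s = 1.0
    for time, prob in curve:
        if time > t:
            break
        s = prob
    return s

def conditional_surv(curve, t_from, t_to):
    """Conditional survival: P(T > t_to | T > t_from) = S(t_to) / S(t_from)."""
    return surv_at(curve, t_to) / surv_at(curve, t_from)

def rmst(curve, tau):
    """Restricted mean survival time: area under S(t) on [0, tau]."""
    area, last_t, last_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += last_s * (t - last_t)
        last_t, last_s = t, s
    return area + last_s * (tau - last_t)

curve = [(1, 0.8), (2, 0.6), (3, 0.5)]     # toy 3-year survival curve
print(surv_at(curve, 2.5))                 # overall survival probability at 2.5 years
print(conditional_surv(curve, 1, 3))       # survival to year 3, given alive at year 1
print(rmst(curve, 3))                      # mean years lived over the first 3 years
```

This shows the paper's point concretely: the same fitted curve answers a probability question ("what fraction survive?"), a prognosis-update question ("given I'm alive at year 1?"), and a time question ("how many years, on average?").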

43 citations


Cites methods from "A flexible and coherent test/estima..."

  • ...Other authors derived statistical tests and procedures when comparing the RMST in the context of clinical trials.(56,57) Measures based on NS are defined in a hypothetical world where patients could only die from their disease....

    [...]

Journal ArticleDOI
TL;DR: The restricted mean survival time is a survival endpoint that is meaningful to investigators and to patients and at the same time requires less restrictive assumptions.
Abstract: Background/aims:The difference in mean survival time, which quantifies the treatment effect in terms most meaningful to patients and retains its interpretability regardless of the shape of the surv...

27 citations

Journal ArticleDOI
TL;DR: The results indicate that the proposed method estimates individual treatment effects well and stably, and can detect a treatment effect in a sub-population even when the overall effect is small or nonexistent.
Abstract: Motivation Personalized medicine often relies on accurate estimation of a treatment effect for specific subjects. This estimation can be based on the subject's baseline covariates, but additional complications arise for a time-to-event response subject to censoring. In this paper, the treatment effect is measured as the difference between the mean survival time of a treated subject and the mean survival time of a control subject. We propose a new random forest method for estimating the individual treatment effect with survival data. The random forest is formed by individual trees built with a splitting rule specifically designed to partition the data according to the individual treatment effect. For a new subject, the forest provides a set of similar subjects from the training dataset that can be used to compute an estimate of the individual treatment effect with any adequate method. Results The merits of the proposed method are investigated with a simulation study in which it is compared to numerous competitors, including recent state-of-the-art methods. The results indicate that the proposed method estimates individual treatment effects well and stably. Two applications, to colon cancer data and breast cancer data, show that the proposed method can detect a treatment effect in a sub-population even when the overall effect is small or nonexistent. Availability and implementation The authors are working on an R package implementing the proposed method and it will be available soon. In the meantime, the code can be obtained from the first author at sami.tabib@hec.ca. Supplementary information Supplementary data are available at Bioinformatics online.
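The final estimation step described above, contrasting the arms within the forest's set of similar subjects, can be sketched as follows. Everything here is a toy version of ours: it ignores censoring and uses a truncated mean of observed times, whereas the actual method applies a proper censoring-aware estimator to the (possibly weighted) neighbor set:

```python
# Toy sketch (ours, not the authors' code) of the last step of a forest-based
# individual-treatment-effect estimate: given the training subjects the forest
# deems similar to a new subject, contrast the arms' mean truncated survival
# times. Ignores censoring for brevity.

def ite_from_neighbors(neighbors, tau):
    """neighbors: list of (time, treated_flag); returns treated-minus-control
    difference in mean survival time truncated at tau."""
    treated = [min(t, tau) for t, z in neighbors if z == 1]
    control = [min(t, tau) for t, z in neighbors if z == 0]
    if not treated or not control:
        return None                        # cannot contrast a one-arm neighborhood
    return sum(treated) / len(treated) - sum(control) / len(control)

# Similar subjects retrieved for one new patient (time in years, arm flag).
neighbors = [(5.0, 1), (6.0, 1), (7.0, 1), (2.0, 0), (3.0, 0), (4.0, 0)]
print(ite_from_neighbors(neighbors, tau=10.0))   # -> 3.0
```

The design point the abstract makes is that the forest only *retrieves* the neighborhood; any adequate two-sample estimator, such as an RMST difference, can then be plugged into this final step.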

16 citations

References
Book
14 Mar 1996
TL;DR: In this book, the authors define the Ball Sigma-Field and Measurability of Suprema and show that it is possible to achieve convergence almost surely and in probability.
Abstract: 1.1. Introduction.- 1.2. Outer Integrals and Measurable Majorants.- 1.3. Weak Convergence.- 1.4. Product Spaces.- 1.5. Spaces of Bounded Functions.- 1.6. Spaces of Locally Bounded Functions.- 1.7. The Ball Sigma-Field and Measurability of Suprema.- 1.8. Hilbert Spaces.- 1.9. Convergence: Almost Surely and in Probability.- 1.10. Convergence: Weak, Almost Uniform, and in Probability.- 1.11. Refinements.- 1.12. Uniformity and Metrization.- 2.1. Introduction.- 2.2. Maximal Inequalities and Covering Numbers.- 2.3. Symmetrization and Measurability.- 2.4. Glivenko-Cantelli Theorems.- 2.5. Donsker Theorems.- 2.6. Uniform Entropy Numbers.- 2.7. Bracketing Numbers.- 2.8. Uniformity in the Underlying Distribution.- 2.9. Multiplier Central Limit Theorems.- 2.10. Permanence of the Donsker Property.- 2.11. The Central Limit Theorem for Processes.- 2.12. Partial-Sum Processes.- 2.13. Other Donsker Classes.- 2.14. Tail Bounds.- 3.1. Introduction.- 3.2. M-Estimators.- 3.3. Z-Estimators.- 3.4. Rates of Convergence.- 3.5. Random Sample Size, Poissonization and Kac Processes.- 3.6. The Bootstrap.- 3.7. The Two-Sample Problem.- 3.8. Independence Empirical Processes.- 3.9. The Delta-Method.- 3.10. Contiguity.- 3.11. Convolution and Minimax Theorems.- A. Appendix.- A.1. Inequalities.- A.2. Gaussian Processes.- A.2.1. Inequalities and Gaussian Comparison.- A.2.2. Exponential Bounds.- A.2.3. Majorizing Measures.- A.2.4. Further Results.- A.3. Rademacher Processes.- A.4. Isoperimetric Inequalities for Product Measures.- A.5. Some Limit Theorems.- A.6. More Inequalities.- A.6.1. Binomial Random Variables.- A.6.2. Multinomial Random Vectors.- A.6.3. Rademacher Sums.- Notes.- References.- Author Index.- List of Symbols.

5,231 citations

BookDOI
TL;DR: This chapter discusses Convergence: Weak, Almost Uniform, and in Probability, which focuses on the part of Convergence of the Donsker Property which is concerned with Uniformity and Metrization.

4,600 citations


Additional excerpts

  • ...We use a perturbation resampling (or a wild bootstrap) procedure to obtain the distribution of Z under the null hypothesis.(33) Specifically, we numerically approximate the distribution of {Z(τ₁), …, Z(τ_K)}′ = {√n(D̂(τ₁) − D(τ₁))/σ̂(τ₁), …, √n(D̂(τ_K) − D(τ_K))/σ̂(τ_K)}′ (3)...

    [...]
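The perturbation-resampling step quoted in this excerpt can be sketched generically: given each subject's influence-function contribution ψ_i(τ_k) to the standardized RMST difference at candidate truncation time τ_k, one rescales the contributions by i.i.d. standard normal multipliers to draw from the null distribution of the maximally selected statistic max_k |Z(τ_k)|. Everything below is illustrative and ours, with the ψ values synthetic, purely to show the mechanics:

```python
import math
import random

# Sketch of perturbation resampling (wild bootstrap) for a maximally selected
# statistic. psi[i][k] stands in for subject i's influence-function
# contribution at candidate truncation time tau_k; here it is synthetic noise
# to show the mechanics -- not the authors' implementation.
random.seed(7)
n, K, B = 200, 3, 2000
psi = [[random.gauss(0.0, 1.0) for _ in range(K)] for _ in range(n)]

# Standard deviation of each column, used to standardize Z*(tau_k).
sd = [math.sqrt(sum(psi[i][k] ** 2 for i in range(n)) / n) for k in range(K)]

max_stats = []
for _ in range(B):
    g = [random.gauss(0.0, 1.0) for _ in range(n)]       # multipliers G_i
    z = [abs(sum(g[i] * psi[i][k] for i in range(n)) / (math.sqrt(n) * sd[k]))
         for k in range(K)]
    max_stats.append(max(z))                             # max_k |Z*(tau_k)|

max_stats.sort()
crit = max_stats[int(0.95 * B)]                          # 95% critical value
print(f"resampled critical value for max_k |Z(tau_k)|: {crit:.2f}")
```

The resampled critical value exceeds the single-test 1.96 because it accounts for scanning over K candidate truncation times, which is how the data-dependent choice of τ avoids inflating the type I error.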

Journal ArticleDOI
TL;DR: The American Statistical Association (ASA) released a policy statement on p-values and statistical significance in 2016, developed with the ASA Board and motivated by concerns about the reproducibility and replicability of scientific conclusions.
Abstract: Cobb’s concern was a long-worrisome circularity in the sociology of science based on the use of bright lines such as p < 0.05: “We teach it because it’s what we do; we do it because it’s what we teach.” This concern was brought to the attention of the ASA Board. The ASA Board was also stimulated by highly visible discussions over the last few years. For example, ScienceNews (Siegfried 2010) wrote: “It’s science’s dirtiest secret: The ‘scientific method’ of testing hypotheses by statistical analysis stands on a flimsy foundation.” A November 2013 article in Phys.org Science News Wire (2013) cited “numerous deep flaws” in null hypothesis significance testing. A ScienceNews article (Siegfried 2014) on February 7, 2014, said “statistical techniques for testing hypotheses... have more flaws than Facebook’s privacy policies.” A week later, statistician and “Simply Statistics” blogger Jeff Leek responded. “The problem is not that people use P-values poorly,” Leek wrote, “it is that the vast majority of data analysis is not performed by people properly trained to perform data analysis” (Leek 2014). That same week, statistician and science writer Regina Nuzzo published an article in Nature entitled “Scientific Method: Statistical Errors” (Nuzzo 2014). That article is now one of the most highly viewed Nature articles, as reported by altmetric.com (http://www.altmetric.com/details/2115792#score). Of course, it was not simply a matter of responding to some articles in print. The statistical community has been deeply concerned about issues of reproducibility and replicability of scientific conclusions. Without getting into definitions and distinctions of these terms, we observe that much confusion and even doubt about the validity of science is arising. Such doubt can lead to radical choices, such as the one taken by the editors of Basic and Applied Social Psychology, who decided to ban p-values (null hypothesis significance testing) (Trafimow and Marks 2015).
Misunderstanding or misuse of statistical inference is only one cause of the “reproducibility crisis” (Peng 2015), but to our community, it is an important one. When the ASA Board decided to take up the challenge of developing a policy statement on p-values and statistical significance, it did so recognizing this was not a lightly taken step. The ASA has not previously taken positions on specific matters of statistical practice. The closest the association has come to this is a statement on the use of value-added models (VAM) for educational assessment (Morganstein and Wasserstein 2014) and a statement on risk-limiting post-election audits (American Statistical Association 2010). However, these were truly policy-related statements. The VAM statement addressed a key educational policy issue, acknowledging the complexity of the issues involved, citing limitations of VAMs as effective performance models, and urging that they be developed and interpreted with the involvement of statisticians. The statement on election auditing was also in response to a major but specific policy issue (close elections in 2008), and said that statistically based election audits should become a routine part of election processes. By contrast, the Board envisioned that the ASA statement on p-values and statistical significance would shed light on an aspect of our field that is too often misunderstood and misused in the broader research community, and, in the process, provide the community a service. The intended audience would be researchers, practitioners, and science writers who are not primarily statisticians. Thus, this statement would be quite different from anything previously attempted. The Board tasked Wasserstein with assembling a group of experts representing a wide variety of points of view. On behalf of the Board, he reached out to more than two dozen such people, all of whom said they would be happy to be involved.
Several expressed doubt about whether agreement could be reached, but those who did said, in effect, that if there was going to be a discussion, they wanted to be involved. Over the course of many months, group members discussed what format the statement should take, tried to more concretely visualize the audience for the statement, and began to find points of agreement. That turned out to be relatively easy to do, but it was just as easy to find points of intense disagreement. The time came for the group to sit down together to hash out these points, and so in October 2015, 20 members of the group met at the ASA Office in Alexandria, Virginia. The 2-day meeting was facilitated by Regina Nuzzo, and by the end of the meeting, a good set of points around which the statement could be built was developed. The next 3 months saw multiple drafts of the statement, reviewed by group members, by Board members (in a lengthy discussion at the November 2015 ASA Board meeting), and by members of the target audience. Finally, on January 29, 2016, the Executive Committee of the ASA approved the statement. The statement development process was lengthier and more controversial than anticipated. For example, there was considerable discussion about how best to address the issue of multiple potential comparisons (Gelman and Loken 2014). We debated at some length the issues behind the words “a p-value near 0.05 taken by itself offers only weak evidence against the null

4,361 citations


"A flexible and coherent test/estima..." refers background in this paper

  • ...Unfortunately, even if the P value is quite small, sole P values do not provide enough risk‐benefit information for decision making.(32) We need a robust, interpretable quantitative summary of the treatment effect that corresponds to the prespecified primary test....

    [...]

Journal ArticleDOI
TL;DR: The combination of ramucirumab with paclitaxel significantly increases overall survival compared with placebo plus paclitaxel, and could be regarded as a new standard second-line treatment for patients with advanced gastric cancer.
Abstract: Summary Background VEGFR-2 has a role in gastric cancer pathogenesis and progression. We assessed whether ramucirumab, a monoclonal antibody VEGFR-2 antagonist, in combination with paclitaxel would increase overall survival in patients previously treated for advanced gastric cancer compared with placebo plus paclitaxel. Methods This randomised, placebo-controlled, double-blind, phase 3 trial was done at 170 centres in 27 countries in North and South America, Europe, Asia, and Australia. Patients aged 18 years or older with advanced gastric or gastro-oesophageal junction adenocarcinoma and disease progression on or within 4 months after first-line chemotherapy (platinum plus fluoropyrimidine with or without an anthracycline) were randomly assigned with a centralised interactive voice or web-response system in a 1:1 ratio to receive ramucirumab 8 mg/kg or placebo intravenously on days 1 and 15, plus paclitaxel 80 mg/m 2 intravenously on days 1, 8, and 15 of a 28-day cycle. A permuted block randomisation, stratified by geographic region, time to progression on first-line therapy, and disease measurability, was used. The primary endpoint was overall survival. Efficacy analysis was by intention to treat, and safety analysis included all patients who received at least one treatment with study drug. This trial is registered with ClinicalTrials.gov, number NCT01170663, and has been completed; patients who are still receiving treatment are in the extension phase. Findings Between Dec 23, 2010, and Sept 23, 2012, 665 patients were randomly assigned to treatment—330 to ramucirumab plus paclitaxel and 335 to placebo plus paclitaxel. Overall survival was significantly longer in the ramucirumab plus paclitaxel group than in the placebo plus paclitaxel group (median 9·6 months [95% CI 8·5–10·8] vs 7·4 months [95% CI 6·3–8·4], hazard ratio 0·807 [95% CI 0·678–0·962]; p=0·017). 
Grade 3 or higher adverse events that occurred in more than 5% of patients in the ramucirumab plus paclitaxel group versus placebo plus paclitaxel included neutropenia (133 [41%] of 327 vs 62 [19%] of 329), leucopenia (57 [17%] vs 22 [7%]), hypertension (46 [14%] vs eight [2%]), fatigue (39 [12%] vs 18 [5%]), anaemia (30 [9%] vs 34 [10%]), and abdominal pain (20 [6%] vs 11 [3%]). The incidence of grade 3 or higher febrile neutropenia was low in both groups (ten [3%] vs eight [2%]). Interpretation The combination of ramucirumab with paclitaxel significantly increases overall survival compared with placebo plus paclitaxel, and could be regarded as a new standard second-line treatment for patients with advanced gastric cancer. Funding Eli Lilly and Company.

1,778 citations


"A flexible and coherent test/estima..." refers result in this paper

  • ...The result was reported by Wilke and colleagues.(35) This study also used the same conventional logrank/HR test/estimation approach for design and analysis as was used in the previous example....

    [...]