Journal Article

Rejoinder to Letter to the Editor "The Hazards of Period Specific and Weighted Hazard Ratios".

TL;DR: The authors thank the writers of the letter (Bartlett et al. 2020) for sharing their concerns about reporting treatment effects under nonproportional hazards (NPH) and state that they respect those views.
Abstract: We would like to thank the authors of the letter (Bartlett et al. 2020) for sharing their concerns regarding reporting treatment effect under nonproportional hazards (NPH), and we respect their pos...
Citations
Journal Article
TL;DR: Loss of power and clear description of treatment differences are key issues in designing and analyzing a clinical trial where nonproportional hazards (NPH) are a possibility.
Abstract: Loss of power and clear description of treatment differences are key issues in designing and analyzing a clinical trial where nonproportional hazard (NPH) is a possibility. A log-rank test may be i...

35 citations

Posted Content
TL;DR: Design and analysis considerations based on a combination test under different non-proportional hazard types and a straw man proposal for practitioners are provided.
Abstract: Loss of power and clear description of treatment differences are key issues in designing and analyzing a clinical trial where non-proportional hazard is a possibility. A log-rank test may be very inefficient and interpretation of the hazard ratio estimated using Cox regression is potentially problematic. In this case, the current ICH E9 (R1) addendum would suggest designing a trial with a clinically relevant estimand, e.g., expected life gain. This approach considers appropriate analysis methods for supporting the chosen estimand. However, such an approach is case specific and may suffer lack of power for important choices of the underlying alternate hypothesis distribution. On the other hand, there may be a desire to have robust power under different deviations from proportional hazards. Also, we would contend that no single number adequately describes treatment effect under non-proportional hazards scenarios. The cross-pharma working group has proposed a combination test to provide robust power under a variety of alternative hypotheses. These can be specified for primary analysis at the design stage and methods appropriately accounting for combination test correlations are efficient for a variety of scenarios. We have provided design and analysis considerations based on a combination test under different non-proportional hazard types and present a straw man proposal for practitioners. The proposals are illustrated with real life example and simulation.

33 citations

Journal Article
TL;DR: In this paper, the authors describe methods to design a randomized oncology trial, calculate the sample size, analyze the trial data and obtain summary measures of the treatment effect in the presence of non-proportional hazards.
Abstract: In trials of novel immuno-oncology drugs, the proportional hazards (PH) assumption often does not hold for the primary time-to-event (TTE) efficacy endpoint, likely due to the unique mechanism of action of these drugs. In practice, when it is anticipated that PH may not hold for the TTE endpoint with respect to treatment, the sample size is often still calculated under the PH assumption, and the hazard ratio (HR) from the Cox model is still reported as the primary measure of the treatment effect. Sensitivity analyses of the TTE data using methods that are suitable under non-proportional hazards (non-PH) are commonly pre-planned. In cases where a substantial deviation from the PH assumption is likely, we suggest designing the trial, calculating the sample size and analyzing the data, using a suitable method that accounts for non-PH, after gaining alignment with regulatory authorities. In this comprehensive review article, we describe methods to design a randomized oncology trial, calculate the sample size, analyze the trial data and obtain summary measures of the treatment effect in the presence of non-PH. For each method, we provide examples of its use from the recent oncology trials literature. We also summarize in the Appendix some methods to conduct sensitivity analyses for overall survival (OS) when patients in a randomized trial switch or cross-over to the other treatment arm after disease progression on the initial treatment arm, and obtain an adjusted or weighted HR for OS in the presence of cross-over. This is an example of the treatment itself changing at a specific point in time - this cross-over may lead to a non-PH pattern of diminishing treatment effect.

6 citations

Journal Article
TL;DR: In this paper, the authors conclude that the Cox hazard ratio is not causally interpretable as a hazard ratio unless there is no treatment effect or an untestable and unrealistic assumption holds; they also provide more insight into the interpretation of hazard ratios and differences, investigating what can be learned about a treatment effect from the hazard ratio approaching unity after a certain period of time.
Abstract: This article surveys results concerning the interpretation of the Cox hazard ratio in connection to causality in a randomized study with a time-to-event response. The Cox model is assumed to be correctly specified, and we investigate whether the typical end product of such an analysis, the estimated hazard ratio, has a causal interpretation as a hazard ratio. It has been pointed out that this is not possible due to selection. We provide more insight into the interpretation of hazard ratios and differences, investigating what can be learned about a treatment effect from the hazard ratio approaching unity after a certain period of time. The conclusion is that the Cox hazard ratio is not causally interpretable as a hazard ratio unless there is no treatment effect or an untestable and unrealistic assumption holds. We give a hazard ratio that has a causal interpretation and study its relationship to the Cox hazard ratio.

6 citations

Journal Article
Ian Bickle
TL;DR: In this paper, the authors evaluated the MaxCombo test by reanalyzing data from six cancer clinical trials submitted to the U.S. Food and Drug Administration and found that interpretation of the test results was not clear when the Kaplan-Meier curves crossed or had early separation.
Abstract: The Cross-PhRMA working group has proposed using the MaxCombo test in place of the log-rank test to evaluate treatment differences based on time-to-event endpoints, particularly if nonproportional hazards are expected. Despite demonstrating improved power and overall Type I error control, concerns about using this test for inferential purposes remain. We evaluated the MaxCombo test by reanalyzing data from six cancer clinical trials submitted to the U.S. Food and Drug Administration. Interpretation of the MaxCombo test results is not clear when the Kaplan–Meier curves crossed or had early separation. In addition, we note difficulty in interpretation of the results from the MaxCombo test when the source cause of deviation from proportionality of the survival distributions is due to underlying factors such as differential treatment effect of subgroups or intercurrent events. We illustrate these concerns based on the case examples.

4 citations

References
Journal Article
TL;DR: The authors describe a method and provide a simple worked example using inverse probability weights (IPW) to create adjusted survival curves when the weights are non-parametrically estimated, equivalent to direct standardization of the survival curves to the combined study population.

662 citations

Journal Article
TL;DR: The log-rank test is most powerful under proportional hazards (PH); however, non-PH patterns are often observed in clinical trials, such as in immuno-oncology; therefore, alternative methods ar...
Abstract: The log-rank test is most powerful under proportional hazards (PH). In practice, non-PH patterns are often observed in clinical trials, such as in immuno-oncology; therefore, alternative methods ar...

58 citations

Journal Article
TL;DR: In this paper, three categories of testing methods were evaluated, including weighted log-rank tests, Kaplan-Meier curve-based tests, and combination tests (including Breslow test, Lee's combo test, and MaxCombo test).
Abstract: The log-rank test is most powerful under proportional hazards (PH). In practice, non-PH patterns are often observed in clinical trials, such as in immuno-oncology; therefore, alternative methods are needed to restore the efficiency of statistical testing. Three categories of testing methods were evaluated, including weighted log-rank tests, Kaplan-Meier curve-based tests (including weighted Kaplan-Meier and Restricted Mean Survival Time, RMST), and combination tests (including Breslow test, Lee's combo test, and MaxCombo test). Nine scenarios representing the PH and various non-PH patterns were simulated. The power, type I error, and effect estimates of each method were compared. In general, all tests control type I error well. There is not a single most powerful test across all scenarios. In the absence of prior knowledge regarding the PH or non-PH patterns, the MaxCombo test is relatively robust across patterns. Since the treatment effect changes over time under non-PH, the overall profile of the treatment effect may not be represented comprehensively based on a single measure. Thus, multiple measures of the treatment effect should be pre-specified as sensitivity analyses to evaluate the totality of the data.

55 citations
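
The weighted log-rank and combination tests named in the abstract above can be illustrated with a small sketch. It assumes the commonly used Fleming-Harrington weight family, w(t) = S(t-)^rho * (1 - S(t-))^gamma with S the pooled Kaplan-Meier estimate, and the four (rho, gamma) pairs often associated with MaxCombo; neither the function names nor the choice of pairs comes from the paper. It returns standardized statistics only. A valid p-value for their maximum requires the joint distribution of the components, which is not computed here.

```python
import math


def fh_weighted_logrank_z(times, events, groups, rho=0.0, gamma=0.0):
    """Standardized Fleming-Harrington weighted log-rank statistic.

    events: 1 = event, 0 = censored; groups: 1 = treatment, 0 = control.
    A positive value means more events than expected in group 1.
    Hypothetical helper for illustration, not a library API.
    """
    data = sorted(zip(times, events, groups))
    n = len(data)                      # total number at risk
    n1 = sum(g for _, _, g in data)    # number at risk in group 1
    s_pooled = 1.0                     # pooled KM just before current time
    u = 0.0                            # weighted sum of observed - expected
    v = 0.0                            # variance of u (hypergeometric)
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            d += e
            d1 += e * g
            removed += 1
            removed1 += g
            i += 1
        if d > 0:
            w = (s_pooled ** rho) * ((1.0 - s_pooled) ** gamma)
            u += w * (d1 - d * n1 / n)
            if n > 1:
                p = n1 / n
                v += w * w * d * p * (1.0 - p) * (n - d) / (n - 1)
            s_pooled *= 1.0 - d / n
        n -= removed
        n1 -= removed1
    return u / math.sqrt(v) if v > 0 else 0.0


def max_combo_statistics(times, events, groups):
    """Component statistics for an assumed set of four FH(rho, gamma) weights."""
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return {pg: fh_weighted_logrank_z(times, events, groups, *pg) for pg in pairs}


# Toy data, invented for illustration only.
toy_times = [1, 3, 2, 4]
toy_events = [1, 1, 1, 1]
toy_groups = [1, 1, 0, 0]
stats = max_combo_statistics(toy_times, toy_events, toy_groups)
largest = max(stats.values(), key=abs)  # component with the largest |Z|
```

With rho = gamma = 0 every weight equals 1 and the statistic reduces to the ordinary log-rank test; gamma > 0 emphasizes late differences and rho > 0 emphasizes early ones.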

Journal Article
TL;DR: It is demonstrated that under a rather mild condition on the censoring distribution, one can make inference about the RMST up to t, where t is less than or even equal to the largest follow-up time (either observed or censored) in the study.
Abstract: The t-year mean survival or restricted mean survival time (RMST) has been used as an appealing summary of the survival distribution within a time window [0, t]. RMST is the patient's life expectancy until time t and can be estimated nonparametrically by the area under the Kaplan-Meier curve up to t. In a comparative study, the difference or ratio of two RMSTs has been utilized to quantify the between-group-difference as a clinically interpretable alternative summary to the hazard ratio. The choice of the time window [0, t] may be prespecified at the design stage of the study based on clinical considerations. On the other hand, after the survival data have been collected, the choice of time point t could be data-dependent. The standard inferential procedures for the corresponding RMST, which is also data-dependent, ignore this subtle yet important issue. In this paper, we clarify how to make inference about a random "parameter." Moreover, we demonstrate that under a rather mild condition on the censoring distribution, one can make inference about the RMST up to t, where t is less than or even equal to the largest follow-up time (either observed or censored) in the study. This finding reduces the subjectivity of the choice of t empirically. The proposal is illustrated with the survival data from a primary biliary cirrhosis study, and its finite sample properties are investigated via an extensive simulation study.

49 citations
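
The abstract above states that RMST can be estimated nonparametrically by the area under the Kaplan-Meier curve up to t. A minimal sketch of that calculation follows, in plain Python with hypothetical function names and invented toy data. It assumes the truncation time tau lies within the follow-up period; if tau exceeds the last observed time, this sketch simply carries the last Kaplan-Meier value forward, which is an assumption of the sketch and not a recommendation from the paper.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all subjects sharing this time (ties).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)     # final rectangle up to tau
    return area


# Toy data, invented for illustration only.
treatment = rmst([1, 2, 3, 4], [1, 0, 1, 1], tau=3)
control = rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=3)
rmst_difference = treatment - control   # between-group summary
```

The difference (or ratio) of the two group-specific values is the between-group summary the abstract describes as an alternative to the hazard ratio.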

Journal Article
TL;DR: The use of milestone survival is described as a potential efficacy endpoint for immune checkpoint inhibitors in late-stage drug development that could potentially mitigate the challenge of accelerating the drug development process when the strength of this class of agents is derived from long-term follow-up.
Abstract: Recent advancements in cancer immunotherapies offer diverse strategies for cancer treatment. Among the most promising approaches is the blockade of immune checkpoint molecules to activate antitumor immunity. With targeted immunotherapies of new mechanisms of action come greater challenges in study design and statistical analysis, as well as the need for refining clinical trial endpoints. The long-term survival and delayed clinical effects demonstrated by these therapies could result in substantial prolongation of study duration and loss of statistical power if these key attributes are not accounted for in the study design and statistical analyses. In the Brookings Conference on Clinical Cancer Research held in Washington, DC, in November 2013, several intermediate clinical endpoints, including milestone overall survival, were proposed for the evaluation of cancer immunotherapies to take into account the possibility of delayed treatment effect and to better characterize the clinical activity profile of such agents, particularly immune checkpoint inhibitors. In this manuscript, the use of milestone survival is described as a potential efficacy endpoint for immune checkpoint inhibitors in late-stage drug development that could potentially mitigate the challenge of accelerating the drug development process when the strength of this class of agents is derived from long-term follow-up.

48 citations


"Rejoinder to Letter to the Editor "..." refers to methods in this paper

  • ...For potential prognostic factors, exploratory analysis can be performed to identify these factors and the adjusted analysis (e.g., stratified Cox model, inverse probability weighting (Cole and Hernan 2004)) can be conducted to reduce the bias....
