DOI: 10.3414/ME17-01-0058

# The Average Hazard Ratio – A Good Effect Measure for Time-to-event Endpoints when the Proportional Hazard Assumption is Violated?

This work was supported by the German Research Foundation (grant number RA 237/1–2).

### Publication History

Received: 10 June 2017

Accepted: 18 September 2017

Publication Date: 02 May 2018 (online)

### Summary

Background: In many clinical trials, the endpoint of interest is a time-to-event endpoint. Group differences are then usually expressed by the hazard ratio and assessed by the logrank test, which is optimal under the proportional hazard assumption. However, there are many situations in which this assumption is violated. Especially in applications where a full population and several subgroups, or a composite time-to-first-event endpoint and several of its components, are considered, the proportional hazard assumption usually does not hold simultaneously for all test problems under investigation. As an alternative effect measure, Kalbfleisch and Prentice proposed the so-called ‘average hazard ratio’. The average hazard ratio uses a flexible weighting function to modify the influence of time and has a meaningful interpretation even under non-proportional hazards. Despite this favorable property, it is hardly ever used in practice, whereas the standard hazard ratio is commonly reported in clinical trials regardless of whether the proportional hazard assumption holds.
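To make the idea of a weighted average of hazards concrete, the following minimal sketch evaluates an average hazard ratio numerically. It assumes the ratio form θ = ∫ (λ1/λ) dG / ∫ (λ0/λ) dG with λ = λ1 + λ0, and it assumes the particular weight G(t) = 1 − S1(t)S0(t), under which the integrands simplify to λ1·S1·S0 and λ0·S1·S0. The function names, the Weibull parameters, and the integration settings are illustrative choices, not taken from the paper.

```python
import math


def weibull_hazard(t, shape, scale):
    """Hazard function of a Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_survival(t, shape, scale):
    """Survival function of a Weibull distribution."""
    return math.exp(-((t / scale) ** shape))


def average_hazard_ratio(haz1, surv1, haz0, surv0, t_max=50.0, steps=100_000):
    """Average hazard ratio of group 1 versus group 0.

    Assumed weight: dG(t) = d[1 - S1(t) * S0(t)], so numerator and
    denominator reduce to the integrals of haz_k(t) * S1(t) * S0(t).
    Both integrals are approximated by the midpoint rule on [0, t_max].
    """
    dt = t_max / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        joint_survival = surv1(t) * surv0(t)
        num += haz1(t) * joint_survival * dt
        den += haz0(t) * joint_survival * dt
    return num / den


# Proportional hazards: constant hazards 0.5 (group 1) and 1.0 (group 0).
ahr_ph = average_hazard_ratio(
    lambda t: weibull_hazard(t, 1, 2.0), lambda t: weibull_survival(t, 1, 2.0),
    lambda t: weibull_hazard(t, 1, 1.0), lambda t: weibull_survival(t, 1, 1.0),
)

# Non-proportional hazards: increasing hazard 2t (group 1) versus constant
# hazard 1 (group 0); the two hazard curves cross at t = 0.5.
ahr_nph = average_hazard_ratio(
    lambda t: weibull_hazard(t, 2, 1.0), lambda t: weibull_survival(t, 2, 1.0),
    lambda t: weibull_hazard(t, 1, 1.0), lambda t: weibull_survival(t, 1, 1.0),
)

print(f"AHR, proportional hazards:     {ahr_ph:.4f}")
print(f"AHR, non-proportional hazards: {ahr_nph:.4f}")
```

Under proportional hazards the sketch returns the constant hazard ratio, because the common factor cancels from numerator and denominator. Under the assumed weight, the result can also be read as the odds P(T1 < T0) / P(T0 < T1), which stays interpretable when the hazards cross.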

Objectives: There are two main approaches to constructing estimators and tests for the average hazard ratio: the first relies on weighted Cox regression, the second on a simple plug-in estimator. The aim of this work is to systematically compare these two approaches and the standard logrank test across different time-to-event settings with proportional and non-proportional hazards, and to illustrate their pros and cons in application.

Methods: We conduct a systematic comparative study based on Monte Carlo simulations and a real clinical trial example.
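As a rough illustration of how a Monte Carlo study can estimate the rejection rate of a test, the sketch below simulates two-arm trials with administrative censoring and applies a standard two-sample logrank test. It is not the simulation design of this work: the sample size, censoring time, distributions, number of replications, and all function names are hypothetical choices made for the example.

```python
import math
import random


def logrank_z(times, events, groups):
    """Two-sample logrank Z statistic; assumes no tied event times."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n1 = sum(groups)           # subjects at risk in group 1
    n0 = len(groups) - n1      # subjects at risk in group 0
    obs_minus_exp = 0.0
    variance = 0.0
    for i in order:
        if events[i]:
            p1 = n1 / (n1 + n0)            # expected share of group 1
            obs_minus_exp += groups[i] - p1
            variance += p1 * (1.0 - p1)
        # The subject leaves the risk set after its event or censoring time.
        if groups[i] == 1:
            n1 -= 1
        else:
            n0 -= 1
    return obs_minus_exp / math.sqrt(variance) if variance > 0 else 0.0


def simulate_trial(rng, n_per_group, draw1, draw0, censor_time):
    """Draw event times for both groups and censor them at censor_time."""
    times, events, groups = [], [], []
    for group, draw in ((1, draw1), (0, draw0)):
        for _ in range(n_per_group):
            t = draw(rng)
            times.append(min(t, censor_time))
            events.append(t <= censor_time)
            groups.append(group)
    return times, events, groups


def rejection_rate(draw1, draw0, n_per_group=100, censor_time=3.0,
                   reps=500, seed=1):
    """Share of simulated trials with |Z| > 1.96 (two-sided 5% level)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        trial = simulate_trial(rng, n_per_group, draw1, draw0, censor_time)
        rejections += abs(logrank_z(*trial)) > 1.96
    return rejections / reps


def exponential(rng):
    return rng.expovariate(1.0)            # constant hazard 1


def weibull_increasing(rng):
    return rng.weibullvariate(1.0, 2.0)    # scale 1, shape 2: hazard 2t


type_one_error = rejection_rate(exponential, exponential)
power_crossing = rejection_rate(weibull_increasing, exponential)
print(f"Rejection rate under the null hypothesis: {type_one_error:.3f}")
print(f"Rejection rate under crossing hazards:    {power_crossing:.3f}")
```

Replacing `logrank_z` with a test based on the average hazard ratio would allow the two rejection rates to be compared scenario by scenario, which is the kind of comparison a simulation study of this type reports.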

Results: Our results suggest that the properties of the average hazard ratio depend on the underlying weighting function. The two approaches to constructing estimators and related tests perform very similarly for adequately chosen weights. In general, the average hazard ratio is a more valid effect measure than the standard hazard ratio under non-proportional hazards, and the corresponding tests provide a power advantage over the common logrank test.

Conclusions: As non-proportional hazards are often met in clinical practice and the average hazard ratio tests often outperform the common logrank test, this approach should be used more routinely in applications.
