Methods Inf Med 2018; 57(03): 089-100
DOI: 10.3414/ME17-01-0058
Original Article
Schattauer GmbH

The Average Hazard Ratio – A Good Effect Measure for Time-to-event Endpoints when the Proportional Hazard Assumption is Violated?

Geraldine Rauch
Institute of Medical Biometry and Informatics, University of Heidelberg, Germany
Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Berlin, Germany
Werner Brannath
Competence Center for Clinical Trials, University of Bremen, Germany
Matthias Brückner
Competence Center for Clinical Trials, University of Bremen, Germany
Department of Mathematics and Statistics, University of Lancaster, UK
Meinhard Kieser
Institute of Medical Biometry and Informatics, University of Heidelberg, Germany
This work was supported by the German Research Foundation (grant number RA 237/1–2).

Publication History

received: 10 June 2017

accepted: 18 September 2017

Publication Date: 02 May 2018 (online)


Background: In many clinical trials, the endpoint of interest is a time-to-event endpoint. In this case, group differences are usually expressed by the hazard ratio and commonly assessed by the logrank test, which is optimal under the proportional hazard assumption. However, there are many situations in which this assumption is violated. Especially in applications where a full population and several subgroups, or a composite time-to-first-event endpoint and several components, are considered, the proportional hazard assumption usually does not hold simultaneously for all test problems under investigation. As an alternative effect measure, Kalbfleisch and Prentice proposed the so-called ‘average hazard ratio’. The average hazard ratio is based on a flexible weighting function to modify the influence of time and has a meaningful interpretation even in the case of non-proportional hazards. Despite this favorable property, it is hardly ever used in practice, whereas the standard hazard ratio is commonly reported in clinical trials regardless of whether the proportional hazard assumption holds.
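
To make the role of the weighting function concrete, the following is a minimal numerical sketch, not the authors' implementation. It assumes the two-group form of the average hazard ratio, theta = ∫ λ1/(λ0+λ1) dG / ∫ λ0/(λ0+λ1) dG, and picks the weights dG(t) = (λ0(t)+λ1(t)) S0(t) S1(t) dt purely for illustration. The helper names `trapezoid` and `average_hazard_ratio`, the time grid, and the hazard functions are all hypothetical.

```python
import numpy as np


def trapezoid(y, t):
    """Trapezoidal rule, written out so it does not depend on a NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)


def average_hazard_ratio(haz0, haz1, surv0, surv1, t):
    """Average hazard ratio of group 1 vs. group 0 on the time grid t.

    Assumed definition (illustrative helper, not the authors' code):
        theta = int lam1/(lam0+lam1) dG / int lam0/(lam0+lam1) dG
    with assumed weights dG(t) = (lam0 + lam1) * S0 * S1 dt.
    Requires lam0(t) + lam1(t) > 0 on the whole grid.
    """
    lam0, lam1 = haz0(t), haz1(t)
    dG = (lam0 + lam1) * surv0(t) * surv1(t)  # weight density on the grid
    num = trapezoid(lam1 / (lam0 + lam1) * dG, t)
    den = trapezoid(lam0 / (lam0 + lam1) * dG, t)
    return num / den


# Time grid long enough that both survival functions are essentially zero.
t = np.linspace(0.0, 50.0, 200_001)

# Control group: exponential with constant hazard 0.5.
haz0 = lambda s: np.full_like(s, 0.5)
surv0 = lambda s: np.exp(-0.5 * s)

# Case 1, proportional hazards: group 1 has constant hazard 1.0 (ratio 2).
theta_ph = average_hazard_ratio(
    haz0, lambda s: np.full_like(s, 1.0), surv0, lambda s: np.exp(-s), t
)

# Case 2, non-proportional hazards: group 1 has an increasing hazard 2t.
theta_nph = average_hazard_ratio(
    haz0, lambda s: 2.0 * s, surv0, lambda s: np.exp(-s**2), t
)

print(f"proportional hazards:     theta = {theta_ph:.4f}")
print(f"non-proportional hazards: theta = {theta_nph:.4f}")
```

Under proportional hazards the two integrands differ only by the constant hazard ratio, so the sketch returns that ratio whatever the weights. Under non-proportional hazards the result is a single time-averaged summary whose value depends on the chosen weights.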

Objectives: There are two main approaches to constructing estimators and tests for the average hazard ratio: the first relies on weighted Cox regression and the second on a simple plug-in estimator. The aim of this work is to compare these two approaches and the standard logrank test systematically in different time-to-event settings with proportional and non-proportional hazards, and to illustrate their pros and cons in application.

Methods: We conduct a systematic comparative study based on Monte Carlo simulations and a real clinical trial example.

Results: Our results suggest that the properties of the average hazard ratio depend on the underlying weighting function. The two approaches to constructing estimators and related tests show very similar performance for adequately chosen weights. In general, the average hazard ratio is a more valid effect measure than the standard hazard ratio under non-proportional hazards, and the corresponding tests provide a power advantage over the common logrank test.

Conclusions: As non-proportional hazards are often met in clinical practice and the average hazard ratio tests often outperform the common logrank test, this approach should be used more routinely in applications.