Methods Inf Med 2006; 45(05): 548-556
DOI: 10.1055/s-0038-1634117
Original Article
Schattauer GmbH

Association between Split Selection Instability and Predictive Error in Survival Trees

M. Radespiel-Tröger (1), O. Gefeller (1), T. Rabenstein (2), T. Hothorn (1)

1   Department of Medical Informatics, Biometry, and Epidemiology, Friedrich-Alexander-University, Erlangen, Germany
2   Department of Medicine I, Friedrich-Alexander-University, Erlangen, Germany

Publication History

Received: 01 March 2005

Accepted: 13 December 2005

Publication Date: 07 February 2018 (online)

Summary

Objectives: To evaluate split selection instability in six survival tree algorithms and its relationship with predictive error by means of a bootstrap study.

Methods: We study the following algorithms: logrank statistic with multivariate p-value adjustment without pruning (LR), Kaplan-Meier distance of survival curves (KM), martingale residuals (MR), Poisson regression for censored data (PR), within-node impurity (WI), and exponential log-likelihood loss (XL). With the exception of LR, initial trees are pruned using split-complexity, and final trees are selected by cross-validation. We employ a real dataset from a clinical study of patients with gallbladder stones. The predictive error is evaluated using the integrated Brier score for censored data. The relationship between split selection instability and predictive error is evaluated in the root node by means of box-percentile plots, covariate and cutpoint selection entropy, and cutpoint selection coefficients of variation.
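The two root-node instability measures named above can be illustrated with a minimal sketch. It assumes that selection entropy is the Shannon entropy (in bits) of the relative selection frequencies across bootstrap trees, and that the coefficient of variation is the sample standard deviation divided by the mean; the summary does not state the exact definitions used in the study. The function names, covariate labels, and split values below are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter
from statistics import mean, stdev


def selection_entropy(selections):
    """Shannon entropy (bits) of the relative frequencies of the selected items.

    0 means the same item was chosen in every bootstrap tree (fully stable);
    larger values mean the choice is spread over more items (less stable).
    """
    counts = Counter(selections)
    n = len(selections)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)


# Hypothetical root-node splits from 8 bootstrap trees: (covariate, cutpoint).
root_splits = [
    ("covariate_a", 10.0), ("covariate_a", 12.0),
    ("covariate_a", 14.0), ("covariate_a", 12.0),
    ("covariate_b", 3.0), ("covariate_b", 5.0),
    ("covariate_c", 1.0), ("covariate_c", 1.0),
]

# Covariate selection entropy over all bootstrap trees.
covariate_entropy = selection_entropy([cov for cov, _ in root_splits])

# Cutpoint instability for one covariate, using only the trees that chose it.
cutpoints_a = [cut for cov, cut in root_splits if cov == "covariate_a"]
cutpoint_entropy_a = selection_entropy(cutpoints_a)
cutpoint_cv_a = coefficient_of_variation(cutpoints_a)

print(covariate_entropy, cutpoint_entropy_a, cutpoint_cv_a)
```

In this toy example the root-node covariate is chosen with frequencies 4/8, 2/8, and 2/8. A single dominant covariate would push the covariate entropy toward 0.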

Results: We found a positive association between covariate selection instability and predictive error in the root node. LR yields the lowest predictive error, while KM and MR yield the highest predictive error.

Conclusions: The predictive error of survival trees is related to split selection instability. Based on the low predictive error of LR, we recommend the use of this algorithm for the construction of survival trees. Unpruned survival trees with multivariate p-value adjustment can perform as well as pruned trees. The analysis of split selection instability can be used to communicate the results of tree-based analyses to clinicians and to support the application of survival trees.
