CC BY 4.0 · Methods Inf Med 2024; 63(03/04): 122-136
DOI: 10.1055/a-2562-2163
Original Article for a Focus Theme

Alternative Strategies to Generate Class Activation Maps Supporting AI-based Advice in Vertebral Fracture Detection in X-ray Images

Samuele Pe (1), Lorenzo Famiglini (2), Enrico Gallazzi (3), Chandra Bortolotto (4, 5), Luisa Carone (5), Andrea Cisarri (4), Alberto Salina (4), Lorenzo Preda (4, 5), Riccardo Bellazzi (1), Federico Cabitza (2, 6), Enea Parimbelli (1, 7)

Author Affiliations
1   Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
2   Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
3   ASST G. Pini – CTO, Milan, Italy
4   Unit of Radiology, Department of Clinical, Surgical, Diagnostic, and Pediatric Sciences, University of Pavia, Pavia, Italy
5   Department of Radiology, I.R.C.C.S. Policlinic San Matteo Foundation, Pavia, Italy
6   Department of Reconstructive Surgery and Osteo-articular Infections C.R.I.O. Unit, I.R.C.C.S. Galeazzi Orthopaedic Institute, Milan, Italy
7   Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada

Funding All research described in this article was reviewed in compliance with the ethical standards of the Italian Lombardy Region health system and medical research bodies, and is in line with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects. This work was carried out as part of the first author's master's thesis at the University of Pavia. Samuele Pe is currently a PhD student enrolled in the National PhD program in Artificial Intelligence (XXXIX cycle, Health and Life Sciences course), organized by Università Campus Bio-Medico di Roma. This work was supported by the Italian Ministry of Research under the complementary actions to the NRRP "Fit4MedRob - Fit for Medical Robotics" grant (# PNC0000007). Enea Parimbelli and Federico Cabitza acknowledge funding from the Italian project PRIN PNRR 2022 InXAID - Interaction with eXplainable Artificial Intelligence in (medical) Decision-making (CUP: H53D23008090001), funded by the European Union - Next Generation EU.

Abstract

Background

Balancing artificial intelligence (AI) support with appropriate human oversight is challenging, with associated risks such as algorithm aversion and technology dominance. Research areas like eXplainable AI (XAI) and Frictional AI aim to address these challenges. Studies have shown that presenting XAI explanations as “juxtaposed evidence” supporting contrasting classifications, rather than just providing predictions, can be beneficial.

Objectives

This study aimed to design and compare multiple pipelines for generating juxtaposed evidence in the form of class activation maps (CAMs) that highlight areas of interest in a fracture detection task with X-ray images.

Materials and Methods

We designed three pipelines to generate such evidence, all built on a fracture detection task over 630 thoraco-lumbar X-ray images (48% of which contained fractures). The first, a single-model approach, applies an algorithm of the Grad-CAM family to a ResNeXt-50 network trained through transfer learning. The second, a dual-model approach, employs two networks, one optimized for sensitivity and the other for specificity, providing targeted explanations for positive and negative cases, respectively. The third, a generative approach, leverages autoencoders to create activation maps from feature tensors extracted from the raw images. Each approach produced two versions of activation maps: AM3, as we termed it, capturing fine-grained, low-level features, and AM4, highlighting high-level, aggregated features. We conducted a validation study by comparing the generated maps with binary ground-truth masks, derived from a consensus of four clinician annotators, that identify the actual locations of fractures in a subset of positive cases. A sketch of the single-model pipeline follows.
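As a concrete illustration, the minimal sketch below shows how the single-model pipeline could be reproduced with the open-source pytorch-grad-cam library: HiResCAM (the variant ultimately selected; see Results) applied at two depths of a ResNeXt-50 fine-tuned for binary fracture classification. The mapping of AM3/AM4 onto the backbone's layer3/layer4 stages, the library choice, and all identifiers are our assumptions for illustration, not the authors' published implementation.

```python
# Hypothetical sketch of the single-model pipeline (not the authors' code):
# HiResCAM applied to a fine-tuned ResNeXt-50 at two depths of the backbone.
import torch
from torchvision.models import resnext50_32x4d
from pytorch_grad_cam import HiResCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Transfer learning: start from ImageNet weights and replace the classifier
# head with a two-class output (fracture vs. no fracture).
model = resnext50_32x4d(weights="IMAGENET1K_V2")
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.eval()  # in practice, load the fine-tuned checkpoint here

def activation_maps(x: torch.Tensor, target_class: int):
    """Return (AM3, AM4) heatmaps for a batch of preprocessed X-ray tensors."""
    targets = [ClassifierOutputTarget(target_class)] * x.shape[0]
    # AM3 (assumed): an earlier stage with finer spatial resolution,
    # capturing low-level features.
    with HiResCAM(model=model, target_layers=[model.layer3[-1]]) as cam3:
        am3 = cam3(input_tensor=x, targets=targets)
    # AM4 (assumed): the final stage, coarser but semantically aggregated.
    with HiResCAM(model=model, target_layers=[model.layer4[-1]]) as cam4:
        am4 = cam4(input_tensor=x, targets=targets)
    return am3, am4  # NumPy arrays in [0, 1], one map per input image

x = torch.randn(1, 3, 224, 224)                # stand-in for a normalized X-ray
am3, am4 = activation_maps(x, target_class=1)  # evidence for "fracture"
```

Under the same assumptions, the dual-model strategy would run this routine on two such networks, the sensitivity-tuned model explaining positive advice (target_class=1) and the specificity-tuned model explaining negative advice (target_class=0), yielding the juxtaposed evidence described above.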

Results

HiResCAM proved to be the best-performing Grad-CAM variant and was therefore used in both the single- and dual-model strategies. The generative approach demonstrated the greatest overlap with the clinicians' assessments, indicating the closest alignment with human expertise.
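The abstract does not state which overlap measure was used; one plausible way to quantify agreement between a continuous activation map and a binary consensus mask is to threshold the map and compute a Dice score, as in the minimal sketch below. The 0.5 threshold and all names are illustrative assumptions, not the paper's reported metric.

```python
import numpy as np

def dice_overlap(cam: np.ndarray, mask: np.ndarray, thr: float = 0.5) -> float:
    """Dice overlap between a [0, 1] activation map and a binary fracture mask.

    One plausible reading of 'overlap with the clinicians' assessments';
    the study may have used a different metric or threshold.
    """
    pred = cam >= thr                      # binarize the heatmap
    truth = mask.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0  # empty vs. empty: perfect

# Example: score one generated map against the consensus mask
# (both arrays assumed resampled to the same spatial resolution).
cam = np.random.rand(224, 224)
mask = np.zeros((224, 224), dtype=np.uint8)
mask[60:90, 100:140] = 1                   # toy fracture region
print(f"Dice = {dice_overlap(cam, mask):.3f}")
```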

Conclusion

The results highlight the potential of Judicial AI to enhance diagnostic decision-making and foster a synergistic collaboration between humans and AI.



Publication History

Received: 28 September 2024

Accepted: 16 December 2024

Article published online: 03 June 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany

 