Appl Clin Inform 2026; 17(01): 107-117
DOI: 10.1055/a-2815-1912
Research Article

The Clinical Utility of Traditional and Machine Learning Alarms during the Care of Acutely Ill Patients

Authors

  • Nicole Rosario

    1   Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts, United States
  • Henry M. Mitchell

    1   Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts, United States
  • Sylvia Zhang

    2   Biofourmis, Inc, Boston, Massachusetts, United States
  • Nandakumar Selvaraj

    2   Biofourmis, Inc, Boston, Massachusetts, United States
  • Xiaozhu Zhang

    2   Biofourmis, Inc, Boston, Massachusetts, United States
  • Carme Hernandez

    1   Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts, United States
    3   Hospital Clínic, Barcelona, Spain
    4   University of Barcelona, Barcelona, Spain
  • Stuart R. Lipsitz

    1   Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts, United States
    5   Harvard Medical School, Boston, Massachusetts, United States
  • David M. Levine

    1   Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts, United States
    5   Harvard Medical School, Boston, Massachusetts, United States

Funding Information

This work was supported by Biofourmis.

Abstract

Objectives

Acutely ill patients are often continuously monitored despite only low-level evidence supporting the practice. Continuous monitoring produces high false alarm rates and alarm fatigue, and its clinical effectiveness is unclear. We compare metrics, including alarm burden, area under the receiver operating characteristic curve (auROC), sensitivity, and specificity, across three alarm types: threshold alarms, score alarms (i.e., the National Early Warning Score [NEWS]), and machine learning (ML) alarms.

Methods

Using the electronic health record, we retrospectively annotated continuous biometric data from acutely ill patients receiving hospital care at home. Each alarm was annotated for clinical utility (a change in clinical management) or a safety composite. Threshold alarms for heart rate (HR), respiratory rate (RR), and falls were set pragmatically by clinical teams; the score alarm was the NEWS; and the ML alarm was an unsupervised ML algorithm that detected anomalies in HR, RR, and activity. Our primary outcome was alarm burden (alarms/patient-hour). Secondary outcomes included alarm performance.
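To make these outcome definitions concrete, the following minimal Python sketch computes alarm burden and the usual confusion-matrix metrics. It is an illustration under stated assumptions, not the study's analysis code: the class and function names are hypothetical, the counts in the usage example are invented, and the unit that defines a "true negative" (for example, a monitoring window with neither an alarm nor an event) is an assumption that this abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class AlarmCounts:
    """Hypothetical tally of alarms versus annotated events.

    The unit of analysis (e.g., a fixed monitoring window) is an
    assumption; the abstract does not state how negatives were defined.
    """
    true_positive: int   # alarm fired and the annotated event occurred
    false_positive: int  # alarm fired, no annotated event
    false_negative: int  # annotated event occurred, no alarm
    true_negative: int   # no alarm, no annotated event


def alarm_burden(n_alarms: int, patient_hours: float) -> float:
    """Alarms per patient-hour of monitoring."""
    if patient_hours <= 0:
        raise ValueError("patient_hours must be positive")
    return n_alarms / patient_hours


def sensitivity(c: AlarmCounts) -> float:
    """Fraction of annotated events that triggered an alarm."""
    return c.true_positive / (c.true_positive + c.false_negative)


def specificity(c: AlarmCounts) -> float:
    """Fraction of event-free units that stayed silent."""
    return c.true_negative / (c.true_negative + c.false_positive)


def positive_predictive_value(c: AlarmCounts) -> float:
    """Fraction of alarms that corresponded to an annotated event."""
    return c.true_positive / (c.true_positive + c.false_positive)


if __name__ == "__main__":
    # Invented counts for illustration only; these are NOT study data.
    counts = AlarmCounts(true_positive=10, false_positive=40,
                         false_negative=5, true_negative=945)
    print(f"burden: {alarm_burden(50, 1000.0):.3f} alarms/patient-hour")
    print(f"PPV: {positive_predictive_value(counts):.3f}")
    print(f"sensitivity: {sensitivity(counts):.3f}")
    print(f"specificity: {specificity(counts):.3f}")
```

Alarm burden depends only on alarm counts and monitored time, whereas the other three metrics also require the annotations, which is why burden can be reported even when event labels are sparse.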

Results

We studied 526 patients (median age 71 years [interquartile range (IQR): 25]; 60.3% female; 45.1% White). Compared with threshold alarms (0.132 alarms/patient-hour), alarm burden was lower with score alarms (0.005 alarms/patient-hour) and ML alarms (0.032 alarms/patient-hour); p < 0.001 for each comparison with threshold. The positive predictive value for identifying clinical utility was 0.073 for threshold, 0.247 for score, and 0.181 for ML. The auROC for identifying the safety composite was 0.557 for threshold, 0.578 for score, and 0.656 for ML.
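As a reading aid, the short Python sketch below derives two quantities from the figures reported above: the relative reduction in alarm burden versus threshold alarms, and the number of alarms raised per clinically useful alarm (the reciprocal of the positive predictive value). The helper names are hypothetical, and because the inputs are the rounded values from this abstract, the derived numbers are approximate.

```python
# Values copied from the Results paragraph (rounded as reported).
REPORTED_BURDEN = {"threshold": 0.132, "score": 0.005, "ml": 0.032}
REPORTED_PPV = {"threshold": 0.073, "score": 0.247, "ml": 0.181}


def relative_reduction(baseline: float, value: float) -> float:
    """Fractional reduction of `value` relative to `baseline`."""
    return 1.0 - value / baseline


def alarms_per_useful_alarm(ppv: float) -> float:
    """Average number of alarms raised per clinically useful alarm."""
    return 1.0 / ppv


if __name__ == "__main__":
    base = REPORTED_BURDEN["threshold"]
    for name in ("score", "ml"):
        reduction = relative_reduction(base, REPORTED_BURDEN[name])
        print(f"{name}: {reduction:.1%} lower burden than threshold")
    for name, ppv in REPORTED_PPV.items():
        print(f"{name}: {alarms_per_useful_alarm(ppv):.1f} alarms per useful alarm")
```

With the reported values, score alarms carry roughly 96% less burden than threshold alarms and ML alarms roughly 76% less, while about 14 threshold alarms, 4 score alarms, or 5.5 ML alarms are raised per clinically useful alarm.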

Conclusion

Compared with threshold alarms, score and ML alarms had a lower alarm burden and higher overall performance in recognizing clinically important events. Our findings suggest that score or ML alarms hold promise for reducing alarm fatigue while improving recognition of such events, although all alarms require improvement.

Protection of Human and Animal Subjects

The study protocol was prespecified and approved by the Mass General Brigham institutional review board.

Publication History

Received: 20 March 2025

Accepted after revision: 16 February 2026

Article published online: 03 March 2026

© 2026. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany