CC BY-NC-ND 4.0 · Thromb Haemost 2025; 125(05): 492-504
DOI: 10.1055/a-2385-1452
Stroke, Systemic or Venous Thromboembolism

Harnessing Risk Assessment for Thrombosis and Bleeding to Optimize Anticoagulation Strategy in Nonvalvular Atrial Fibrillation

Yue Zhao* (1), Li-Ya Cao* (1), Ying-Xin Zhao* (2), Di Zhao (1), Yi-Fan Huang (3), Fei Wang (3), Qian Wang (1)

(1) Department of Pharmacy, The First Affiliated Hospital of Army Medical University (Third Military Medical University), Chongqing, P. R. China
(2) Department of Pharmacy, Army Medical Center, Army Medical University (Third Military Medical University), Chongqing, P. R. China
(3) Medical Big Data and Artificial Intelligence Center, The First Affiliated Hospital of Army Medical University (Third Military Medical University), Chongqing, P. R. China
Funding: This work was funded by the Chongqing Municipal Education Commission Science and Technology Research Program (KJZD-M202212801) and the Chongqing Clinical Pharmacy Key Specialties Construction Project (51561Z23772).
 


Abstract

Background

Oral anticoagulation (OAC) following catheter ablation (CA) of nonvalvular atrial fibrillation (NVAF) is essential for the prevention of thrombotic events. Inappropriate use of OACs does not improve stroke prevention and may be associated with a higher risk of bleeding. Therefore, this study aimed to develop machine learning (ML) methods driven by clinical data to predict the risk of thrombosis and bleeding and thereby establish more precise anticoagulation strategies for patients with NVAF.

Methods

Patients with NVAF who underwent CA therapy at Southwest Hospital from 2015 to 2023 were enrolled. Eight ML algorithms were compared for their power to predict both thrombosis and bleeding. The models were interpreted using feature importance and SHapley Additive exPlanations (SHAP) methods. Simplified ML models built on the most important risk factors were then proposed to improve the feasibility of the tool.

Results

A total of 1,055 participants were included, of whom 105 had thrombosis and 252 had bleeding. The XGBoost-based models achieved the best performance, with accuracies of 0.740 and 0.781 for thrombosis and bleeding, respectively. Age, B-type natriuretic peptide (BNP), and the duration of heparin were closely related to a high risk of thrombosis, whereas the anticoagulation strategy, BNP, and lipids played a crucial role in the occurrence of bleeding. The optimized models incorporating the crucial risk factors, RF-T for thrombosis and Xw-B for bleeding, achieved the best recalls of 0.774 and 0.780, respectively.

Conclusion

The optimized models show great potential for predicting thrombosis and bleeding among patients with NVAF and will form the basis for future scoring scales.



Introduction

Perioperative and postoperative thrombosis-related adverse events (AEs; thrombosis) are major complications of percutaneous catheter ablation (CA) for nonvalvular atrial fibrillation (NVAF).[1] [2] Oral anticoagulant (OAC) therapy is widely used to reduce the risk of thrombosis but at the expense of increasing the risk of bleeding.[3] [4] Therefore, assessing the risks and benefits of postablation OAC therapy in NVAF is necessary to develop a rational and effective anticoagulation strategy that can contribute towards averting AEs.

CA is a recommended treatment for restoring sinus rhythm in patients with symptomatic drug-refractory NVAF, and several randomized controlled trials have shown that CA is superior to antiarrhythmic drugs in reducing the recurrence and burden of atrial fibrillation (AF).[5] [6] [7] Nevertheless, CA therapy is associated with perioperative bleeding and stroke in 1.9 and 0.2% of cases, respectively, vascular complications in 2 to 4%, and asymptomatic cerebral embolism in up to 15%.[8] [9] Local endothelial dysfunction during the procedure is generally considered to be the cause of platelet (PLT) activation, and damaged cells release thrombogenic components, leading to an increased risk of thrombosis.[10] The dislodgement of thrombi from the left atrial appendage (LAA), a hypercoagulable state due to prolonged bed rest, and recurrence of AF also increase the risk of embolism.[11]

Numerous scores incorporating different risk factors are available to assess the risk of stroke in patients with NVAF, including the CHADS2, CHA2DS2-VASc, ATRIA, and ABC scores.[12] [13] [14] The CHA2DS2-VASc score is the most widely used stroke assessment tool in clinical practice owing to its simplicity and practical utility. Nevertheless, both the CHADS2 and CHA2DS2-VASc scores have limitations in predicting vascular events.[15] They are also more likely to overestimate the risk of thrombosis because they include comorbidities. The HAS-BLED score, which takes into account hypertension, abnormal liver/renal function, stroke, bleeding, labile international normalized ratio, older age, and drug or alcohol use, is widely used to assess the risk of major bleeding.[16] The HAS-BLED score is more suitable for patients taking warfarin than for patients taking non-vitamin K antagonist OACs, in whom coagulation function is not regularly monitored. Moreover, these scores fail to consider various novel influencing factors, and the analytical methods behind them do not adequately reflect the relative importance of predictive indicators. Furthermore, the influences of CA and OAC are not taken into account, making it difficult to evaluate the efficacy of treatment strategies in NVAF patients undergoing CA therapy. With the advancement of artificial intelligence technology and the emergence of novel biomarkers associated with NVAF, a novel prediction model that integrates numerous innovative factors can be established to assess the risk of thrombosis and bleeding, providing precise support for the development of enhanced anticoagulation strategies.

In this study, we first aimed to develop accurate prediction models to detect, with high confidence, potential risk factors associated with thrombosis and bleeding. To achieve this, eight machine learning (ML) algorithms based on 76 variables were evaluated to build reliable classification models. The second aim was to identify and rank all potentially influential risk factors for thrombosis and bleeding using feature importance and SHapley Additive exPlanations (SHAP) methods. Using the important predictors, simplified ML models were proposed to improve the feasibility of the tool. The associations and differences between thrombosis and bleeding were also compared. Finally, strategies were proposed to reduce the risk of thrombosis and bleeding in patients with NVAF.



Materials and Methods

Proposed Methodology

This study proposed an explainable, integrated ML-based model for predicting the risk of thrombosis and bleeding in patients with NVAF. The workflow was as follows: first, the dataset was obtained, preprocessed, subsampled, and randomly divided into a training set (80%) and a testing set (20%). Second, all features were used to construct models with the eight ML algorithms, and performance was evaluated by accuracy, specificity, recall, and area under the curve (AUC) on the testing set. Third, the two best-performing algorithms were selected for further feature selection. Fourth, SHAP analysis was employed to identify predictors. Finally, based on the feature importance and SHAP value rankings, an optimized model with variable weights was constructed by score assignment.



Study Population and Dataset Preparation

In this study, patients with NVAF treated at Southwest Hospital (the First Affiliated Hospital of Army Medical University) between January 2015 and January 2023 were collected to construct the ML models. The inclusion criteria were as follows: (1) patients discharged with a primary diagnosis of NVAF, (2) patients who underwent CA therapy, and (3) patients who were followed up for at least 6 months after ablation. The exclusion criteria were as follows: (1) patients with moderate to severe mitral stenosis with AF, (2) patients with AF after mechanical valve replacement (for aortic and mitral valve stenosis), and (3) patients who failed to attend follow-up appointments after CA or within 6 months following the procedure. Because this was a retrospective analysis, some values were missing. Two strategies were employed to handle them. First, missing variables were filled with the most recent admission value retrieved from the electronic medical record system. Then, any sample in which the proportion of missing values exceeded 5% was excluded from further analysis, and the remaining missing values were imputed using the median. Thrombosis events were defined as thrombus in the LAA detected by transesophageal ultrasound, pulmonary embolism, transient ischemic attack, or stroke after the CA procedure. Bleeding events included both major and minor bleeding, namely hematemesis, intracranial hemorrhage, hematochezia, hematuria, gingival and nasal bleeding, subcutaneous ecchymosis, and hematoma. After processing, 1,055 patients with NVAF undergoing CA therapy were retained, including 105 with thrombosis and 252 with bleeding.
Ethical approval [(B)KY2023076] was granted by the Ethics Committee of the First Affiliated Hospital of Army Medical University prior to the commencement of this study, and the requirement for informed consent was waived because of the retrospective observational design.
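
The second missing-value step described above (dropping samples with too many gaps, then median imputation) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function name `handle_missing`, the dict-per-patient layout, and the use of `None` for a missing value are assumptions, and the first step (back-filling from the most recent admission record) is not modeled.

```python
from statistics import median

def handle_missing(rows, max_missing_frac=0.05):
    """Drop samples with too many missing values, then median-impute the rest.

    rows: list of dicts (one per patient, hypothetical layout); None marks a gap.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    # Keep only samples whose share of missing variables is within the limit.
    kept = [
        r for r in rows
        if sum(r[c] is None for c in columns) / len(columns) <= max_missing_frac
    ]
    # Per-variable median computed from the observed values of retained samples.
    medians = {}
    for c in columns:
        observed = [r[c] for r in kept if r[c] is not None]
        medians[c] = median(observed) if observed else None
    # Fill the remaining gaps with the corresponding median.
    return [
        {c: (r[c] if r[c] is not None else medians[c]) for c in columns}
        for r in kept
    ]
```

With only a handful of variables, a single gap already exceeds 5%, so a toy call needs a looser limit, for example `handle_missing(rows, max_missing_frac=0.5)`; with the 76 to 80 variables used in the study the default threshold is meaningful.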



Variable Selection and Preprocessing

Four types of variables, chosen on the basis of previous literature and preliminary findings, were used to construct the classification models: (1) 20 baseline variables, including demographic information (age, sex, and weight); lifestyle factors and medical history (smoking, drinking, and past medical history); and procedure-related indicators (days of hospitalization, operation duration, intraoperative activated clotting time, and other baseline indicators); (2) 31 medication-related factors, including preoperative use of OAC (type, dosage, preoperative interruption), combination therapies (nonsteroidal anti-inflammatory drugs [NSAIDs], statins, P-glycoprotein inhibitors, proton pump inhibitors, beta blockers), intraoperative administration of heparin, postoperative use of OAC, and postoperative combination therapies (same classes as preoperative); (3) 24 biomarkers, including indicators of kidney function, hepatic function, complete blood count, inflammation, coagulation function, and myocardial function; and (4) 5 left atrium–related indicators, including the width and depth of the LAA, left atrium diameter (LAD), left ventricular shortening fraction, and ejection fraction. All data except medication information were collected through medical data retrieval and application platforms; medication-related information was collected by manual review of the medical records. Two pretreatments were then performed to remove uninformative features before further selection: (1) features with a variance of 0 or close to 0 were removed; (2) the linear correlation between all variables was calculated and screened with a correlation threshold of 0.95. After this step, 76 variables were used to construct the classification models, with inflammatory indicators and serum cystatin C excluded.
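
The two pretreatments can be illustrated with a short plain-Python sketch. The text does not state which member of a highly correlated pair was discarded, so the rule used here (keep the feature encountered first) is an assumption, as are the function names and the `var_eps` cutoff standing in for "close to 0".

```python
from math import sqrt
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def filter_features(table, var_eps=1e-8, corr_threshold=0.95):
    """Return the names of features that survive both pretreatments.

    table: dict mapping feature name -> list of numeric values (hypothetical layout).
    """
    # Step 1: drop features whose variance is zero or close to zero.
    informative = {n: v for n, v in table.items() if pvariance(v) > var_eps}
    # Step 2: keep a feature only if its absolute correlation with every
    # already-kept feature does not exceed the threshold (assumed tie rule).
    kept = []
    for name, values in informative.items():
        if all(abs(pearson(values, informative[k])) <= corr_threshold for k in kept):
            kept.append(name)
    return kept
```

Because the loop is order-dependent, listing clinically preferred variables first makes them the ones retained when two features are nearly collinear.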

Baseline variables were presented as mean ± standard deviation for normally distributed continuous variables, median (interquartile range) for nonnormally distributed continuous variables, and count (percentage) for categorical variables. An unpaired t-test was used to compare groups for normally distributed continuous variables, and the Mann–Whitney U test was used for nonnormally distributed continuous variables. Pearson's chi-squared test or Fisher's exact test was used to examine the association between categorical predictors and AEs. A two-sided p-value <0.05 was considered significant.



Data Split, Model Development, and Process

Because the positive and negative sets were imbalanced, subsampling was used to balance them.[17] In each round, samples equal in number to the positive set were randomly selected from the negative set, and the procedure was repeated 50 times to obtain reliable predictions. This approach addresses the data imbalance with the aim of achieving high recall and reasonable precision. The dataset was divided into a training set (80%) and a test set (20%) using random stratified sampling. Likewise, the data splitting process was iterated 50 times so that all available data were used for modeling.
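
A minimal sketch of the repeated subsampling and stratified 80/20 split is given below. It uses only the standard library, and several details are assumptions rather than the authors' procedure: the function names, the fixed seed, and the pairing of one split with each subsample (the text does not say whether the 50 subsamples and 50 splits were paired or nested).

```python
import random

def balanced_subsample(positives, negatives, rng):
    """Random undersampling: draw as many negatives as there are positives."""
    return list(positives), rng.sample(list(negatives), len(positives))

def stratified_split(positives, negatives, rng, test_frac=0.2):
    """Split each class separately so train and test keep the same class ratio."""
    train, test = [], []
    for label, group in ((1, positives), (0, negatives)):
        shuffled = list(group)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_frac)
        test += [(x, label) for x in shuffled[:n_test]]
        train += [(x, label) for x in shuffled[n_test:]]
    return train, test

def repeated_splits(positives, negatives, n_repeats=50, seed=0):
    """Yield (train, test) pairs, one per balanced subsample (assumed pairing)."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        pos, neg = balanced_subsample(positives, negatives, rng)
        yield stratified_split(pos, neg, rng)
```

For the thrombosis data (105 positive and 942 negative samples), each round under these assumptions would contain 105 samples per class, giving 168 training and 42 test samples; metrics would then be averaged over the 50 rounds.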

Eight popular ML algorithms were used to develop the classification models: random forest (RF),[18] light gradient boosting machine (LightGBM, LGBM), gradient boosting decision tree (GBT),[19] [20] eXtreme Gradient Boosting (XGBoost),[21] naïve Bayes (NB), logistic regression (LR),[22] neural network (NN), and deep neural network (DNN).[23] RF, GBT, NN, NB, and XGBoost were implemented in the KNIME analytics platform (KNIME version 5.1),[24] and LR, LGBM, and DNN were implemented in the Keras package of Python (version 3.7). The specific parameters of the models are shown in [Supplementary Table S1] (available in the online version).



Performance Evaluation and Model Explainability

In this study, the performance of the classification models was evaluated by accuracy (ACC) and AUC. In addition, recall was calculated to assess the recovery rate of positive samples, namely, the probability of correctly predicting the occurrence of an AE. Feature importance from the best-performing models was used to interpret the contribution of different variables. To provide consistent and locally accurate attribution values for each variable, the SHAP method was used to explain how each feature influences the model output.[25] The absolute SHAP values of a variable, aggregated over all samples, represent its global contribution to the prediction. Predictors with a positive SHAP value push the prediction toward thrombosis or bleeding, while predictors with a negative SHAP value push it toward non-AE.
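
For reference, the threshold-based metrics reported in this study can be computed from a confusion matrix as sketched below. The function name is hypothetical, and the definition of BA as the mean of recall and specificity is an assumption: the text does not define BA, although this reading is consistent with the values reported in Table 2. AUC is omitted because it requires predicted probabilities rather than class labels.

```python
def classification_metrics(y_true, y_pred):
    """Recall, specificity, balanced accuracy (BA), and accuracy (ACC).

    Labels are 1 for an adverse event and 0 for no event; both classes
    are assumed to be present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn)        # share of true events that were detected
    specificity = tn / (tn + fp)   # share of non-events correctly ruled out
    return {
        "recall": recall,
        "specificity": specificity,
        "BA": (recall + specificity) / 2,
        "ACC": (tp + tn) / len(pairs),
    }
```

On imbalanced data ACC is dominated by the majority class, which is why a model can score a high ACC with a low recall; BA weights both classes equally.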



Results

Baseline Characteristics

Among 2,146 participants with NVAF, the incidence of thrombosis was 8.81% (189 out of 2,146) and the incidence of bleeding was 13.89% (298 out of 2,146). Because of the data preprocessing principles described above, the data for the two groups were not identical. For the thrombosis group, the positive set contained 105 data points and the negative set contained 942 data points. For the bleeding group, the positive set contained 252 data points and the negative set contained 803 data points. The baseline characteristics of the participants are shown in [Table 1] and the detailed distributions are shown in [Supplementary Table S2] (available in the online version). Age, B-type natriuretic peptide (BNP) level, heparin dose, and length of stay in the AE group were statistically different from those in the non-AE group, as shown in [Supplementary Fig. S1] (available in the online version). These results indicated the considerable potential of demographic and clinical characteristics to distinguish AEs in the population with NVAF.

Table 1

Statistical summary of the clinical variables in the AE group and non-AE group

| Category | Variable | Thrombosis (n = 105) | Nonthrombosis (n = 942) | p | Bleeding (n = 252) | Nonbleeding (n = 803) | p |
|---|---|---|---|---|---|---|---|
| Demographic information | Age, y, mean ± SD | 65.81 ± 10.52 | 60 ± 11.64 | <0.001[a] | 62.69 ± 11.10 | 60.36 ± 11.73 | 0.006[a] |
| | Gender, male, n (%) | 59 (56.19) | 512 (54.35) | 0.720[b] | 121 (48.13) | 481 (56.39) | 0.018[b] |
| | Weight, kg[*] | 64.23 ± 11.24 | 64.79 ± 11.10 | 0.623[a] | 64 (56–70) | 65 (56–71) | 0.350[c] |
| | BMI, kg/m^2, mean ± SD | 24.46 ± 3.20 | 24.51 ± 3.37 | 0.873[a] | 24.58 ± 3.19 | 24.47 ± 3.40 | 0.640[a] |
| | Smoking, n (%) | 38 (36.19) | 310 (32.91) | 0.009[b] | 72 (28.57) | 278 (34.62) | 0.070[b] |
| | Drinking, n (%) | 42 (40.00) | 292 (31.0) | 0.060[b] | 63 (25.00) | 274 (34.12) | 0.007[b] |
| Medical history | Surgical history, n (%) | 47 (44.76) | 332 (35.24) | 0.054[b] | 110 (43.65) | 274 (34.12) | 0.006[b] |
| | DM, n (%) | 19 (18.10) | 128 (13.59) | 0.207[b] | 45 (17.86) | 103 (12.83) | 0.045[b] |
| | Hypertension, n (%) | 50 (47.62) | 424 (45.01) | 0.108[c] | 139 (55.16) | 346 (43.09) | 0.001[c] |
| | Hyperlipidemia, n (%) | 23 (21.90) | 200 (21.23) | 0.873[b] | 44 (17.46) | 181 (22.54) | 0.086[b] |
| | AS, n (%) | 77 (73.33) | 594 (63.06) | 0.037[b] | 167 (66.27) | 509 (63.39) | 0.405[b] |
| | CIS, n (%) | 18 (17.14) | 56 (5.94) | <0.001[b] | 22 (8.73) | 52 (6.48) | 0.221[b] |
| | HF, n (%) | 32 (30.48) | 220 (23.35) | 0.076[c] | 70 (27.78) | 183 (22.79) | 0.157[c] |
| AF type | PeAF, n (%) | 62 (59.05) | 646 (68.58) | 0.141[b] | 167 (66.27) | 546 (68.00) | 0.857[b] |
| | First episode AF, n (%) | 38 (36.19) | 261 (27.71) | | 74 (29.37) | 226 (28.14) | |
| Left cardiac structure and function | LAA-W, mm, mean ± SD | 16.98 ± 2.70 | 16.64 ± 2.31 | 0.154[a] | 16.75 ± 2.39 | 16.64 ± 2.34 | 0.529[a] |
| | LAA-D, mm, mean ± SD | 25.11 ± 3.92 | 24.99 ± 3.53 | 0.734[a] | 25.14 ± 3.87 | 24.95 ± 3.47 | 0.460[a] |
| | LAD, mm, mean ± SD | 39.63 ± 5.05 | 38.09 ± 5.66 | 0.008[a] | 38.91 ± 5.47 | 38.05 ± 5.65 | 0.035[a] |
| | EF, mean ± SD | 60.40 ± 8.47 | 60.83 ± 7.62 | 0.589[a] | 60.49 ± 7.03 | 60.90 ± 7.88 | 0.458[a] |
| | FS, mean ± SD | 32.47 ± 5.72 | 32.99 ± 5.67 | 0.368[a] | 32.83 ± 6.12 | 32.99 ± 5.51 | 0.711[a] |
| Laboratory test | GLU, mmol/L, mean ± SD | 5.66 ± 1.42 | 5.52 ± 1.62 | 0.399[a] | 5.56 ± 1.32 | 5.52 ± 1.67 | 0.795[a] |
| | PLT, ×10^9/L, mean ± SD | 185.34 ± 63.147 | 189.49 ± 58.05 | 0.492[a] | 181.81 ± 58.17 | 191.33 ± 58.32 | 0.024[a] |
| | UA, μmol/L, mean ± SD | 371.20 ± 96.12 | 354.20 ± 105.47 | 0.114[a] | 358.51 ± 106.37 | 355.03 ± 104.14 | 0.645[a] |
| | Alb, g/L, mean ± SD | 38.91 ± 3.67 | 39.47 ± 3.41 | 0.112[a] | 39.08 ± 3.39 | 39.51 ± 3.47 | 0.087[a] |
| | ALT, IU/L[*] | 20.4 (13.3–30.6) | 19.8 (14.2–28.8) | 0.996[c] | 23.75 ± 18.53 | 25.70 ± 21.58 | 0.197[a] |
| | AST, IU/L[*] | 23.1 (19.3–29.1) | 22.4 (18.7–27.4) | 0.233[c] | 25.37 ± 14.00 | 25.70 ± 13.66 | 0.733[a] |
| | TBA, μmol/L[*] | 3.6 (2.6–6.8) | 4.3 (2.6–7.1) | 0.389[c] | 5.86 ± 6.14 | 6.06 ± 7.88 | 0.709[a] |
| | Creatinine, μmol/L[*] | 77.72 ± 18.60 | 75.00 ± 36.02 | 0.445[a] | 70.0 (61.7–81.5) | 72.1 (61.2–84.2) | 0.374[c] |
| | PT-INR[*] | 1.05 ± 0.29 | 1.04 ± 0.69 | 0.860[a] | 0.96 (0.91–1.04) | 0.94 (0.89–1.01) | 0.002[c] |
| | Fib, g/L[*] | 2.55 (2.07–3.10) | 2.44 (2.12–2.88) | 0.126[c] | 2.59 ± 0.76 | 2.57 ± 0.69 | 0.674[a] |
| | LDL_C, mmol/L[*] | 2.69 (2.07–3.24) | 2.70 (2.18–3.18) | 0.846[c] | 2.68 ± 0.77 | 2.72 ± 0.74 | 0.425[a] |
| | HDL_C, mmol/L[*] | 1.18 ± 0.33 | 1.17 ± 0.30 | 0.750[a] | 1.12 (0.98–1.28) | 1.13 (0.97–1.34) | 0.363[c] |
| | TG, mmol/L, mean ± SD | 1.48 ± 1.20 | 1.59 ± 1.12 | 0.377[a] | 1.50 ± 0.99 | 1.60 ± 1.17 | 0.233[a] |
| | Tch, mmol/L[*] | 4.15 (3.54–5.04) | 4.34 (3.63–5.03) | 0.856[c] | 4.27 ± 1.02 | 4.39 ± 1.02 | 0.090[a] |
| | BNP, pg/mL, median (IQR) | 169 (83–280) | 117 (35–169) | <0.001[c] | 154 (59–176) | 116 (34–169) | 0.002[c] |
| | PCT, ng/mL, median (IQR) | 0.22 (0.20–0.29) | 0.22 (0.19–0.27) | 0.501[c] | 0.23 (0.19–0.30) | 0.22 (0.19–0.28) | 0.065[c] |
| CA | Length of stay, day[*] | 9 (7–11) | 8 (6–10) | 0.025[a] | 9.1 ± 3.3 | 8.5 ± 3.4 | 0.028[a] |
| | Heparin dose, IU, mean ± SD | 6,597 ± 1,751 | 7,238 ± 1,899 | <0.001[a] | 6,626 ± 1,871 | 7,341 ± 1,868 | <0.001[a] |
| | T4, day[*] | 2 (1–4) | 1 (1–3) | 0.003[c] | 2.42 ± 2.01 | 2.04 ± 1.94 | 0.008[a] |
| | T3, h[*] | 3.26 ± 0.85 | 3.24 ± 0.84 | 0.751[a] | 3.17 (2.75–3.65) | 3.17 (2.67–3.67) | 0.265[c] |
| OACs | OAC_1, n (%) | 42 (40.0) | 316 (33.55) | 0.007[b] | 103 (40.87) | 257 (32.00) | <0.001[b] |
| | Preoperative combined medication, n (%) | 85 (80.95) | 746 (78.53) | 0.036[d] | 198 (78.57) | 583 (72.60) | 0.161[d] |
| | Interruption, day, n (%) | 25 (23.81) | 161 (17.09) | 0.005[b] | 50 (19.84) | 138 (17.19) | 0.855[b] |
| | Postoperative NOACs, n (%) | 102 (97.14) | 790 (83.86) | <0.001[b] | 190 (75.40) | 713 (88.79) | <0.001[b] |
| | T6, mo, median (IQR) | 2.2 (1–4) | 2.5 (1–3.2) | 0.991[c] | 3 (1–4) | 2 (1–3) | 0.013[c] |
| | Adjustment, n (%) | 86 (81.90) | 282 (29.94) | 0.164[d] | 92 (36.51) | 230 (28.64) | 0.012[b] |
| | T7 >3 mo, n (%) | 23 (21.90) | 140 (14.86) | 0.001[b] | 54 (21.43) | 109 (13.57) | 0.002[c] |
| | T5 >3 mo, n (%) | 61 (58.10) | 533 (56.58) | 0.002[b] | 173 (68.65) | 427 (53.18) | <0.001[b] |

Abbreviations: ACT, activated clotting time of whole blood; Alb, albumin; ALT, alanine transaminase; AS, atherosclerosis; AST, aspartate transaminase; BMI, body mass index; BNP, B-type natriuretic peptide; CA, catheter ablation; CIS, chronic myocardial ischemia syndrome; DM, diabetes mellitus; EF, ejection fraction; Fib, fibrinogen; First episode AF, first episode atrial fibrillation; FS, fractional shortening; GLU, glucose; HDL_C, high-density lipoprotein cholesterol; HF, heart failure; LAA-D, left atrial appendage depth; LAA-W, left atrial appendage width; LAD, left atrial dimension; LDL_C, low-density lipoprotein cholesterol; NOACs, non-vitamin K antagonist oral anticoagulants; OAC_1, OAC before CA operation; OACs, oral anticoagulants; PCT, procalcitonin; PeAF, persistent atrial fibrillation; PLT, platelets; PT-INR, prothrombin time international normalized ratio; SD, standard deviation; T3, operating time; T4, the duration of heparin; TBA, total bile acids; Tch, total cholesterol; TG, triglyceride; TT, thrombin time; UA, uric acid; Interruption, proportion of patients in whom OAC was started the day after CA therapy; T6, the duration of OAC after discharge (before adjustment); T7, proportion of patients with a duration of OAC >3 months after the adjustment; T5, proportion of patients with a total duration of OAC >3 months.


[*] The two target columns do not all conform to a normal distribution because the data are not the same. For example, in the thrombosis group, “weight” was homogeneous and a t-test was used and presented as mean ± standard deviation, whereas in the bleeding group, “weight” was not homogeneous and a nonparametric test was chosen and presented as median (25%–75%).
[a] Independent sample t-test.
[b] Pearson χ2-test.
[c] Wilcoxon–Mann–Whitney test.
[d] Fisher's exact test.




Comparison of the Performances of ML Methods on Prediction of Thrombosis and Bleeding Following CA for Patients with NVAF

The performance of the ML methods, including RF, XGBoost, GBT, LightGBM, NB, LR, NN, and DNN, is shown in [Table 2] and [Supplementary Fig. S2] (available in the online version). For the prediction of thrombosis, the XGBoost and RF models achieved the best recalls, at 0.841 and 0.777, with AUC scores of 0.854 and 0.795, respectively. Although the NB model had the highest average ACC and performed well in predicting nonthrombosis samples, it achieved this accuracy at the expense of recovering the positive set. For the prediction of bleeding, the overall pattern was remarkably similar to that of the thrombosis models. However, the recalls of the best-performing algorithms fell short of those for thrombosis. For instance, the recall for thrombosis predicted by the RF algorithm was 0.777, while the recall for bleeding was only 0.642. To maximize the identification of patients at high risk of thrombosis and bleeding, the XGBoost and RF algorithms were selected for further feature selection and construction of optimized models.

Table 2

Performance of the models based on different ML algorithms

| Outcome | Model | Recall | Specificity | BA | ACC | AUC |
|---|---|---|---|---|---|---|
| Thrombosis | RF | 0.777 (0.018) | 0.653 (0.009) | 0.715 (0.010) | 0.664 (0.009) | 0.795 (0.006) |
| | XGBoost | 0.841 (0.015) | 0.731 (0.010) | 0.786 (0.009) | 0.740 (0.009) | 0.854 (0.005) |
| | GBT | 0.626 (0.039) | 0.717 (0.010) | 0.672 (0.020) | 0.708 (0.010) | 0.730 (0.011) |
| | LightGBM | 0.731 (0.019) | 0.640 (0.010) | 0.685 (0.010) | 0.650 (0.010) | 0.720 (0.006) |
| | NB | 0.453 (0.029) | 0.920 (0.010) | 0.687 (0.014) | 0.887 (0.009) | 0.887 (0.007) |
| | LR | 0.672 (0.034) | 0.721 (0.017) | 0.697 (0.017) | 0.714 (0.014) | 0.778 (0.009) |
| | NN | 0.655 (0.029) | 0.716 (0.016) | 0.685 (0.015) | 0.706 (0.014) | 0.722 (0.015) |
| | DNN | 0.730 (0.018) | 0.654 (0.012) | 0.692 (0.010) | 0.663 (0.011) | 0.737 (0.006) |
| Bleeding | RF | 0.642 (0.007) | 0.799 (0.007) | 0.721 (0.005) | 0.764 (0.006) | 0.800 (0.002) |
| | XGBoost | 0.744 (0.009) | 0.791 (0.006) | 0.767 (0.004) | 0.781 (0.004) | 0.864 (0.002) |
| | GBT | 0.644 (0.015) | 0.823 (0.012) | 0.734 (0.010) | 0.786 (0.010) | 0.837 (0.005) |
| | LightGBM | 0.576 (0.009) | 0.805 (0.009) | 0.691 (0.006) | 0.747 (0.006) | 0.721 (0.003) |
| | NB | 0.548 (0.011) | 0.964 (0.004) | 0.756 (0.006) | 0.910 (0.003) | 0.925 (0.002) |
| | LR | 0.668 (0.016) | 0.734 (0.008) | 0.701 (0.009) | 0.719 (0.008) | 0.800 (0.004) |
| | NN | 0.671 (0.016) | 0.674 (0.018) | 0.672 (0.009) | 0.673 (0.012) | 0.744 (0.009) |
| | DNN | 0.617 (0.014) | 0.700 (0.009) | 0.659 (0.009) | 0.679 (0.008) | 0.712 (0.004) |

Note: The values in bold are the maximum values for each measure.

Abbreviations: ACC, accuracy; AUC, area under the curve; BA, balanced accuracy.




Feature Importance for Predicting Thrombosis and Bleeding

To obtain a more accurate and concise feature set, both the RF and XGBoost algorithms were used to analyze feature importance for feature selection, as shown in [Fig. 1]. The higher the feature importance value, the more crucial the feature is to the ML model. For the prediction of thrombosis, the top 10 features from the two algorithms showed remarkably little consistency, the exception being the time of intraoperative heparin application (T4), which was a robust feature for thrombosis. For the prediction of bleeding, the application of OAC was the most significant feature in both algorithms. The anticoagulation strategy, sex, BNP, total cholesterol (Tch), and triglyceride (TG) level were significant contributors to both thrombosis and bleeding. In addition, beta blockers, NSAIDs, chronic myocardial ischemia syndrome, and age played a critical role in thrombosis, while preoperative co-medication, diabetes mellitus, LAD, PLT, creatinine, and BNP level contributed significantly to the prediction of bleeding.

Fig. 1 Top 40 significant features of thrombosis and bleeding. Feature importance for (A) the thrombosis group by the RF algorithm, (B) the bleeding group by the RF algorithm, (C) the thrombosis group by the XGBoost algorithm, and (D) the bleeding group by the XGBoost algorithm. OAC_2, type of OAC in hospital after CA operation; OAC_3, type of OAC after discharge (before adjustment); OAC_4, type of OAC after discharge (after adjustment). Dosage labels follow the same numbering as OAC. T3, operating time; T4, the duration of heparin; T5, the total duration of OAC; T6, the duration of OAC after discharge (before adjustment); T7, the duration of OAC after discharge (after adjustment). Other abbreviations are given in [Table 1].

Because the feature importance obtained by the different algorithms varied considerably, we further examined the influence of the features using the SHAP method based on the XGBoost model, as shown in [Fig. 2]. The higher the absolute SHAP value of a variable, the larger its contribution to the model. The results in [Fig. 2] show that age, the duration of OAC after discharge (T6), and BNP level were associated with a higher risk of thrombosis, while the type and dosage of OAC and alanine transaminase (ALT) exhibited a stronger correlation with a high risk of bleeding. The SHAP values of the top 20 indicators for each sample are represented by a color gradient, with lighter shades of golden yellow indicating higher SHAP values and darker shades of blue-purple representing lower SHAP values. A threshold was set at a SHAP value of 0: positive contributions above this value indicate higher risk, and negative contributions below it indicate lower risk. Moreover, the direction of the effect of the top features on thrombosis and bleeding is shown in [Figs. 3] and [4]. Patients with older age (age > 75), a shorter duration of OAC (T6 < 1 month), a higher BNP level (BNP > 500 pg/mL), a higher fibrinogen level (Fib > 3 g/L), a lower TG level (TG < 1 mmol/L), and alcohol consumption were more likely to develop thrombosis. Meanwhile, long-term use of OAC (T7 > 5 months), a lower PLT level (<200 × 10^9/L), and a higher UA level (>400 μmol/L) were more likely to result in bleeding.
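
To make the relation between per-sample SHAP values and global importance concrete, the sketch below computes exact Shapley values for a toy three-feature risk function by averaging marginal contributions over all feature orderings. It is an illustration only: a single reference sample stands in for the background distribution (a simplification of how SHAP treats "absent" features), the toy model and all names are hypothetical, and the study itself used the SHAP method on XGBoost models.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample relative to a baseline sample.

    Features not yet "switched on" take their baseline value (an assumption).
    Cost grows factorially with the number of features, so this is for toys only.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]              # switch feature i to its real value
            value = model(current)
            phi[i] += value - previous     # marginal contribution of feature i
            previous = value
    return [p / len(orderings) for p in phi]

def mean_abs_shap(model, samples, baseline):
    """Global importance: mean absolute Shapley value of each feature."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, v in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(v)
    return [t / len(samples) for t in totals]

def toy_risk(features):
    """Hypothetical linear risk score over (age, BNP, TG); not a fitted model."""
    age, bnp, tg = features
    return 0.02 * age + 0.001 * bnp - 0.1 * tg
```

For any model, the Shapley values of a sample sum to `model(x) - model(baseline)`, which is the local accuracy property; a positive value for a feature means that feature pushed this sample's score above the baseline.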

Fig. 2 SHAP values for thrombosis and bleeding by the XGBoost algorithm. (A) Absolute SHAP values for the thrombosis group, (B) SHAP summary plot for the thrombosis group, (C) absolute SHAP values for the bleeding group, and (D) SHAP summary plot for the bleeding group. OAC_2, type of OAC in hospital after CA operation; OAC_3, type of OAC after discharge (before adjustment). Dosage labels follow the same numbering as OAC. T3, operating time; T6, the duration of OAC after discharge (before adjustment); T7, the duration of OAC after discharge (after adjustment). Other abbreviations are given in [Table 1].
Fig. 3 Partial dependence plot for the top nine representative features for predicting thrombosis. The risk of thrombosis is influenced by (A) age and blood glucose (GLU), (B) the duration of OAC after discharge (before adjustment) and diabetes mellitus (DM), (C) BNP and left atrial appendage width (LAA-W), (D) total bile acids (TBA) and albumin (Alb), (E) fibrinogen (Fib) and type of OAC in hospital after CA operation, (F) Cr and Alb, (G) triglyceride (TG) and type of OAC in hospital after CA operation, (H) drinking and TBA, and (I) GLU and drinking. CA, catheter ablation.
Fig. 4 Partial dependence plot for the top nine representative features for predicting bleeding. The risk of bleeding is influenced by (A) type of initial after discharge (before adjustment) and LAD, (B) ALT and BMI, (C) the dosage of initial after discharge (before adjustment) and EF, (D) high-density lipoprotein cholesterol (HDL_C) and Fib, (E) heparin dose and direct bilirubin (DBIL), (F) platelets (PLT) and type of initial after discharge (after adjustment), (G) the duration of OAC after discharge (after adjustment) and PLT, (H) BNP and heparin dose, and (I) uric acid (UA) and DBP. ALT, alanine transaminase; BMI, body mass index; LAD, left atrium diameter; OAC, oral anticoagulation.


Feature Selection for Constructing the Final ML Models

Based on the results of the SHAP method and feature importance, we further refined the features and reconstructed the final prediction models. A feature was awarded 1 point if it appeared in the top 20 of both algorithms and 2 points if it was in the top 10 of both algorithms. If a feature ranked in the top 10 of only one algorithm, it was awarded 0.5 points. An additional 1 point was given if the feature also appeared in the SHAP results (top 20). Likewise, a feature was awarded 1 point if it appeared in the top 20 of only one algorithm but also appeared in the SHAP results (top 20). Each variable could therefore receive a maximum of 3 points. Finally, the features were sorted in descending order of score and included as predictor variables; the results are shown in [Table 3] and the detailed scoring process in [Supplementary Table S3] (available in the online version). A total of 25 features were retained for the thrombosis model and 27 for the bleeding model. The remodeling process was the same as above, using the XGBoost and RF algorithms, with features weighted according to their voting scores in the weighted variants. Specific modeling results are shown in [Table 4]. Except for the XGBoost thrombosis model, whose performance declined somewhat, the accuracies of the other models were stable or even better than those of the previous models. Therefore, an RF-based model for thrombosis (RF-T) and an XGBoost_w-based model for bleeding (Xw-B) were used for the final model construction and prediction.
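
One possible reading of the voting rules is sketched below. The rules as written leave some cases open, so this sketch makes explicit assumptions: the importance points are taken from the first matching rule (top 10 of both, then top 20 of both, then top 10 of either), and the SHAP bonus is only added when the feature is in the top 20 of at least one algorithm. The function and argument names are hypothetical; the authors' exact procedure is in their Supplementary Table S3.

```python
def vote_score(feature, rank_rf, rank_xgb, shap_top20):
    """Voting score of one feature under the assumed reading of the rules.

    rank_rf, rank_xgb: feature names ordered by importance, best first.
    shap_top20: set of feature names in the SHAP top 20.
    """
    in_top10 = [feature in rank_rf[:10], feature in rank_xgb[:10]]
    in_top20 = [feature in rank_rf[:20], feature in rank_xgb[:20]]
    if all(in_top10):
        score = 2.0      # top 10 of both algorithms
    elif all(in_top20):
        score = 1.0      # top 20 of both algorithms
    elif any(in_top10):
        score = 0.5      # top 10 of only one algorithm
    else:
        score = 0.0
    # SHAP bonus (assumed to require presence in at least one top-20 list).
    if feature in shap_top20 and any(in_top20):
        score += 1.0
    return min(score, 3.0)
```

Under this reading the attainable scores are 0, 0.5, 1, 1.5, 2, and 3, which matches the set of scores that appear in Table 3.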

Table 3 The subset of features for the final model for thrombosis and bleeding

| Thrombosis: Feature | Score | Bleeding: Feature | Score |
|---|---|---|---|
| Age | 2 | OAC_3[a] | 3 |
| T4 | 2 | OAC dosage_3[a] | 2 |
| TBA | 1.5 | Tch | 2 |
| ALT | 1.5 | T6 | 1.5 |
| GLU | 1.5 | BNP | 1.5 |
| BNP | 1.5 | Alb | 1.5 |
| Fib | 1.5 | TG | 1.5 |
| LDL_C | 1.5 | PLT | 1.5 |
| PLT | 1 | T7 | 1.5 |
| Interruption | 1 | Cr | 1.5 |
| TG | 1 | PCT | 1 |
| AST | 1 | UA | 1 |
| T3 | 1 | Fib | 1 |
| EF | 1 | AST | 1 |
| Alb | 1 | GLU | 1 |
| Beta blocker_1[b] | 0.5 | ALT | 1 |
| Sex | 0.5 | TT | 1 |
| Drug adjustment | 0.5 | LAD | 1 |
| OAC dosage_1[b] | 0.5 | PPI_1[b] | 0.5 |
| NSAID_2[a] | 0.5 | Diabetes | 0.5 |
| CIS | 0.5 | Sex | 0.5 |
| T7 | 0.5 | Drug adjustment | 0.5 |
| PPI_2[a] | 0.5 | Interruption | 0.5 |
| Cr | 0.5 | LDL_C | 0.5 |
| Drinking | 0.5 | HDL_C | 0.5 |
| | | Heparin dose | 0.5 |
| | | OAC dosage_1[b] | 0.5 |

Abbreviations: T4, the duration of heparin; T3, operating time; OAC_3, type of OAC after discharge (before adjustment); T6, the duration of OAC after discharge (before OAC regimen adjustment); T7, the duration of OAC after discharge (after OAC regimen adjustment).


Note: The variables in bold are those present in both models.


a CA preoperation.


b Postoperative CA.


Table 4 Performances of the simplified models based on two algorithms

| Model | Recall | Specificity | BA | ACC | AUC |
|---|---|---|---|---|---|
| Thrombosis | | | | | |
| XGBoost | 0.731 (0.025) | 0.706 (0.010) | 0.718 (0.014) | 0.708 (0.010) | 0.798 |
| XGBoost_w | 0.730 (0.022) | 0.680 (0.008) | 0.705 (0.012) | 0.685 (0.008) | 0.773 |
| RF | 0.774 (0.015) | 0.673 (0.010) | 0.724 (0.001) | 0.683 (0.001) | 0.799 |
| RF_w | 0.720 (0.024) | 0.693 (0.008) | 0.707 (0.012) | 0.696 (0.007) | 0.794 |
| Bleeding | | | | | |
| XGBoost | 0.773 (0.013) | 0.806 (0.009) | 0.790 (0.007) | 0.799 (0.007) | 0.885 |
| XGBoost_w | 0.780 (0.014) | 0.805 (0.007) | 0.792 (0.006) | 0.800 (0.005) | 0.890 |
| RF | 0.761 (0.011) | 0.705 (0.008) | 0.733 (0.006) | 0.717 (0.006) | 0.833 |
| RF_w | 0.711 (0.011) | 0.821 (0.006) | 0.766 (0.004) | 0.797 (0.004) | 0.872 |

Note: The values in bold are the maximum values for each measure.


Abbreviations: ACC, accuracy; AUC, area under the curve; BA, balanced accuracy; RF_w, RF algorithm using features weighted; XGBoost_w, XGBoost algorithm using features weighted.
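As a reading aid for the measures in Table 4, the sketch below derives recall, specificity, BA, and ACC from a confusion matrix. It assumes that BA denotes balanced accuracy, that is, the mean of recall and specificity; the tabulated values are consistent with this (for the RF thrombosis model, (0.774 + 0.673) / 2 ≈ 0.724). The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute the Table 4 style measures from confusion-matrix counts."""
    recall = tp / (tp + fn)              # sensitivity: events correctly flagged
    specificity = tn / (tn + fp)         # non-events correctly cleared
    balanced_accuracy = (recall + specificity) / 2
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "accuracy": accuracy,
    }


# Hypothetical imbalanced test set: 10 events among 110 patients.
metrics = classification_metrics(tp=8, fn=2, tn=70, fp=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With few events, ACC is dominated by the majority class, which is one reason BA is a useful companion measure for imbalanced outcomes such as these.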




Discussion

The Performances of Multiple ML Models for Thrombosis and Bleeding

Because the management of OAC following CA is crucial to reducing the risk of thrombosis and bleeding, eight ML models were developed to detect risk factors for thrombosis and bleeding in NVAF patients using 76 features. Among the eight commonly used ML algorithms compared, the XGBoost- and RF-based models were the most powerful in evaluating the importance of each factor in predicting AEs. Our performance scores (AUC) differ somewhat from the results of other studies. In a cohort study of 9,670 patients, the AUC of a GBT-based model for predicting ischemic stroke reached 0.685,[26] whereas the AUC of our GBT-based model for predicting thrombosis was 0.730. The AUCs for bleeding risk prediction ranged from 0.57 to 0.61 in a cohort study, whereas our result reached 0.876.[27] There may be three reasons for the difference in model accuracy. The first is that the data come from different sources. The second is the choice of variables: that research included a total of 43 variables, 42 of which were binary. Binary variables provide very limited information, which may be the main reason for the low AUCs in that research. The third is that the indicator used to assess the models was different. In our models, recall was used for hyperparameter tuning, at the cost of reducing both precision and accuracy to varying degrees. Nonetheless, the results were consistent in that the ML models predicted the long-term risk of thrombosis and bleeding better than the CHA2DS2-VASc and HAS-BLED risk scores. For comparison, in our previous study the AUCs for predicting the risk of LAA thrombosis ranged from 0.889 to 0.897.[28] From a holistic perspective, the overall accuracies of the bleeding models were inferior to those of the thrombosis models, and the bleeding models had significantly higher specificity.
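The trade-off noted above, where selecting for recall lowers precision and accuracy, can be illustrated with a small sketch. Here the decision threshold stands in for a tunable hyperparameter, which is a simplification and not the tuning procedure used in the study; the labels, scores, and function names are all hypothetical.

```python
def metrics_at_threshold(y_true, scores, threshold):
    """Return (recall, precision, accuracy) when scores >= threshold are positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return recall, precision, accuracy


def pick_by_recall(y_true, scores, thresholds):
    """Choose the candidate setting with the highest recall."""
    return max(thresholds, key=lambda t: metrics_at_threshold(y_true, scores, t)[0])


# Hypothetical imbalanced data: 3 events among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.2, 0.1, 0.1]
candidates = [0.25, 0.5, 0.75]

for t in candidates:
    r, p, a = metrics_at_threshold(y_true, scores, t)
    print(f"threshold={t}: recall={r:.2f} precision={p:.2f} accuracy={a:.2f}")
print("selected by recall:", pick_by_recall(y_true, scores, candidates))
```

In this toy example the recall-optimal setting captures every event but has the lowest precision and accuracy of the three candidates, mirroring the pattern described in the text.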

To obtain a more streamlined model that can predict risk directly from a minimal number of clinical indicators, we simplified the models and achieved comparable performance, with AUCs of 0.799 and 0.890 for the RF-T and Xw-B models, respectively. The RF-T model includes 25 features and the Xw-B model includes 27 features from several different categories, without any clinical information that might be expensive, tedious, or time-consuming to acquire. The results are also in line with the wide recognition of the XGBoost and RF algorithms for their efficiency and effectiveness in a variety of scenarios, as they outperformed the other algorithms. Tree-based ensemble learning methods also provide built-in feature importance estimates that identify the most impactful features in intricate exposure datasets.[29] [30] Although deep learning significantly improves model accuracy in learning tasks such as image classification and electrocardiogram analysis, its performance was not outstanding in small-data models.[31] This suggests that tree-based ensemble learning algorithms may be the first choice when the amount of data is not large and no imaging data are available.



The Differences and Correlations between Thrombosis and Bleeding Events

To further explore the contribution of different risk factors to the models, feature importance and SHAP analyses were carried out. Interestingly, BNP level was identified as a top-ranked predictor in both models by SHAP analysis. In the thrombosis model, BNP ranked third, with an increased risk of thrombosis at levels above 500 pg/mL. In the bleeding model, BNP ranked eighth. Although the threshold is not clear, the overall trend is consistent: the higher the BNP level, the higher the risk of bleeding. This is consistent with the ARISTOTLE cohort, in which NT-proBNP levels in patients with AF were associated not only with ischemic stroke risk but also with bleeding risk.[32] Furthermore, in our previous research on LAA thrombosis, the plasma BNP level was significantly higher (BNP level > 400 pg/mL) in patients with LAA thrombosis than in those without.[28] In the SHAP results, older patients with NVAF and higher blood glucose levels were prone to thrombosis, consistent with a report that fasting blood glucose is an independent predictor of PLT-dependent thrombosis in patients with stable coronary artery disease.[33] In addition, patients with NVAF and diabetes are more likely to develop thrombosis when OAC is used for less than 1 month (T6). In terms of predicting bleeding, patients on warfarin with a wider LAD are more likely to be at higher risk of bleeding. Consistent with the findings of Lu et al, our results emphasize the use of OAC as the most important risk factor for bleeding events.[26] The ALT levels ranged from 13 to 31 IU/L across the quartiles of the collected dataset. No increased risk of bleeding was observed in patients with slightly elevated ALT levels. This finding may indicate that patients with NVAF and mild abnormalities in liver enzymes do not need to be overly concerned about the risk of bleeding. Patients with a high body mass index who received warfarin had a lower risk of major bleeding than patients of normal weight. This could be because the dose of warfarin given to obese patients is not adjusted individually for their weight, or because the dose was not adjusted for long-term weight gain.[34]
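The partial dependence curves behind statements such as the BNP threshold can be sketched as follows: one feature is set to a fixed value for every patient, the model's predictions are averaged, and this is repeated over a grid of values. The risk function below is a hypothetical stand-in with a step at 500 pg/mL chosen only to echo the text; it is not the fitted XGBoost or RF model, and the patient records are invented.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original record is untouched
            modified[feature] = value
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_risk(row):
    """Hypothetical risk model used only for illustration."""
    risk = 0.1
    if row["bnp"] > 500:               # assumed step, echoing the 500 pg/mL level
        risk += 0.3
    if row["age"] >= 75:
        risk += 0.2
    return risk


# Invented patient records (BNP in pg/mL, age in years).
patients = [{"bnp": 120, "age": 60}, {"bnp": 800, "age": 80}]
curve = partial_dependence(toy_risk, patients, "bnp", grid=[100, 600])
for value, mean_risk in curve:
    print(f"BNP={value}: mean predicted risk={mean_risk:.2f}")
```

Because every other feature keeps its observed value, the curve isolates the average effect of the chosen feature, which is how a threshold-like jump becomes visible in plots such as [Fig. 3] and [Fig. 4].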

Based on the voting strategy, in combination with the feature importance and SHAP analyses, the most important influencing factors were clarified. Among the 25 significant risk factors for thrombosis, the top 8 were the most predictive: age, the duration of heparin, total bile acids (TBA), ALT, blood glucose, BNP level, fibrinogen, and low-density lipoprotein cholesterol (LDL_C). Among the 27 significant risk factors for bleeding, the top 10 were the most predictive, including OAC category, dosage and duration, PLT, BNP, blood lipids, and albumin level. In general, only BNP level was a significant predictor of both thrombosis and bleeding events. In NVAF patients undergoing CA therapy, the risk of thrombosis is more commonly associated with advanced age, TBA, and the duration of heparin, whereas the risk of bleeding depends more on the choice of anticoagulation regimen and coagulation indicators.



Limitations and Future Study

Several limitations of the study are worth mentioning. The dataset used to build the models was derived from a single center and covered 1,100 patients, so the risks may be unintentionally over- or underestimated. The feasibility and extensibility of the results need to be verified in future studies with larger samples. Several potential features with more than 30% missing values, such as inflammation indicators and cardiac troponin, were excluded from our study. However, in many studies inflammatory markers play an important predictive role in adverse outcomes in patients with NVAF, and our ML models might improve further with this additional information. More importantly, we focused only on whether the patient had an event during the follow-up period; if an event occurred, it was used as an endpoint, and we did not consider whether the anticoagulation strategy was subsequently adjusted. Hence, we will continue to follow this population to reduce the risk of AEs by adjusting the anticoagulation strategy and to optimize the prediction model. Although the models that we have constructed can predict risk and be used to warn and alert potentially high-risk populations, they are currently not suitable for direct application in clinical scoring because of the excessive number of clinical features. Therefore, the next step is to enlarge the dataset, construct assessment scales, and integrate them into a web-based platform that can directly assist health care professionals in the risk assessment of patients with NVAF.
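The exclusion rule for sparsely recorded features mentioned above can be sketched as a simple filter. This is a minimal illustration that assumes missing values are stored as `None`; the function name, the records, and the field names are hypothetical, and only the 30% cutoff comes from the text.

```python
def drop_sparse_features(records, max_missing=0.30):
    """Return the feature names whose missing fraction does not exceed max_missing."""
    features = sorted({name for record in records for name in record})
    kept = []
    for name in features:
        missing = sum(1 for record in records if record.get(name) is None)
        if missing / len(records) <= max_missing:
            kept.append(name)
    return kept


# Invented records: troponin is missing in 2 of 4 patients (50%), BNP in 1 of 4 (25%).
records = [
    {"age": 70, "bnp": 300, "troponin": None},
    {"age": 65, "bnp": None, "troponin": 0.02},
    {"age": 80, "bnp": 900, "troponin": None},
    {"age": 58, "bnp": 150, "troponin": 0.01},
]
print(drop_sparse_features(records))
```

In this example troponin is dropped and BNP is retained, which shows how a clinically relevant marker can be lost purely because of how often it was measured.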



Conclusion

In this study, we evaluated and compared eight ML algorithms for detecting risk factors for both thrombosis and bleeding. The final models, RF-T and Xw-B, were able to identify NVAF patients at high risk of thrombosis and bleeding based on a few easy-to-obtain features. We also found that age, TBA, and BNP level are crucial in predicting thrombosis, whereas the anticoagulation regimen, coagulation indicators, and BNP level were most predictive of bleeding. In summary, this study provides clinical evidence-based advice for optimizing the anticoagulation strategy for NVAF patients and is of great significance for the prevention of thrombosis and bleeding-related events.

What is known about this topic?

  • Inappropriate use of oral anticoagulants (OACs) for nonvalvular atrial fibrillation (NVAF) not only fails to prevent thrombosis, but is also associated with a higher risk of bleeding.

  • This study aimed to develop clinical data-driven machine learning methods to dynamically predict thrombosis and bleeding to develop more refined OAC treatment strategies for AF patients.

What does this paper add?

  • The simplified machine learning models RF-T and Xw-B have better prediction performance for thrombosis and bleeding, and the overall accuracy (AUC) reaches 0.799 and 0.890, respectively.

  • The duration of heparin and BNP level are closely related to the risk of thrombosis, while the administration strategy of OAC, the level of PLT, and BNP play a crucial role in the occurrence of bleeding.



Conflict of Interest

None declared.

Data Availability Statement

Data will be made available on request.


Ethical Approval Statement

This research was approved by the Ethics Committee of the First Affiliated Hospital of Army Medical University [(B)KY2023076] prior to the commencement of this study, and informed consent was waived because of the retrospective observational design of the study.


Authors' Contribution

Y.Z. contributed to conceptualization, methodology, formal analysis, visualization, writing—original draft, writing—review and editing. L.-Y.C. contributed to data curation, investigation, visualization, writing—review and editing. Y.-X.Z. contributed to data curation, visualization, writing—review and editing. D.Z. contributed to data curation, writing—review. Y.-F.H. contributed to data curation, writing—review. F.W. contributed to investigation, supervision, writing—review. Q.W. contributed to supervision, writing—review and editing, funding acquisition.


* These authors contributed equally to this article.


Supplementary Material

References

  • 1 Yang W-Y, Du X, Jiang C. et al. The safety of discontinuation of oral anticoagulation therapy after apparently successful atrial fibrillation ablation: a report from the Chinese Atrial Fibrillation Registry study. Europace 2020; 22 (01) 90-99
  • 2 Friberg L, Tabrizi F, Englund A. Catheter ablation for atrial fibrillation is associated with lower incidence of stroke and death: data from Swedish health registries. Eur Heart J 2016; 37 (31) 2478-2487
  • 3 Meyre PB, Blum S, Hennings E. et al. Bleeding and ischaemic events after first bleed in anticoagulated atrial fibrillation patients: risk and timing. Eur Heart J 2022; 43 (47) 4899-4908
  • 4 Hylek EM, Held C, Alexander JH. et al. Major bleeding in patients with atrial fibrillation receiving apixaban or warfarin: the ARISTOTLE Trial (Apixaban for Reduction in Stroke and Other Thromboembolic Events in Atrial Fibrillation): predictors, characteristics, and clinical outcomes. J Am Coll Cardiol 2014; 63 (20) 2141-2147
  • 5 Chew D, Piccini JP. Long-term oral anticoagulant after catheter ablation for atrial fibrillation. Europace 2021; 23 (08) 1157-1165
  • 6 Mark DB, Anstrom KJ, Sheng S. et al; CABANA Investigators. Effect of catheter ablation vs medical therapy on quality of life among patients with atrial fibrillation: the CABANA randomized clinical trial. JAMA 2019; 321 (13) 1275-1285
  • 7 Marrouche NF, Brachmann J, Andresen D. et al; CASTLE-AF Investigators. Catheter ablation for atrial fibrillation with heart failure. N Engl J Med 2018; 378 (05) 417-427
  • 8 Hindricks G, Potpara T, Dagres N. et al; ESC Scientific Document Group. 2020 ESC Guidelines for the diagnosis and management of atrial fibrillation developed in collaboration with the European Association for Cardio-Thoracic Surgery (EACTS): the Task Force for the diagnosis and management of atrial fibrillation of the European Society of Cardiology (ESC) developed with the special contribution of the European Heart Rhythm Association (EHRA) of the ESC. Eur Heart J 2021; 42 (05) 373-498
  • 9 Zhang Z, Zhu J, Wu M, Neidlin M, Wu WT, Wu P. Computational modeling of hemodynamics and risk of thrombosis in the left atrial appendage using patient-specific blood viscosity and boundary conditions at the mitral valve. Biomech Model Mechanobiol 2023; 22 (04) 1447-1457
  • 10 Nagao T, Higo S, Suzuki H. et al. Prospective comparison of periprocedural coagulation markers among uninterrupted anticoagulants for atrial fibrillation ablation. Heart Rhythm 2020; 17 (03) 391-397
  • 11 Akar JG, Jeske W, Wilber DJ. Acute onset human atrial fibrillation is associated with local cardiac platelet activation and endothelial dysfunction. J Am Coll Cardiol 2008; 51 (18) 1790-1793
  • 12 Berg DD, Ruff CT, Jarolim P. et al. Performance of the ABC scores for assessing the risk of stroke or systemic embolism and bleeding in patients with atrial fibrillation in ENGAGE AF-TIMI 48. Circulation 2019; 139 (06) 760-771
  • 13 Oyama K, Giugliano RP, Berg DD. et al. Serial assessment of biomarkers and the risk of stroke or systemic embolism and bleeding in patients with atrial fibrillation in the ENGAGE AF-TIMI 48 trial. Eur Heart J 2021; 42 (17) 1698-1706
  • 14 Hijazi Z, Lindahl B, Oldgren J. et al. Repeated measurements of cardiac biomarkers in atrial fibrillation and validation of the ABC stroke score over time. J Am Heart Assoc 2017; 6 (06) e004851
  • 15 Yu I, Song T-J, Kim BJ. et al. CHADS2, CHA2DS2-VASc, ATRIA, and Essen stroke risk scores in stroke with atrial fibrillation: a nationwide multicenter registry study. Medicine (Baltimore) 2021; 100 (03) e24000
  • 16 Pisters R, Lane DA, Nieuwlaat R, de Vos CB, Crijns HJ, Lip GY. A novel user-friendly score (HAS-BLED) to assess 1-year risk of major bleeding in patients with atrial fibrillation: the Euro Heart Survey. Chest 2010; 138 (05) 1093-1100
  • 17 Kosolwattana T, Liu C, Hu R, Han S, Chen H, Lin Y. A self-inspected adaptive SMOTE algorithm (SASMOTE) for highly imbalanced data classification in healthcare. BioData Min 2023; 16 (01) 15
  • 18 Bi X-A, Xing Z, Zhou W, Li L, Xu L. Pathogeny detection for mild cognitive impairment via weighted evolutionary random forest with brain imaging and genetic data. IEEE J Biomed Health Inform 2022; 26 (07) 3068-3079
  • 19 Zhang Y, Zhang X, Lane AN, Fan TW, Liu J. Inferring gene regulatory networks of metabolic enzymes using gradient boosted trees. IEEE J Biomed Health Inform 2020; 24 (05) 1528-1536
  • 20 Natekin A, Knoll A. Gradient boosting machines, a tutorial. Front Neurorobot 2013; 7: 21
  • 21 Bertsimas D, Mingardi L, Stellato B. Machine learning for real-time heart disease prediction. IEEE J Biomed Health Inform 2021; 25 (09) 3627-3637
  • 22 LaValley MP. Logistic regression. Circulation 2008; 117 (18) 2395-2399
  • 23 Zuo C, Qian J, Feng S. et al. Deep learning in optical metrology: a review. Light Sci Appl 2022; 11 (01) 39
  • 24 Fillbrunn A, Dietz C, Pfeuffer J, Rahn R, Landrum GA, Berthold MR. KNIME for reproducible cross-domain analysis of life science data. J Biotechnol 2017; 261: 149-156
  • 25 Ali S, Akhlaq F, Imran AS, Kastrati Z, Daudpota SM, Moosa M. The enlightening role of explainable artificial intelligence in medical & healthcare domains: a systematic literature review. Comput Biol Med 2023; 166: 107555
  • 26 Lu J, Hutchens R, Hung J. et al. Performance of multilabel machine learning models and risk stratification schemas for predicting stroke and bleeding risk in patients with non-valvular atrial fibrillation. Comput Biol Med 2022; 150: 106126
  • 27 Apostolakis S, Lane DA, Guo Y, Buller H, Lip GY. Performance of the HEMORR(2)HAGES, ATRIA, and HAS-BLED bleeding risk-prediction scores in patients with atrial fibrillation undergoing anticoagulation: the AMADEUS (evaluating the use of SR34006 compared to warfarin or acenocoumarol in patients with atrial fibrillation) study. J Am Coll Cardiol 2012; 60 (09) 861-867
  • 28 Zhao Y, Cao L-Y, Zhao Y-X. et al. Medical record data-enabled machine learning can enhance prediction of left atrial appendage thrombosis in nonvalvular atrial fibrillation. Thromb Res 2023; 223: 174-183
  • 29 Atehortúa A, Gkontra P, Camacho M. et al. Cardiometabolic risk estimation using exposome data and machine learning. Int J Med Inform 2023; 179: 105209
  • 30 Shwartz-Ziv R, Armon A. Tabular data: deep learning is not all you need. Inf Fusion 2022; 81: 84-90
  • 31 Prifti E, Fall A, Davogustto G. et al. Deep learning analysis of electrocardiogram for risk prediction of drug-induced arrhythmias and diagnosis of long QT syndrome. Eur Heart J 2021; 42 (38) 3948-3961
  • 32 Sideris S, Archontakis S, Latsios G. et al. Biomarkers associated with bleeding risk in the setting of atrial fibrillation. Curr Med Chem 2019; 26 (05) 824-836
  • 33 Shechter M, Merz CN, Paul-Labrador MJ, Kaul S. Blood glucose and platelet-dependent thrombosis in patients with coronary artery disease. J Am Coll Cardiol 2000; 35 (02) 300-307
  • 34 Hart R, Veenstra DL, Boudreau DM, Roth JA. Impact of body mass index and genetics on warfarin major bleeding outcomes in a community setting. Am J Med 2017; 130 (02) 222-228

Address for correspondence

Fei Wang
Medical Big Data and Artificial Intelligence Center, The First Affiliated Hospital of Army Medical University
Chongqing 400038
P. R. China   

Qian Wang
Department of Pharmacy, The First Affiliated Hospital of Army Medical University
Chongqing 400038
P. R. China   

Publication History

Received: 08 April 2024

Accepted: 04 August 2024

Accepted Manuscript online:
13 August 2024

Article published online:
19 September 2024

© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany


Fig. 1 Top 40 significant features of thrombosis and bleeding. Feature importance obtained by RF algorithm and XGBoost algorithm for (A) Thrombosis group by RF algorithm, (B) Bleeding group by RF algorithm, (C) Thrombosis group by XGBoost algorithm, and (D) Bleeding group by XGBoost algorithm. OAC_2, type of OAC in hospital after CA operation; OAC_3, type of initial OAC after discharge (before adjustment); OAC_4, type of OAC after discharge (after adjustment). The label of dosage is same as OAC. T3, operating time; T4, the duration of heparin; T5, the total duration of OAC; T6, the duration of OAC after discharge (before adjustment); T7, the duration of OAC after discharge (after adjustment). Other abbreviations are mentioned in [Table 1].
Fig. 2 SHAP values for thrombosis and bleeding by XGBoost algorithm. SHAP values obtained by XGBoost algorithm for (A) SHAP absolute value for thrombosis group, (B) SHAP summary plot for thrombosis group, (C) the SHAP absolute value for bleeding group, and (D) SHAP summary plot for bleeding group. OAC_2, type of OAC in hospital after CA operation; OAC_3, type of initial OAC after discharge (before adjustment). The label of dosage is same as OAC. T3, operating time; T6, the duration of OAC after discharge (before adjustment); T7, the duration of OAC after discharge (after adjustment). Other abbreviations are mentioned in [Table 1].
Fig. 3 Partial dependence plot for the top nine representative features for predicting thrombosis. The risk of thrombosis is influenced by (A) age and blood glucose (GLU), (B) the duration of OAC after discharge (before adjustment) and diabetes mellitus (DM), (C) BNP and left atrial appendage width (LAA-W), (D) total bile acids (TBA) and albumin (Alb), (E) fibrinogen (Fib) and type of OAC in hospital after CA operation, (F) Cr and Alb, (G) triglyceride (TG) and type of OAC in hospital after CA operation, (H) drinking and TBA, and (I) GLU and drinking. CA, catheter ablation.