Keywords nonvalvular atrial fibrillation - risk assessment - thrombosis-related adverse events
- bleeding-related events - anticoagulation strategy
Introduction
Perioperative and postoperative thrombosis-related adverse events (AEs; thrombosis)
are major complications of percutaneous catheter ablation (CA) for nonvalvular atrial
fibrillation (NVAF).[1][2] Oral anticoagulant (OAC) therapy is widely used to reduce
the risk of thrombosis, but at the expense of an increased risk of bleeding.[3][4]
Therefore, assessing the risks and benefits of postablation OAC therapy in NVAF
is necessary to develop a rational and effective anticoagulation strategy that can
help avert AEs.
CA is a recommended treatment for restoring sinus rhythm in patients with symptomatic
drug-refractory NVAF, and several randomized controlled trials have shown that CA
is superior to antiarrhythmic drugs in reducing the recurrence and burden of atrial
fibrillation (AF).[5][6][7] Nevertheless, CA therapy is associated with perioperative
bleeding and stroke in 1.9 and 0.2%, respectively, vascular complications in 2 to 4%,
and asymptomatic cerebral embolism in up to 15%.[8][9] Local endothelial dysfunction
during the procedure is generally considered to be the cause of platelet (PLT) activation,
and damaged cells release thrombogenic components, leading to an increased risk of
thrombosis.[10] The dislodgement of thrombi in the left atrial appendage (LAA), a
hypercoagulable state due to prolonged bed rest, and recurrence of AF also increase
the risk of embolism.[11]
There is a plethora of scores incorporating different risk factors to assess the risk
of stroke in patients with NVAF, including the CHADS2 score, CHA2DS2-VASc score, ATRIA
score, and ABC score.[12][13][14] The CHA2DS2-VASc score stands out as the most widely
employed stroke assessment tool in clinical practice, owing to its simplicity and
practical utility. Nevertheless, both the CHADS2 score and the CHA2DS2-VASc score have
limitations in predicting vascular events.[15] They are also likely to overestimate
the risk of thrombosis because they include comorbidities. The HAS-BLED score, which
takes into account hypertension, abnormal liver/renal function, stroke, bleeding,
unstable international normalized ratio, older age, and drug or alcohol abuse, is widely
used to assess the risk of major bleeding.[16] The HAS-BLED score is more suitable for
patients taking warfarin and less suitable for patients taking non-vitamin K antagonist
OACs, whose coagulation function is not regularly monitored. Moreover, these scores
do not consider various novel influencing factors, and the analytical methods on which
they are based do not adequately reflect the relative importance of the predictive
indicators. Furthermore, they do not take the influence of CA and OAC into account,
which makes it difficult to evaluate the efficacy of treatment strategies in NVAF
patients undergoing CA. With the advancement of artificial intelligence technology
and the emergence of novel biomarkers associated with NVAF, a prediction model that
integrates numerous new factors can be established to assess the risk of thrombosis
and bleeding and thereby support the development of improved anticoagulation strategies.
In this study, we first aimed to develop accurate prediction models that identify
potentially influential risk factors for thrombosis and bleeding with high confidence.
To achieve this, eight machine learning (ML) algorithms based on 76 variables were
evaluated to build reliable classification models. The second aim was to identify
and rank all potentially influential risk factors for thrombosis and bleeding using
feature importance and SHapley Additive exPlanations (SHAP). Using the most important
predictors, simplified ML models were then proposed to make the tool more practical.
In addition, the associations and differences between thrombosis and bleeding
were compared. Finally, strategies were proposed to reduce the risk of thrombosis
and bleeding in patients with NVAF.
Materials and Methods
Proposed Methodology
This study proposed an explainable, integrated ML-based model for predicting the risk
of thrombosis and bleeding in patients with NVAF. To achieve this, we performed the
following steps: first, the dataset was obtained, preprocessed, subsampled, and randomly
divided into a training set (80%) and a testing set (20%). Second, all features were
used to construct models with the eight ML algorithms, and performance was evaluated
by accuracy, specificity, recall, and area under the curve (AUC) on the testing set.
Third, the two best-performing algorithms were selected for further feature selection.
Fourth, SHAP analysis was employed to identify predictors. Finally, based on the feature
importance and SHAP value rankings, an optimized model with variable weights was
constructed by score assignment.
Study Population and Dataset Preparation
In this study, the patients with NVAF used to construct the ML models were collected
between January 2015 and January 2023 at Southwest Hospital (the First Affiliated
Hospital of Army Medical University). The inclusion criteria were as follows: (1)
patients who were discharged with a primary diagnosis of NVAF, (2) patients who underwent
CA therapy, and (3) patients who were followed up for at least 6 months after ablation.
The exclusion criteria were as follows: (1) patients with moderate to severe mitral
stenosis with AF, (2) patients with AF after mechanical valve replacement (for aortic
and mitral valve stenosis), and (3) patients who failed to attend follow-up appointments
after CA or within 6 months following surgery. Because this was a retrospective analysis,
some values were missing. Two strategies were employed to handle missing values. First,
missing variables were imputed with the corresponding value from the most recent
admission retrieved from the electronic medical record system. Then, if the proportion
of missing values in a sample still exceeded 5%, that sample was excluded from further
analysis, and the remaining missing values were imputed using the median of the
variable. Thrombosis events were defined as thrombosis in the LAA detected by
transesophageal ultrasound, pulmonary embolism, transient ischemic attack, or stroke
after the CA procedure. Bleeding events included both major and minor bleeding events,
namely hematemesis, intracranial hemorrhage, hematochezia, hematuria, gingival and
nasal bleeding, subcutaneous ecchymosis, and hematoma. After processing, 1,055 patients
with NVAF undergoing CA therapy were retained, including 105 with thrombosis events
and 252 with bleeding events. Ethical approval [(B)KY2023076] was granted by the Ethics
Committee of the First Affiliated Hospital of Army Medical University prior to the
commencement of this study, and the requirement for informed consent was waived because
this was a retrospective observational study.
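The sample exclusion and median imputation steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study: the function name `clean_records`, the dictionary-based record layout, and the toy data are assumptions, and the first strategy (filling from the most recent admission in the electronic medical record) is omitted because it depends on the hospital system.

```python
from statistics import median


def clean_records(records, max_missing_fraction=0.05):
    """Drop samples with too many missing values, then median-impute the rest.

    `records` is a list of dicts mapping variable name -> number or None
    (None marks a missing value). Layout and names are illustrative only.
    """
    variables = list(records[0].keys())

    # Keep only samples whose share of missing variables is at most 5%.
    kept = []
    for rec in records:
        n_missing = sum(1 for v in variables if rec[v] is None)
        if n_missing / len(variables) <= max_missing_fraction:
            kept.append(dict(rec))  # copy so the input is not modified

    # Median of the observed values of each variable among the kept samples.
    medians = {}
    for v in variables:
        observed = [rec[v] for rec in kept if rec[v] is not None]
        medians[v] = median(observed) if observed else None

    # Fill the remaining gaps with the per-variable median.
    for rec in kept:
        for v in variables:
            if rec[v] is None:
                rec[v] = medians[v]
    return kept


# Toy data: 40 hypothetical variables per patient.
names = [f"x{i}" for i in range(40)]
a = {v: 1.0 for v in names}
b = {v: 3.0 for v in names}
c = {v: 5.0 for v in names}
c["x0"] = None                      # 1/40 = 2.5% missing: kept and imputed
d = {v: 7.0 for v in names}
d["x0"] = d["x1"] = d["x2"] = None  # 3/40 = 7.5% missing: excluded

cleaned = clean_records([a, b, c, d])
```

In this toy example the fourth sample is excluded, and the missing `x0` of the third sample is filled with the median of the observed `x0` values of the retained samples.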
Variable Selection and Preprocessing
Four types of variables, chosen on the basis of previous literature and preliminary
findings, were used to construct multiple classification models: (1) 20 baseline
information components, including demographic information (age, sex, and weight);
lifestyle factors (smoking and drinking) and past medical history; and indicators
related to the surgical procedure, including days of hospitalization, operation duration,
intraoperative activated clotting time, and other baseline-related indicators; (2)
31 medication-related factors, including preoperative use of OAC (type, dosage,
preoperative interruption), combination therapies (nonsteroidal anti-inflammatory
drugs [NSAIDs], statins, P-glycoprotein inhibitors, proton pump inhibitors, beta
blockers), intraoperative administration of heparin, postoperative use of OAC, and
combination therapies (same as preoperative); (3) 24 biomarker information components,
including kidney function, hepatic function, complete blood count, inflammation,
coagulation function, and myocardial function indicators; and (4) 5 left atrium-related
indicators, including the width and depth of the LAA, left atrium diameter (LAD),
left ventricular shortening fraction, and ejection fraction. All data except for
medication information were collected through medical data retrieval and application
platforms, while medication-related information was collected by manual review of
the medical records. Two pretreatments were performed to delete uninformative features
before further selection: (1) features with a variance of 0 or close to 0 were removed;
(2) the linear correlation between all variables was calculated and screened with
a correlation threshold of 0.95. After these steps, 76 variables were used to construct
the classification models, with inflammatory indicators and serum cystatin C excluded.
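The two pretreatment filters can be illustrated with a short standard-library sketch. It is an assumption-laden illustration rather than the study's pipeline: the text does not state which member of a highly correlated pair was dropped, so the sketch keeps the feature encountered first, and the variance tolerance `var_eps`, the function names, and the toy columns are all hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_features(columns, var_eps=1e-8, corr_threshold=0.95):
    """Return the names of features that survive both filters.

    `columns` maps feature name -> list of numeric values.
    """
    # Filter 1: drop features whose variance is zero or close to zero.
    candidates = [n for n, vals in columns.items() if pvariance(vals) > var_eps]

    # Filter 2: greedily keep a feature only if its absolute correlation with
    # every feature kept so far is below the threshold (assumed tie-break rule).
    kept = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[k])) < corr_threshold
               for k in kept):
            kept.append(name)
    return kept


# Toy columns: weight in pounds duplicates weight in kilograms.
columns = {
    "age": [50, 60, 70, 80],
    "weight_kg": [60, 72, 65, 80],
    "weight_lb": [132.3, 158.7, 143.3, 176.4],
    "constant_flag": [1, 1, 1, 1],
}
selected = filter_features(columns)
```

Here `constant_flag` is removed by the variance filter and `weight_lb` by the correlation filter, because it is almost perfectly correlated with `weight_kg`.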
The baseline variables were presented as mean ± standard deviation for normally
distributed continuous variables and count (percentage) for categorical variables.
Nonnormally distributed continuous variables were presented as median (interquartile
range). An unpaired t-test was used to compare differences between groups for normally
distributed continuous variables, and the Mann–Whitney U test was used to analyze
differences between nonnormally distributed continuous variables. Pearson's chi-squared
test or Fisher's exact test was used to examine the association between categorical
predictors and AEs. A two-sided p-value <0.05 was considered significant.
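As a worked example of the categorical comparison, the sketch below computes Pearson's chi-squared test for a 2 × 2 table using only the standard library. The counts are the sex distribution reported in Table 1 for the thrombosis comparison (59 men among 105 patients with thrombosis, 512 men among 942 without). The function name is hypothetical, no continuity correction is applied (an assumption, since the text does not say whether one was used), and the p-value uses the identity that a chi-squared variable with one degree of freedom has survival function erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected counts under independence of rows and columns.
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One degree of freedom: P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Rows: thrombosis / nonthrombosis; columns: male / female (counts from Table 1).
stat, p = chi2_2x2(59, 105 - 59, 512, 942 - 512)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

The resulting p-value is approximately 0.72, in line with the value of 0.720 listed for this row of Table 1.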
Data Split, Model Development, and Process
Because the positive and negative sets were imbalanced, subsampling was used to balance
them.[17] In the subsampling strategy, samples equal in number to the positive set
were randomly selected from the negative set, and this procedure was repeated 50 times
to obtain reliable predictions. This approach aims to deal with the data imbalance
so as to achieve high recall and reasonable precision, in keeping with professional
standards. The dataset was divided into a training set (80%) and a test set (20%)
using stratified random sampling. Similarly, the data splitting process was repeated
50 times to ensure that all available data were used for modeling.
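The subsampling and splitting procedure can be sketched as below. This is an illustrative reading of the description, not the study's implementation: the helper names are hypothetical, the seed is arbitrary, and the sketch assumes one stratified 80/20 split per balanced subsample, although the text does not specify how the 50 subsampling and 50 splitting repetitions were combined. The label vector mimics the thrombosis set sizes reported in the Results (105 positive, 942 negative).

```python
import random


def balanced_subsample(labels, rng):
    """Indices of all positives plus an equal-sized random draw of negatives."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return pos + rng.sample(neg, len(pos))


def stratified_split(indices, labels, test_fraction, rng):
    """Split indices into train/test parts while preserving the class ratio."""
    train, test = [], []
    for cls in (0, 1):
        members = [i for i in indices if labels[i] == cls]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


rng = random.Random(0)               # fixed seed for reproducibility
labels = [1] * 105 + [0] * 942       # 1 = thrombosis, 0 = no thrombosis

splits = []
for _ in range(50):                  # 50 repetitions, as in the text
    subset = balanced_subsample(labels, rng)
    splits.append(stratified_split(subset, labels, 0.2, rng))
```

Each repetition yields 168 training and 42 test samples with equal numbers of positive and negative cases, and a model would be fitted and evaluated once per repetition before the metrics are averaged.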
Eight popular ML algorithms were used to develop the classification models, including
Random Forest (RF),[18] light gradient boosting machine (LGBM), gradient boosting
decision tree (GBT),[19][20] eXtreme Gradient Boosting (XGBoost),[21] Naïve Bayes (NB),
logistic regression (LR),[22] neural network (NN), and deep neural network (DNN).[23]
RF, GBT, NN, NB, and XGBoost were implemented in the KNIME analytics platform (KNIME
version 5.1),[24] and LR, LGBM, and DNN were implemented in the Keras package of Python
(version 3.7). The specific parameters of the models are shown in Supplementary Table S1
(available in the online version).
Performance Evaluation and Model Explainability
In this study, the performance of the classification models was evaluated by the following
statistical parameters: accuracy (ACC) and AUC. In addition, recall was calculated
to assess the recovery rate of positive samples, namely, the probability of correctly
predicting the occurrence of an AE. Feature importance from the best-performing models
was used to interpret the importance of different variables. To further provide
consistent and locally accurate attribution values for each variable, the SHAP method
was used to explain how each feature influences the model output.[25] The global
contribution of a feature is represented by its absolute SHAP values summed over
all samples. Predictors with a positive SHAP value push the prediction toward thrombosis
or bleeding, while predictors with a negative SHAP value push it toward non-AE.
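The evaluation metrics can be made concrete with a small standard-library sketch. The function names and the toy labels and scores are hypothetical. Balanced accuracy, reported as BA in Table 2, is taken here as the mean of recall and specificity, which agrees with the tabulated values; AUC is computed from its rank interpretation, that is, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def classification_metrics(y_true, y_pred):
    """Recall, specificity, balanced accuracy, and accuracy from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn)          # share of AE patients correctly flagged
    specificity = tn / (tn + fp)     # share of non-AE patients correctly cleared
    return {
        "recall": recall,
        "specificity": specificity,
        "BA": (recall + specificity) / 2,
        "ACC": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 AE patients, 4 non-AE patients, decision threshold 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

In this toy case, recall is 2/3 and specificity is 3/4, so BA is about 0.708, ACC is 5/7, and the AUC is 10/12.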
Results
Baseline Characteristics
Of 2,146 participants with NVAF, the incidence of thrombosis was 8.81% (189 of 2,146)
and the incidence of bleeding was 13.89% (298 of 2,146). Owing to the data preprocessing
principles described above, the datasets for the two outcomes were not identical.
For the thrombosis group, the positive set contained 105 data points and the negative
set contained 942 data points. For the bleeding group, the positive set contained
252 data points and the negative set contained 803 data points. The baseline
characteristics of the participants are shown in Table 1, and the detailed distributions
are shown in Supplementary Table S2 (available in the online version). Age, B-type
natriuretic peptide (BNP) level, heparin dose, and length of stay differed significantly
between the AE group and the non-AE group, as shown in Supplementary Fig. S1 (available
in the online version). These results indicate the considerable potential of demographic
and clinical characteristics to distinguish AEs in the population with NVAF.
Table 1
Statistical summary of the clinical variables in the AE group and non-AE group

| Categories | Variables | Thrombosis (105) | Nonthrombosis (942) | p | Bleeding (252) | Nonbleeding (803) | p |
|---|---|---|---|---|---|---|---|
| Demographic information | Age, y, mean ± SD | 65.81 ± 10.52 | 60 ± 11.64 | <0.001[a] | 62.69 ± 11.10 | 60.36 ± 11.73 | 0.006[a] |
| | Gender, male, n (%) | 59 (56.19) | 512 (54.35) | 0.720[b] | 121 (48.13) | 481 (56.39) | 0.018[b] |
| | Weight, kg[*] | 64.23 ± 11.24 | 64.79 ± 11.10 | 0.623[a] | 64 (56–70) | 65 (56–71) | 0.350[c] |
| | BMI, kg/m², mean ± SD | 24.46 ± 3.20 | 24.51 ± 3.37 | 0.873[a] | 24.58 ± 3.19 | 24.47 ± 3.40 | 0.640[a] |
| | Smoking, n (%) | 38 (36.19) | 310 (32.91) | 0.009[b] | 72 (28.57) | 278 (34.62) | 0.070[b] |
| | Drinking, n (%) | 42 (40.00) | 292 (31.0) | 0.060[b] | 63 (25.00) | 274 (34.12) | 0.007[b] |
| Medical history | Surgical history, n (%) | 47 (44.76) | 332 (35.24) | 0.054[b] | 110 (43.65) | 274 (34.12) | 0.006[b] |
| | DM, n (%) | 19 (18.10) | 128 (13.59) | 0.207[b] | 45 (17.86) | 103 (12.83) | 0.045[b] |
| | Hypertension, n (%) | 50 (47.62) | 424 (45.01) | 0.108[c] | 139 (55.16) | 346 (43.09) | 0.001[c] |
| | Hyperlipidemia, n (%) | 23 (21.90) | 200 (21.23) | 0.873[b] | 44 (17.46) | 181 (22.54) | 0.086[b] |
| | AS, n (%) | 77 (73.33) | 594 (63.06) | 0.037[b] | 167 (66.27) | 509 (63.39) | 0.405[b] |
| | CIS, n (%) | 18 (17.14) | 56 (5.94) | <0.001[b] | 22 (8.73) | 52 (6.48) | 0.221[b] |
| | HF, n (%) | 32 (30.48) | 220 (23.35) | 0.076[c] | 70 (27.78) | 183 (22.79) | 0.157[c] |
| AF type | PeAF, n (%) | 62 (59.05) | 646 (68.58) | 0.141[b] | 167 (66.27) | 546 (68.00) | 0.857[b] |
| | First episode AF, n (%) | 38 (36.19) | 261 (27.71) | | 74 (29.37) | 226 (28.14) | |
| Left cardiac structure and function | LAA-W, mm, mean ± SD | 16.98 ± 2.70 | 16.64 ± 2.31 | 0.154[a] | 16.75 ± 2.39 | 16.64 ± 2.34 | 0.529[a] |
| | LAA-D, mm, mean ± SD | 25.11 ± 3.92 | 24.99 ± 3.53 | 0.734[a] | 25.14 ± 3.87 | 24.95 ± 3.47 | 0.460[a] |
| | LAD, mm, mean ± SD | 39.63 ± 5.05 | 38.09 ± 5.66 | 0.008[a] | 38.91 ± 5.47 | 38.05 ± 5.65 | 0.035[a] |
| | EF, mean ± SD | 60.40 ± 8.47 | 60.83 ± 7.62 | 0.589[a] | 60.49 ± 7.03 | 60.90 ± 7.88 | 0.458[a] |
| | FS, mean ± SD | 32.47 ± 5.72 | 32.99 ± 5.67 | 0.368[a] | 32.83 ± 6.12 | 32.99 ± 5.51 | 0.711[a] |
| Laboratory test | GLU, mmol/L, mean ± SD | 5.66 ± 1.42 | 5.52 ± 1.62 | 0.399[a] | 5.56 ± 1.32 | 5.52 ± 1.67 | 0.795[a] |
| | PLT, 10⁹/L, mean ± SD | 185.34 ± 63.147 | 189.49 ± 58.05 | 0.492[a] | 181.81 ± 58.17 | 191.33 ± 58.32 | 0.024[a] |
| | UA, μmol/L, mean ± SD | 371.20 ± 96.12 | 354.20 ± 105.47 | 0.114[a] | 358.51 ± 106.37 | 355.03 ± 104.14 | 0.645[a] |
| | Alb, g/L, mean ± SD | 38.91 ± 3.67 | 39.47 ± 3.41 | 0.112[a] | 39.08 ± 3.39 | 39.51 ± 3.47 | 0.087[a] |
| | ALT, IU/L[*] | 20.4 (13.3–30.6) | 19.8 (14.2–28.8) | 0.996[c] | 23.75 ± 18.53 | 25.70 ± 21.58 | 0.197[a] |
| | AST, IU/L[*] | 23.1 (19.3–29.1) | 22.4 (18.7–27.4) | 0.233[c] | 25.37 ± 14.00 | 25.70 ± 13.66 | 0.733[a] |
| | TBA, μmol/L[*] | 3.6 (2.6–6.8) | 4.3 (2.6–7.1) | 0.389[c] | 5.86 ± 6.14 | 6.06 ± 7.88 | 0.709[a] |
| | Creatinine, μmol/L[*] | 77.72 ± 18.60 | 75.00 ± 36.02 | 0.445[a] | 70.0 (61.7–81.5) | 72.1 (61.2–84.2) | 0.374[c] |
| | PT-INR[*] | 1.05 ± 0.29 | 1.04 ± 0.69 | 0.860[a] | 0.96 (0.91–1.04) | 0.94 (0.89–1.01) | 0.002[c] |
| | Fib, g/L[*] | 2.55 (2.07–3.10) | 2.44 (2.12–2.88) | 0.126[c] | 2.59 ± 0.76 | 2.57 ± 0.69 | 0.674[a] |
| | LDL_C, mmol/L[*] | 2.69 (2.07–3.24) | 2.70 (2.18–3.18) | 0.846[c] | 2.68 ± 0.77 | 2.72 ± 0.74 | 0.425[a] |
| | HDL_C, mmol/L[*] | 1.18 ± 0.33 | 1.17 ± 0.30 | 0.750[a] | 1.12 (0.98–1.28) | 1.13 (0.97–1.34) | 0.363[c] |
| | TG, mmol/L, mean ± SD | 1.48 ± 1.20 | 1.59 ± 1.12 | 0.377[a] | 1.50 ± 0.99 | 1.60 ± 1.17 | 0.233[a] |
| | Tch, mmol/L[*] | 4.15 (3.54–5.04) | 4.34 (3.63–5.03) | 0.856[c] | 4.27 ± 1.02 | 4.39 ± 1.02 | 0.090[a] |
| | BNP, pg/mL, median (IQR) | 169 (83–280) | 117 (35–169) | <0.001[c] | 154 (59–176) | 116 (34–169) | 0.002[c] |
| | PCT, ng/mL, median (IQR) | 0.22 (0.20–0.29) | 0.22 (0.19–0.27) | 0.501[c] | 0.23 (0.19–0.30) | 0.22 (0.19–0.28) | 0.065[c] |
| CA | Length of stay, day[*] | 9 (7–11) | 8 (6–10) | 0.025[a] | 9.1 ± 3.3 | 8.5 ± 3.4 | 0.028[a] |
| | Heparin dose, IU, mean ± SD | 6,597 ± 1,751 | 7,238 ± 1,899 | <0.001[a] | 6,626 ± 1,871 | 7,341 ± 1,868 | <0.001[a] |
| | T4, day[*] | 2 (1–4) | 1 (1–3) | 0.003[c] | 2.42 ± 2.01 | 2.04 ± 1.94 | 0.008[a] |
| | T3, h[*] | 3.26 ± 0.85 | 3.24 ± 0.84 | 0.751[a] | 3.17 (2.75–3.65) | 3.17 (2.67–3.67) | 0.265[c] |
| OACs | OAC_1, n (%) | 42 (40.0) | 316 (33.55) | 0.007[b] | 103 (40.87) | 257 (32.00) | <0.001[b] |
| | Preoperative combined medication, n (%) | 85 (80.95) | 746 (78.53) | 0.036[d] | 198 (78.57) | 583 (72.60) | 0.161[d] |
| | Interruption, day, n (%) | 25 (23.81) | 161 (17.09) | 0.005[b] | 50 (19.84) | 138 (17.19) | 0.855[b] |
| | Postoperative NOACs, n (%) | 102 (97.14) | 790 (83.86) | <0.001[b] | 190 (75.40) | 713 (88.79) | <0.001[b] |
| | T6, mo, median (IQR) | 2.2 (1–4) | 2.5 (1–3.2) | 0.991[c] | 3 (1–4) | 2 (1–3) | 0.013[c] |
| | Adjustment, n (%) | 86 (81.90) | 282 (29.94) | 0.164[d] | 92 (36.51) | 230 (28.64) | 0.012[b] |
| | T7 >3 mo, n (%) | 23 (21.90) | 140 (14.86) | 0.001[b] | 54 (21.43) | 109 (13.57) | 0.002[c] |
| | T5 >3 mo, n (%) | 61 (58.10) | 533 (56.58) | 0.002[b] | 173 (68.65) | 427 (53.18) | <0.001[b] |
Abbreviations: ACT, activated clotting time of whole blood; Alb, albumin; ALT, alanine
transaminase; AS, atherosclerosis; AST, aspartate transaminase; BMI, body mass index;
BNP, B-type natriuretic peptide; CA, catheter ablation; CIS, chronic myocardial ischemia
syndrome; DM, diabetes mellitus; EF, ejection fraction; Fib, fibrinogen; First episode
AF, first episode atrial fibrillation; FS, fractional shortening; GLU, glucose; HDL_C,
high-density lipoprotein cholesterol; HF, heart failure; LAA-D, left atrial appendage
depth; LAA-W, left atrial appendage width; LAD, left atrial dimension; LDL_C, low-density
lipoprotein cholesterol; NOACs, non-vitamin K antagonist oral anticoagulants; OAC_1,
OAC before CA operation; OACs, oral anticoagulants; PCT, procalcitonin; PeAF, persistent
atrial fibrillation; PLT, platelets; PT-INR, prothrombin time international normalized
ratio; SD, standard deviation; T3, operating time; T4, the duration of heparin; TBA,
total bile acids; Tch, total cholesterol; TG, triglyceride; TT, thrombin time; UA,
uric acid; Interruption, the proportion of patients in whom OAC was started the day
after CA therapy; T6, the duration of OAC after discharge (before adjustment); T7,
proportion of patients with a duration of OAC >3 months after the adjustment; T5,
proportion of patients with a total duration of OAC >3 months.
[*] For these variables, the distribution is not the same in the two comparisons
because the underlying data differ. For example, in the thrombosis group, "weight"
was normally distributed, so a t-test was used and the data are presented as mean
± standard deviation, whereas in the bleeding group, "weight" was not normally
distributed, so a nonparametric test was chosen and the data are presented as median
(25%–75%).
[a] Independent-samples t-test.
[b] Pearson χ² test.
[c] Wilcoxon–Mann–Whitney test.
[d] Fisher's exact test.
Comparison of the Performances of ML Methods on Prediction of Thrombosis and Bleeding
Following CA for Patients with NVAF
The performance of the ML methods, including RF, XGBoost, GBT, LightGBM, NB, LR,
NN, and DNN, is shown in Table 2 and Supplementary Fig. S2 (available in the online
version). For the prediction of thrombosis, the XGBoost and RF models achieved the
best recalls, at 0.841 and 0.777, with AUC scores of 0.854 and 0.795, respectively.
Although the NB model had the highest average ACC and performed well in predicting
nonthrombosis samples, it achieved this accuracy at the expense of the recovery of
the positive set. For the prediction of bleeding, the overall pattern was remarkably
similar to that of the thrombosis models, but the predictive performance fell short
of that for thrombosis. For instance, the recall for thrombosis predicted by the RF
algorithm was 0.777, while the recall for bleeding was only 0.642. To maximize the
identification of patients at high risk of thrombosis and bleeding, the XGBoost and
RF algorithms were selected for further feature selection and construction of optimized
models.
Table 2
Performance of the models based on different ML algorithms

| Outcome | Model | Recall | Specificity | BA | ACC | AUC |
|---|---|---|---|---|---|---|
| Thrombosis | RF | 0.777 (0.018) | 0.653 (0.009) | 0.715 (0.010) | 0.664 (0.009) | 0.795 (0.006) |
| | XGBoost | 0.841 (0.015) | 0.731 (0.010) | 0.786 (0.009) | 0.740 (0.009) | 0.854 (0.005) |
| | GBT | 0.626 (0.039) | 0.717 (0.010) | 0.672 (0.020) | 0.708 (0.010) | 0.730 (0.011) |
| | LightGBM | 0.731 (0.019) | 0.640 (0.010) | 0.685 (0.010) | 0.650 (0.010) | 0.720 (0.006) |
| | NB | 0.453 (0.029) | 0.920 (0.010) | 0.687 (0.014) | 0.887 (0.009) | 0.887 (0.007) |
| | LR | 0.672 (0.034) | 0.721 (0.017) | 0.697 (0.017) | 0.714 (0.014) | 0.778 (0.009) |
| | NN | 0.655 (0.029) | 0.716 (0.016) | 0.685 (0.015) | 0.706 (0.014) | 0.722 (0.015) |
| | DNN | 0.730 (0.018) | 0.654 (0.012) | 0.692 (0.010) | 0.663 (0.011) | 0.737 (0.006) |
| Bleeding | RF | 0.642 (0.007) | 0.799 (0.007) | 0.721 (0.005) | 0.764 (0.006) | 0.800 (0.002) |
| | XGBoost | 0.744 (0.009) | 0.791 (0.006) | 0.767 (0.004) | 0.781 (0.004) | 0.864 (0.002) |
| | GBT | 0.644 (0.015) | 0.823 (0.012) | 0.734 (0.010) | 0.786 (0.010) | 0.837 (0.005) |
| | LightGBM | 0.576 (0.009) | 0.805 (0.009) | 0.691 (0.006) | 0.747 (0.006) | 0.721 (0.003) |
| | NB | 0.548 (0.011) | 0.964 (0.004) | 0.756 (0.006) | 0.910 (0.003) | 0.925 (0.002) |
| | LR | 0.668 (0.016) | 0.734 (0.008) | 0.701 (0.009) | 0.719 (0.008) | 0.800 (0.004) |
| | NN | 0.671 (0.016) | 0.674 (0.018) | 0.672 (0.009) | 0.673 (0.012) | 0.744 (0.009) |
| | DNN | 0.617 (0.014) | 0.700 (0.009) | 0.659 (0.009) | 0.679 (0.008) | 0.712 (0.004) |
Note: The values in bold are the maximum values for each measure.
Abbreviations: ACC, accuracy; AUC, area under the curve; BA, balanced accuracy.
Feature Importance for Predicting Thrombosis and Bleeding
To obtain a more accurate and concise set of features, both the RF and XGBoost algorithms
were used to conduct feature importance analysis for feature selection, as shown in
Fig. 1. The higher the feature importance value, the more crucial the feature is to
the ML model. For the prediction of thrombosis, the top 10 features identified by
the two ML algorithms showed a remarkable lack of consistency, with the exception
of the duration of intraoperative heparin application (T4), which proved to be a robust
feature for thrombosis. In predicting bleeding, the application of OAC was the most
significant feature in both algorithms. The anticoagulation strategy, sex, BNP, total
cholesterol (Tch), and triglyceride (TG) level were significant contributors to both
thrombosis and bleeding. In addition, beta blocker use, NSAID use, chronic myocardial
ischemia syndrome, and age also played a critical role in thrombosis, while preoperative
co-medication, diabetes mellitus, LAD, PLT, creatinine, and BNP level also contributed
significantly to the prediction of bleeding.
Fig. 1 Top 40 significant features of thrombosis and bleeding. Feature importance obtained
by RF algorithm and XGBoost algorithm for (A) Thrombosis group by RF algorithm, (B)
Bleeding group by RF algorithm, (C) Thrombosis group by XGBoost algorithm, and (D)
Bleeding group by XGBoost algorithm. OAC_2, type of OAC in hospital after CA operation;
OAC_3, type of initial OAC after discharge (before adjustment); OAC_4, type of OAC after
discharge (after adjustment). The dosage labels follow the same numbering as OAC. T3,
operating time; T4, the duration of heparin; T5, the total duration of OAC; T6, the
duration of OAC after discharge (before adjustment); T7, the duration of OAC after
discharge (after adjustment). Other abbreviations are mentioned in Table 1.
Because the feature importance obtained by different algorithms varied considerably,
we further examined the influence of individual features using the XGBoost-based SHAP
method, as shown in Fig. 2. The higher the absolute SHAP value of a variable, the larger
its contribution to the model. The results in Fig. 2 show that age, the duration of
OAC after discharge (T6), and BNP level were associated with a higher risk of thrombosis,
while the type and dosage of OAC and alanine transaminase (ALT) exhibited a stronger
correlation with a high risk of bleeding. The SHAP values of the top 20 indicators
for each sample are represented by a color gradient, with lighter shades of golden
yellow indicating higher SHAP values and darker shades of blue-purple representing
lower SHAP values. A threshold was set at a SHAP value of 0: positive contributions
above this value indicate higher risk, and negative contributions below 0 indicate
lower risk. Moreover, the direction of the effect of the top features on thrombosis
and bleeding is shown in Figs. 3 and 4. Patients with older age (age > 75), a shorter
duration of OAC (T6 < 1 month), a higher BNP level (BNP > 500 pg/mL), a higher fibrinogen
level (Fib > 3 g/L), a lower TG level (TG < 1 mmol/L), and alcohol consumption were
more likely to develop thrombosis. Meanwhile, long-term use of OAC (T7 > 5 months),
a lower PLT level (<200 × 10⁹/L), and a higher UA level (>400 μmol/L) were more likely
to result in bleeding.
Fig. 2 SHAP values for thrombosis and bleeding by XGBoost algorithm. SHAP values obtained
by XGBoost algorithm for (A) SHAP absolute value for thrombosis group, (B) SHAP summary
plot for thrombosis group, (C) SHAP absolute value for bleeding group, and (D) SHAP
summary plot for bleeding group. OAC_2, type of OAC in hospital after CA operation;
OAC_3, type of initial OAC after discharge (before adjustment). The dosage labels follow
the same numbering as OAC. T3, operating time; T6, the duration of OAC after discharge
(before adjustment); T7, the duration of OAC after discharge (after adjustment). Other
abbreviations are mentioned in Table 1.
Fig. 3 Partial dependence plot for the top nine representative features for predicting
thrombosis. The risk of thrombosis is influenced by (A) age and blood glucose (GLU),
(B) the duration of OAC after discharge (before adjustment) and diabetes mellitus (DM),
(C) BNP and left atrial appendage width (LAA-W), (D) total bile acids (TBA) and albumin
(Alb), (E) fibrinogen (Fib) and type of OAC in hospital after CA operation, (F) Cr and
Alb, (G) triglyceride (TG) and type of OAC in hospital after CA operation, (H) drinking
and TBA, and (I) GLU and drinking. CA, catheter ablation.
Fig. 4 Partial dependence plot for the top nine representative features for predicting
bleeding. The risk of bleeding is influenced by (A) type of initial OAC after discharge
(before adjustment) and LAD, (B) ALT and BMI, (C) the dosage of initial OAC after discharge
(before adjustment) and EF, (D) high-density lipoprotein cholesterol (HDL_C) and Fib,
(E) heparin dose and direct bilirubin (DBIL), (F) platelets (PLT) and type of initial
OAC after discharge (after adjustment), (G) the duration of OAC after discharge (after
adjustment) and PLT, (H) BNP and heparin dose, and (I) uric acid (UA) and DBP. ALT,
alanine transaminase; BMI, body mass index; LAD, left atrium diameter; OAC, oral
anticoagulation.
Feature Selection for Constructing the Final ML Models
Based on the results of the SHAP method and feature importance, we further refined
the features and reconstructed the final prediction models. A feature was awarded
1 point if it appeared in the top 20 of both algorithms, and 2 points if it was in
the top 10 of both algorithms. If a feature ranked in the top 10 of only one algorithm,
it was awarded 0.5 points. Additionally, 1 point was given if the feature also appeared
in the SHAP results (top 20). Furthermore, a feature was awarded 1 point if it appeared
in the top 20 of only one algorithm but also appeared in the SHAP results (top 20).
Each variable could therefore be assigned a maximum of 3 points. Finally, the features
were sorted in descending order of score and included as predictor variables; the
results are shown in Table 3 and the detailed scoring process in Supplementary Table S3
(available in the online version). A total of 25 features were retained for the thrombosis
model and 27 features were selected for the bleeding model. The remodeling process
was the same as above, with the XGBoost and RF algorithms selected and the features
weighted according to the voting scores. The modeling results are shown in Table 4.
Except for the XGBoost thrombosis model, whose performance declined somewhat, the
accuracies of the models were stable or even better than those of the previous models.
Therefore, an RF-based model for thrombosis (RF-T) and an XGBoost_w-based model for
bleeding (Xw-B) were used for the final model construction and prediction.
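One possible reading of the voting rules can be written down as a short sketch. It is an interpretation rather than the authors' code: the function name, the rank dictionaries, and the generic feature names are hypothetical, and the sketch assumes that the SHAP point is added only to features ranked in the top 20 of at least one algorithm. Under that reading, the achievable scores are 0.5, 1, 1.5, 2, and 3, which matches the values that appear in Table 3.

```python
def vote_score(feature, rf_rank, xgb_rank, shap_top20):
    """Voting score of one feature under the assumed reading of the rules.

    rf_rank / xgb_rank: dict feature -> 1-based importance rank.
    shap_top20: set of features in the SHAP top 20.
    """
    r1 = rf_rank.get(feature, float("inf"))
    r2 = xgb_rank.get(feature, float("inf"))
    n_top10 = (r1 <= 10) + (r2 <= 10)   # number of algorithms with rank <= 10
    n_top20 = (r1 <= 20) + (r2 <= 20)   # number of algorithms with rank <= 20

    if n_top10 == 2:        # top 10 of both algorithms
        base = 2.0
    elif n_top20 == 2:      # top 20 of both algorithms
        base = 1.0
    elif n_top10 == 1:      # top 10 of only one algorithm
        base = 0.5
    else:
        base = 0.0

    # SHAP point: assumed to require a top-20 rank in at least one algorithm.
    bonus = 1.0 if (feature in shap_top20 and n_top20 >= 1) else 0.0
    return min(base + bonus, 3.0)       # cap at the stated maximum of 3 points


# Hypothetical ranks for five generic features.
rf_rank = {"feat_a": 1, "feat_b": 4, "feat_c": 12, "feat_d": 9, "feat_e": 18}
xgb_rank = {"feat_a": 2, "feat_b": 15, "feat_c": 30, "feat_d": 25, "feat_e": 40}
shap_top20 = {"feat_a", "feat_c", "feat_d"}

scores = {f: vote_score(f, rf_rank, xgb_rank, shap_top20) for f in rf_rank}
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these made-up ranks, `feat_a` scores 3, `feat_d` scores 1.5, `feat_b` and `feat_c` score 1, and `feat_e` scores 0 and would not be retained.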
Table 3
The subset of features for the final model for thrombosis and bleeding

| Thrombosis feature | Score | Bleeding feature | Score |
|---|---|---|---|
| Age | 2 | OAC_3[a] | 3 |
| T4 | 2 | OAC dosage_3[a] | 2 |
| TBA | 1.5 | Tch | 2 |
| ALT | 1.5 | T6 | 1.5 |
| GLU | 1.5 | BNP | 1.5 |
| BNP | 1.5 | Alb | 1.5 |
| Fib | 1.5 | TG | 1.5 |
| LDL_C | 1.5 | PLT | 1.5 |
| PLT | 1 | T7 | 1.5 |
| Interruption | 1 | Cr | 1.5 |
| TG | 1 | PCT | 1 |
| AST | 1 | UA | 1 |
| T3 | 1 | Fib | 1 |
| EF | 1 | AST | 1 |
| Alb | 1 | GLU | 1 |
| Beta blocker_1[b] | 0.5 | ALT | 1 |
| Sex | 0.5 | TT | 1 |
| Drug adjustment | 0.5 | LAD | 1 |
| OAC dosage_1[b] | 0.5 | PPI_1[b] | 0.5 |
| NSAID_2[a] | 0.5 | Diabetes | 0.5 |
| CIS | 0.5 | Sex | 0.5 |
| T7 | 0.5 | Drug adjustment | 0.5 |
| PPI_2[a] | 0.5 | Interruption | 0.5 |
| Cr | 0.5 | LDL_C | 0.5 |
| Drinking | 0.5 | HDL_C | 0.5 |
| | | Heparin dose | 0.5 |
| | | OAC dosage_1[b] | 0.5 |
Abbreviations: T4, the duration of heparin; T3, operating time; OAC_3, type of OAC
after discharge (before adjustment); T6, the duration of OAC after discharge (before
OAC regimen adjustment); T7, the duration of OAC after discharge (after OAC regimen
adjustment).
Note: The variables in bold are those present in both models.
[a] CA preoperation.
[b] Postoperative CA.
Table 4
Performances of the simplified models based on two algorithms

| Outcome | Model | Recall | Specificity | BA | ACC | AUC |
|---|---|---|---|---|---|---|
| Thrombosis | XGBoost | 0.731 (0.025) | 0.706 (0.010) | 0.718 (0.014) | 0.708 (0.010) | 0.798 |
| | XGBoost_w | 0.730 (0.022) | 0.680 (0.008) | 0.705 (0.012) | 0.685 (0.008) | 0.773 |
| | RF | 0.774 (0.015) | 0.673 (0.010) | 0.724 (0.001) | 0.683 (0.001) | 0.799 |
| | RF_w | 0.720 (0.024) | 0.693 (0.008) | 0.707 (0.012) | 0.696 (0.007) | 0.794 |
| Bleeding | XGBoost | 0.773 (0.013) | 0.806 (0.009) | 0.790 (0.007) | 0.799 (0.007) | 0.885 |
| | XGBoost_w | 0.780 (0.014) | 0.805 (0.007) | 0.792 (0.006) | 0.800 (0.005) | 0.890 |
| | RF | 0.761 (0.011) | 0.705 (0.008) | 0.733 (0.006) | 0.717 (0.006) | 0.833 |
| | RF_w | 0.711 (0.011) | 0.821 (0.006) | 0.766 (0.004) | 0.797 (0.004) | 0.872 |
Note: The values in bold are the maximum values for each measure.
Abbreviations: ACC, accuracy; AUC, area under the curve; BA, balanced accuracy; RF_w,
RF algorithm using weighted features; XGBoost_w, XGBoost algorithm using weighted features.
Discussion
The Performances of Multiple ML Models for Thrombosis and Bleeding
Considering that the management of OAC following CA is crucial to reduce the risk
of thrombosis and bleeding, eight ML models were developed to detect risk factors
for thrombosis and bleeding in NVAF patients using 76 features. Of the eight commonly
used ML algorithms compared, the XGBoost- and RF-based models were the most powerful
in evaluating the importance of each factor in predicting AEs. Our AUC values differ
somewhat from those reported in other studies. In a cohort study of 9,670 patients,
the AUC of a GBT-based model for predicting ischemic stroke reached 0.685,[26] while
our GBT-based model for predicting thrombosis reached 0.730. The AUCs for bleeding
risk prediction ranged from 0.57 to 0.61 in a cohort study, while our result reached
0.876.[27] There may be three reasons for the difference in model accuracy. The first
is that the data come from different sources. The second is the choice of variables:
that study included a total of 43 variables, 42 of which were binary. Binary variables
provide very limited information, which may be the main reason for the low AUCs in
that study. The third is that the indicator used to assess the models was different.
In our models, recall was used for hyperparameter tuning, and both precision and accuracy
were reduced to varying degrees as a result. Nonetheless, the findings were consistent
in that the ML models predicted the long-term risk of thrombosis and bleeding better
than the CHA2DS2-VASc and HAS-BLED risk scores. For comparison, in our previous study,
the AUCs for the risk of LAA thrombosis ranged from 0.889 to 0.897.[28] From a holistic
perspective, the overall accuracies of the bleeding models were inferior to those
of the thrombosis models, and the bleeding models had significantly higher specificity.
To obtain a more streamlined model that can predict risk directly from a minimal number
of clinical indicators, we simplified the models and achieved comparable performance,
with AUCs of 0.799 and 0.890 for the RF-T and Xw-B models, respectively. The RF-T
model includes 25 features and the Xw-B model includes 27 features from several
different categories, without any clinical information that might be expensive,
tedious, or time-consuming to acquire. The results also showed that the XGBoost and RF
algorithms outperformed the other algorithms, in line with their wide recognition for
efficiency and effectiveness in a variety of scenarios. The tree-based ensemble learning methods
also provide built-in feature importance estimates that identify the most impactful
features in intricate exposure datasets.[29 ]
[30 ] While deep learning significantly improves model accuracy in learning tasks such
as image classification and electrocardiogram analysis, its performance is not outstanding
on small datasets.[31 ] This suggests that tree-based ensemble learning algorithms
may be the first choice when the dataset is not large and includes no imaging data.
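The AUC values compared in this section can be read as the probability that a randomly chosen patient with an event receives a higher predicted risk than a randomly chosen patient without one. The following is a minimal sketch of that pairwise definition, assuming 0/1 outcome labels; the function name `roc_auc` and the example scores are hypothetical and do not come from the study data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties between an event score and a non-event score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for 3 patients with an event and 4 without.
labels = [1, 1, 1, 0, 0, 0, 0]
risks = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]
print(round(roc_auc(labels, risks), 3))  # 11 of 12 pairs ranked correctly -> 0.917
```

Because the AUC depends only on the ranking of predicted risks, it is unaffected by the decision threshold, unlike recall, specificity, and accuracy.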
The Differences and Correlations between Thrombosis and Bleeding Events
To further explore the contribution of different risk factors to the models, feature importance
and SHAP analyses were carried out. Interestingly, BNP levels were identified as top-ranked
predictors in both models by SHAP analysis. In the thrombosis model, BNP ranked third,
with an increased risk of thrombosis at levels above 500 pg/mL. BNP ranked eighth
in the bleeding model. Although the thresholds are not clear, the overall trend is consistent:
the higher the level of BNP, the higher the risk of bleeding. This is consistent with the
ARISTOTLE cohort study, in which NT-proBNP levels in patients with AF were associated not
only with ischemic stroke risk, but also with bleeding risk.[32 ] Furthermore, in our previous research on LAA thrombosis, the plasma BNP level was
significantly higher (BNP level > 400 pg/mL) in patients with LAA thrombosis than
in those without LAA thrombosis.[28 ] In the SHAP results, older patients with NVAF and higher blood glucose levels
were prone to thrombosis, which is consistent with another study reporting fasting
blood glucose as an independent predictor of PLT-dependent thrombosis
in stable coronary artery disease patients.[33 ] In addition, patients with NVAF and diabetes are more likely to develop thrombosis
when OAC is used for less than 1 month (T6). In terms of predicting bleeding, patients
on warfarin with a larger LAD are more likely to be at higher risk of bleeding. Consistent
with the findings of Lu et al, our results emphasize the use of OAC as the most important
risk factor in bleeding events.[26 ] The ALT levels ranged from 13 to 31 IU/L across the quartiles of the collected
dataset. No increased risk of bleeding was observed in patients with slightly elevated
ALT levels. This finding may indicate that patients with NVAF with mild abnormalities
in liver enzymes do not need to be overly concerned about the risk of bleeding. Patients
with a high body mass index who received warfarin had a lower risk of major
bleeding compared with patients of normal weight. This could be because the dose of
warfarin given to obese patients is not adjusted for their weight on an individual
basis. It could also be because the dose was not adjusted for long-term weight gain.[34 ]
Based on the voting strategy in combination with the feature importance and SHAP analysis,
the most important influencing factors were clarified. Among the top 25 significant
risk factors for thrombosis, the top 8 were the most predictive: age, the duration
of heparin, total bile acids (TBA), ALT, blood glucose, the level of BNP, fibrinogen,
and low-density lipoprotein cholesterol (LDL_C). Among the top 27 significant risk
factors for bleeding, the top 10 were the most predictive, including
OAC category, dosage and duration, PLT, BNP, blood lipid, and albumin level. In general,
only BNP levels were significant predictors of both thrombosis and bleeding events.
In NVAF patients undergoing CA therapy, the risk of thrombosis is more commonly associated
with advanced age, TBA, and the duration of heparin, whereas the risk of bleeding
is more dependent on the choice of anticoagulation regimen and coagulation indicators.
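One way to combine a feature importance ranking with a SHAP ranking is a simple rank vote. The sketch below uses a Borda count, which is an assumption made for illustration: the exact voting rule used in this study is not described in this section. The function name `borda_vote` and the two example rankings are hypothetical, although the feature names are taken from the text.

```python
def borda_vote(rankings):
    """Merge several ranked feature lists into one consensus ranking.

    Each list awards n points to its first feature, n - 1 to the second, and so
    on, where n is the list length. Features are then ordered by total points.
    """
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            points[feature] = points.get(feature, 0) + (n - position)
    # Highest total first; ties are broken alphabetically so the result is deterministic.
    return sorted(points, key=lambda feature: (-points[feature], feature))


# Hypothetical rankings, for illustration only (not the study's actual orderings).
importance_rank = ["age", "heparin_duration", "TBA", "BNP"]
shap_rank = ["age", "TBA", "BNP", "heparin_duration"]
print(borda_vote([importance_rank, shap_rank]))
# ['age', 'TBA', 'heparin_duration', 'BNP']
```

A rank-based vote of this kind ignores the magnitude of each importance score, which makes rankings from different methods directly comparable at the cost of discarding how far apart neighboring features are.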
Limitations and Future Study
Several limitations of the study are worth mentioning. The dataset used to build the
model was derived from a single center covering 1,100 patients, so the risks may be
unintentionally over- or underestimated. The feasibility and generalizability of the results
need to be verified in future studies with larger samples. Several potential features
with more than 30% missing values, such as inflammation indicators and cardiac troponin,
were excluded from our study. However, in many studies inflammatory markers play an
important predictive role in adverse outcomes in patients with NVAF, and our ML models
might improve further with this additional information. More
importantly, we only considered whether a patient had an event during the follow-up
period and used that event as the endpoint; we did not consider whether
the anticoagulation strategy was subsequently adjusted. Hence, we will
continue to follow this population to optimize the prediction model, with the aim of
reducing the risk of AEs by adjusting the anticoagulation strategy. Although the model that we have constructed
can predict risk and be used to warn and alert potentially high-risk populations,
it is currently not suitable for direct application in clinical scoring due to the
excessive number of clinical features. Therefore, the next step is to enlarge the
dataset, construct assessment scales, and integrate them into a web-based platform
that can directly assist health care professionals in risk assessment of patients
with NVAF.
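The exclusion of features with more than 30% missing values can be expressed as a simple filter. The following is a minimal sketch, assuming each patient record is a dictionary in which a missing measurement is stored as `None` or is absent. The function name `split_by_missingness` and the example records are hypothetical and are not study data.

```python
def split_by_missingness(rows, max_missing=0.30):
    """Split feature names into (kept, dropped) by their fraction of missing values.

    A feature is dropped only when its missing fraction is strictly greater
    than max_missing, matching the "more than 30%" rule.
    """
    if not rows:
        raise ValueError("at least one record is required")
    features = sorted({name for row in rows for name in row})
    kept, dropped = [], []
    for name in features:
        missing = sum(1 for row in rows if row.get(name) is None)
        if missing / len(rows) > max_missing:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical records: troponin is missing for 2 of 4 patients, BNP for 1 of 4.
records = [
    {"age": 61, "BNP": 120.0, "troponin": 0.01},
    {"age": 70, "BNP": None, "troponin": None},
    {"age": 58, "BNP": 85.0, "troponin": None},
    {"age": 66, "BNP": 510.0, "troponin": 0.03},
]
kept, dropped = split_by_missingness(records)
print(kept, dropped)  # ['BNP', 'age'] ['troponin']
```

Features that pass such a filter still contain gaps, so a separate imputation step would be needed before model training; that step is outside the scope of this sketch.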
Conclusion
In this study, we evaluated and compared eight ML algorithms in the detection of risk
factors for both thrombosis and bleeding. The final models, RF-T and Xw-B, were
able to identify NVAF patients at high risk of thrombosis and bleeding
based on a few easy-to-find features. We also identified that age, TBA, and BNP level
are crucial in predicting thrombosis, while anticoagulation regimen, coagulation indicators,
and BNP level are most predictive of bleeding. In summary, this study provides clinical
evidence-based advice to optimize the anticoagulation strategy for NVAF patients and
is of great significance for the prevention of thrombosis and bleeding-related events.
What is known about this topic?
Inappropriate use of oral anticoagulants (OACs) for nonvalvular atrial fibrillation
(NVAF) not only fails to prevent thrombosis, but is also associated with a higher
risk of bleeding.
This study aimed to develop clinical data-driven machine learning methods to dynamically
predict thrombosis and bleeding to develop more refined OAC treatment strategies for
AF patients.
What does this paper add?
The simplified machine learning models RF-T and Xw-B have better prediction performance
for thrombosis and bleeding, and the overall accuracy (AUC) reaches 0.799 and 0.890,
respectively.
The duration of heparin and BNP level are closely related to the risk of thrombosis,
while the administration strategy of OAC, the level of PLT, and BNP play a crucial
role in the occurrence of bleeding.