Keywords
social determinants of health - ACG risk score - provider workload - burnout - machine
learning
Background and Significance
Provider burnout due to workload has emerged as a significant issue within the medical
care system. In the United States, studies have reported alarming rates of burnout
among physicians, with approximately 50% experiencing burnout at some point in their
careers.[1] Burnout can have severe consequences for both providers and patients. Reports indicate
that as many as 400 U.S. physicians die by suicide every year due to burnout's detrimental
impact on their lives.[2] Moreover, burnout significantly diminishes providers' performance, leading to increased
medical errors that pose a considerable risk to patient safety,[3][4] and it worsens retention and turnover in an
already shrinking supply of primary care providers.[5] Mitigating provider burnout is therefore of significant
importance to medical care.
Among the causes of provider burnout, a lack of workload control stands out as a major
contributing factor. In primary care, the measurement of providers' workload predominantly
centers on two key factors: the quantity of appointments within their patient panel
and the time devoted to non–visit care activities. Research indicates that patients
with higher levels of medical complexity are more likely to require additional non–visit
care interactions.[6][7] Therefore, to appropriately assign a new patient to a primary care physician (PCP)
panel, it is important to understand the patient's health conditions and complexity.
However, for patients who are establishing primary care services for the first
time, health and risk factor information may not be available when the appointment
request is made. Estimating a new patient's health complexity is thus imperative,
as it facilitates the prediction of their potential appointments and non–visit care
interactions.
An extensively used metric to evaluate health status is the Adjusted Clinical Group
(ACG) score, which reflects patients' projected or actual utilization
of health care services and has demonstrated a strong correlation with the physician
contact rate.[8][9] ACG scores are typically computed from patients' age, gender, and the complete
set of medical diagnoses documented during a specified time frame, usually
1 year. These scores have also been used as evidence for care management, intervention,
and resource assignment.[10][11][12] However, when medical diagnosis information is not available,
as is often the case for new patients, our institution relies solely on age and gender
as predictors, which reduces accuracy.
To address this limitation, we propose leveraging social determinants of health (SDOH)
information as additional predictors to enhance risk prediction for new patients.
SDOH, which encompasses nonmedical factors that influence health outcomes, demonstrates
strong correlations with patients' health status and health care expenditures.[13] It is increasingly acknowledged
as a primary factor influencing clinical health outcomes and is useful in patient care transition planning.[8][9]
Existing research has consistently demonstrated the significant relationship between
SDOH and patients' health conditions. For instance, studies have shown that incorporating
SDOH information can greatly enhance the accuracy of predicting patients' health care
utilization and health outcomes.[10] Moreover, there are benefits of including SDOH as predictors in machine learning
models within the health care domain. This is evident in areas such as cardiovascular
disease prediction,[11] sepsis readmission prediction,[12] and missed breast imaging appointment prediction,[14] where the integration of SDOH has been shown to improve model performance. Additionally,
SDOH information can be obtained directly from patient surveys or via digital conversation,
without requiring resources from the medical team.[15] This body of existing research supports the use of
SDOH as predictors of patient risk and of the medical needs that drive provider
workload.
Achieving a balanced workload distribution among providers is crucial for optimizing
patient assignments and enhancing overall quality of care. While concerns for patient
satisfaction and experience may deter clinics from reassigning patients to different
panels, proper assignment of new patients to provider panels holds the potential to
achieve workload equilibrium. This study offers insights into optimizing patient assignments
to mitigate provider burnout and cultivate a sustainable professional landscape for
health care providers, ultimately addressing the pressing issue of burnout within
the medical care system.
Objectives
In this study, we examined data from a family medicine department in a large teaching
hospital, which currently uses only age and gender to estimate the ACG
risk score for new patients and collects SDOH information at the time of the patient
visit. The timing of SDOH collection can be moved to the time of the appointment request.
The objective of this study was therefore twofold. First, we examined the impact of integrating
SDOH data with age and gender for predicting ACG scores in new patients. Second, we
investigated whether incorporating the predicted ACG score as a predictor improves
model accuracy for the number of appointments and non–visit care interactions.
Methods
The study focused on patients who are new to family medicine services and have not
yet been seen for their initial appointment. Information on age, gender, and SDOH
was collected for each patient. Because the number of appointments and non–visit
care interactions are the pivotal elements of provider workload, assessing
patient risk is a crucial step in predicting them. To predict patients' ACG scores, five
machine learning algorithms, namely random forest,[16] gradient boosting,[17] logistic regression,[18] support vector machine (SVM),[19] and decision tree,[20] were applied and their predictive performance was compared. These algorithms were chosen for
their effectiveness in handling both categorical and continuous features, making
them suitable for ACG score prediction. We then evaluated the models using
accuracy and AUC values to select the best-performing algorithm.
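The two evaluation metrics named above can be illustrated with a minimal sketch. The text does not state how AUC was computed for the five ACG categories, so the one-vs-rest formulation below is an assumption, and the function names and toy data are hypothetical rather than taken from the study.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of patients whose predicted ACG category equals the observed one."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_rest_auc(y_true: Sequence[int], scores: Sequence[float],
                    positive_class: int) -> float:
    """AUC for one category against all others (assumed formulation).

    Equals the probability that a randomly chosen patient in `positive_class`
    receives a higher score than a randomly chosen patient outside it,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive_class]
    neg = [s for t, s in zip(y_true, scores) if t != positive_class]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: observed categories, predicted categories, and a
# model's score for category 1. None of these values come from the study.
y_true = [1, 1, 1, 2, 3, 1]
y_pred = [1, 1, 2, 2, 1, 1]
score_cat1 = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7]

acc = accuracy(y_true, y_pred)
auc = one_vs_rest_auc(y_true, score_cat1, positive_class=1)
```

With a class distribution as skewed as the one in this dataset, accuracy alone can look high for a model that always predicts the largest category, which is one reason to report a ranking metric such as AUC alongside it.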
Subsequently, the predicted ACG scores were combined with the previously collected
data (i.e., age, gender, and SDOH) to predict the number of appointments
and non–visit care interactions that each patient would require over the next year
using a linear regression model, which is known for its strong interpretability.[21] Including the predicted ACG scores in this step is intended to enable a more comprehensive and accurate
estimate of patients' expected health care utilization, considering both their health
status and predicted service needs. [Fig. 1] illustrates the framework of this method.
Fig. 1 Framework of providers' workload prediction. ACG, Adjusted Clinical Group; SVM, support
vector machine.
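The two-stage framework can be sketched as follows, assuming a stage-1 classifier that returns an ACG category and a stage-2 linear model that takes that category as one more feature. All names, the stand-in classifier, and the coefficient values are hypothetical and serve only to show how the stages connect; they are not the fitted models from this study.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def add_predicted_acg(patient: Features,
                      predict_acg: Callable[[Features], int]) -> Features:
    """Stage 1: append the predicted ACG category to the intake features."""
    enriched = dict(patient)  # copy so the caller's record is not modified
    enriched["predicted_acg"] = float(predict_acg(patient))
    return enriched


def linear_estimate(features: Features, coefficients: Features,
                    intercept: float) -> float:
    """Stage 2: linear regression prediction, intercept + sum(coef * value)."""
    return intercept + sum(coefficients.get(name, 0.0) * value
                           for name, value in features.items())


def toy_acg_classifier(patient: Features) -> int:
    """Hypothetical stand-in for the trained stage-1 classifier."""
    return 2 if patient["age"] >= 40 else 1


# Made-up coefficients for illustration only; a real model would be fitted.
appointment_coefficients = {"age": 0.02, "predicted_acg": 1.5}
appointment_intercept = 0.5

patient = {"age": 50.0, "tobacco_user_Yes": 1.0}
enriched = add_predicted_acg(patient, toy_acg_classifier)
expected_appointments = linear_estimate(
    enriched, appointment_coefficients, appointment_intercept
)
```

Keeping stage 2 linear means each coefficient, including the one on the predicted ACG category, can be read directly as the change in expected utilization per unit change in that feature.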
Data Description
The dataset was composed of patients' initial visit requests within a 1-year timeframe,
together with the calculated ACG risk score and the number of appointments and non–visit
care interactions for the subsequent year as the modeling outcomes. The data spanned 2018
to 2019 and comprised 33,262 visits. The study encompasses a total of 56 variables:
the demographic variables age and gender, with the remaining variables
related to SDOH. More specifically, the SDOH variables from the participating family
medicine department primarily pertain to education and social activities. Education
is represented only by education level in our dataset. Social activities consist
of 14 categories ([Table 1]).
Table 1
Social activity categories
Category | Variable | Description
Birth control methods | DIAPHRAGM_YN | Indicates if the patient uses a diaphragm as a form of birth control
  | IMPLANT_YN | Indicates if the patient uses an implant as a form of birth control
  | INJECTION_YN | Indicates if the patient uses an injection as a form of birth control
  | INSERTS_YN | Indicates if the patient uses inserts as a form of birth control
  | IUD_YN | Indicates if the patient uses an intrauterine device (IUD) as birth control
  | IV_DRUG_USER_YN | Indicates if the patient uses intravenous drugs
  | PILL_YN | Indicates if the patient uses birth control pills
  | RHYTHM_YN | Indicates if the patient uses the rhythm method as a form of birth control
  | SPERMICIDE_YN | Indicates if the patient uses spermicide
  | SPONGE_YN | Indicates if the patient uses a sponge as a form of birth control
  | SURGICAL_YN | Indicates if the patient has undergone surgical birth control (e.g., hysterectomy)
Alcohol consumption | alcohol_binge | Indicates if the patient engages in binge drinking
  | alcohol_drinks_per_day | Number of standard drinks consumed in a typical day
  | alcohol_freq | Frequency of drinking alcohol
  | ALCOHOL_OZ_PER_WK | Fluid ounces of alcohol consumed per week
  | alcohol_use | Indicates if the patient uses alcohol
Cigar and cigarettes | CIGARETTES_YN | Indicates if the patient uses cigarettes
  | CIGARS_YN | Indicates if the patient uses cigars
Tobacco usage | CHEW_YN | Indicates if the patient chews tobacco
  | PIPES_YN | Indicates if the patient smokes a pipe
  | smokeless_tob_use | Indicates if the patient uses smokeless tobacco
  | smoking_tob_use | Indicates if the patient smokes tobacco
  | SNUFF_YN | Indicates if the patient uses snuff
  | TOBACCO_CURR_PACK_PER_DAY | Number of packs of cigarettes smoked per day
  | TOBACCO_PACK_YEAR | Number of pack-years of smoking tobacco
  | tobacco_user | Indicates if the patient uses tobacco
Drug use | ill_drug_user | Indicates if the patient uses illicit drugs
  | ILLICIT_DRUG_FREQ_PER_WEEK | Frequency of illicit drug use per week
Emotion related | daily_stress | Level of daily stress experienced by the patient
Partner related | ipv_emotional_abuse | Indicates emotional abuse from an intimate partner
  | ipv_fear | Indicates fear of an intimate partner
  | ipv_physical_abuse | Indicates physical abuse from an intimate partner
  | ipv_sexual_abuse | Indicates sexual abuse from an intimate partner
  | living_w_spouse | Indicates if the patient is currently living with a spouse or partner
Sexual activities | ABSTINENCE_YN | Indicates if the patient practices abstinence
  | CONDOM_YN | Indicates if the patient uses a condom during sexual activity
  | sexually_active | Indicates the patient's sexual activity status
Social status | FEMALE_PARTNER_YN | Indicates if the patient has a female sexual partner
  | MALE_PARTNER_YN | Indicates if the patient has a male sexual partner
  | phone_communication | Frequency of socializing with friends or family over the phone
  | socialization_freq | Frequency of socializing with friends or family in person
Physical activities | phys_act_days_per_week | Number of days per week the patient exercises
  | phys_act_min_per_sess | Number of minutes per exercise session
Group activities | club_member | Indicates if the patient is a member of any clubs or organizations
  | clubmtg_attendance | Frequency of attending club or other organization meetings in a year
Food related | food_insecurity_scarce | Indicates if the patient has run out of food and was not able to buy more
  | food_insecurity_worry | Indicates if the patient worried about food running out in the past year
Access to health care | med_transport_needs | Indicates if the patient had difficulty with transportation for medical appointments
  | other_transport_needs | Indicates if the patient had difficulty with transportation for things other than medical appointments
Others | church_attendance | Indicates how often the patient attends religious services
  | Sex_At_Birth | Indicates the sex assigned at birth of the patient
  | UNKNOWN_FAM_HX_YN | Indicates if the patient's family history is unknown to the patient
To facilitate the analysis, categorical variables were transformed into dummy variables,
resulting in a total of 182 features ([Table 2]). Dummy variables represent categorical data in a numerical format, with each category
of the original variable assigned a distinct binary value, typically 0 or 1. For instance,
the categorical variable “SURGICAL_YN” in our dataset, denoting the use of a surgical
method of birth control, was transformed into two dummy variables: “Y” and “N.” These
dummy variables assume a value of 1 if the observation belongs to the corresponding
category and 0 otherwise. This conversion facilitates the inclusion of categorical
variables in regression models and other analytical techniques that necessitate numerical
inputs. The distributions of the number of appointments and non–visit care interactions
are shown in [Fig. 2]. The ACG scores were categorized into five groups, each corresponding to a different
level of risk ([Table 3]).[22] Higher ACG scores indicate a higher level of risk. The majority of patients were
categorized within the 0 to 1 range. This was expected as the majority of individuals
seen in primary care settings do not exhibit severe illnesses.
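The dummy-variable transformation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper name `dummy_encode`, the `column_level` naming of the new features, and the two toy records are hypothetical, and the study's actual preprocessing code is not shown in the text.

```python
from typing import Any, Dict, List


def dummy_encode(records: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Replace one categorical column with one 0/1 indicator per observed level."""
    levels = sorted({record[column] for record in records})
    encoded = []
    for record in records:
        # Keep every other column unchanged.
        row = {name: value for name, value in record.items() if name != column}
        for level in levels:
            # The indicator is 1 only for the level this record belongs to.
            row[f"{column}_{level}"] = 1 if record[column] == level else 0
        encoded.append(row)
    return encoded


# Hypothetical records; the values are not taken from the study data.
records = [
    {"age": 34.0, "SURGICAL_YN": "N"},
    {"age": 41.5, "SURGICAL_YN": "Y"},
]
encoded = dummy_encode(records, "SURGICAL_YN")
```

Each categorical variable contributes as many indicator columns as it has levels, which is how a modest number of source variables can expand into a much larger feature set.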
Table 2
Feature statistics
Summary statistics: n (%) or median (Q1–Q3)
Variable | ACG 1 | ACG 2 | ACG 3 | ACG 4 | ACG 5
ABSTINENCE_YN
  N | 28,884 (99.61%) | 2,507 (99.05%) | 1,044 (98.96%) | 161 (100.00%) | 516 (99.61%)
  Y | 113 (0.39%) | 24 (0.95%) | 11 (1.04%) | 0 (0.00%) | 2 (0.39%)
Age | 20.85 (9.35–34.08) | 34.15 (20.99–43.99) | 32.46 (27.34–40.77) | 33.06 (25.31–48.56) | 30.42 (18.92–39.36)
alcohol_binge
  Daily or almost daily | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Less than monthly | 157 (0.54%) | 15 (0.59%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Monthly | 31 (0.11%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Never | 310 (1.07%) | 22 (0.87%) | 8 (0.76%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,492 (98.26%) | 2,492 (98.46%) | 1,046 (99.15%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 4 (0.01%) | 2 (0.08%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  Weekly | 2 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
alcohol_drinks_per_day
  1 or 2 | 373 (1.29%) | 23 (0.91%) | 4 (0.38%) | 0 (0.00%) | 0 (0.00%)
  3 or 4 | 88 (0.30%) | 5 (0.20%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  5 or 6 | 5 (0.02%) | 2 (0.08%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  7 to 9 | 2 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,490 (98.25%) | 2,495 (98.58%) | 1,049 (99.43%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 39 (0.13%) | 6 (0.24%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
alcohol_freq
  2–3 times a week | 128 (0.44%) | 8 (0.32%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  2–4 times a month | 197 (0.68%) | 12 (0.47%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  4 or more times a week | 30 (0.10%) | 3 (0.12%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Monthly or less | 135 (0.47%) | 7 (0.28%) | 3 (0.28%) | 0 (0.00%) | 0 (0.00%)
  Never | 134 (0.46%) | 17 (0.67%) | 10 (0.95%) | 1 (0.62%) | 0 (0.00%)
  Not asked | 28,373 (97.85%) | 2,484 (98.14%) | 1,041 (98.67%) | 160 (99.38%) | 518 (100.00%)
ALCOHOL_OZ_PER_WK | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00)
alcohol_use
  Defer | 34 (0.12%) | 3 (0.12%) | 5 (0.47%) | 1 (0.62%) | 1 (0.19%)
  Never | 108 (0.37%) | 6 (0.24%) | 5 (0.47%) | 1 (0.62%) | 0 (0.00%)
  No | 3,211 (11.07%) | 508 (20.07%) | 363 (34.41%) | 52 (32.30%) | 176 (33.98%)
  Not asked | 22,853 (78.81%) | 1,525 (60.25%) | 407 (38.58%) | 53 (32.92%) | 233 (44.98%)
  Not currently | 44 (0.15%) | 5 (0.20%) | 7 (0.66%) | 1 (0.62%) | 0 (0.00%)
  Yes | 2,747 (9.47%) | 484 (19.12%) | 268 (25.40%) | 53 (32.92%) | 108 (20.85%)
CHEW_YN
  N | 28,836 (99.44%) | 2,490 (98.38%) | 1,046 (99.15%) | 156 (96.89%) | 512 (98.84%)
  Y | 161 (0.56%) | 41 (1.62%) | 9 (0.85%) | 5 (3.11%) | 6 (1.16%)
church_attendance
  1 to 4 times per year | 89 (0.31%) | 5 (0.20%) | 3 (0.28%) | 0 (0.00%) | 0 (0.00%)
  More than 4 times per year | 185 (0.64%) | 15 (0.59%) | 6 (0.57%) | 0 (0.00%) | 0 (0.00%)
  Never | 232 (0.80%) | 18 (0.71%) | 3 (0.28%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,464 (98.16%) | 2,487 (98.26%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 27 (0.09%) | 6 (0.24%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
CIGARETTES_YN
  N | 28,338 (97.73%) | 2,358 (93.16%) | 976 (92.51%) | 133 (82.61%) | 472 (91.12%)
  Y | 659 (2.27%) | 173 (6.84%) | 79 (7.49%) | 28 (17.39%) | 46 (8.88%)
CIGARS_YN
  N | 28,942 (99.81%) | 2,512 (99.25%) | 1,051 (99.62%) | 159 (98.76%) | 514 (99.23%)
  Y | 55 (0.19%) | 19 (0.75%) | 4 (0.38%) | 2 (1.24%) | 4 (0.77%)
club_member
  No | 328 (1.13%) | 24 (0.95%) | 8 (0.76%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,464 (98.16%) | 2,487 (98.26%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 17 (0.06%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Yes | 188 (0.65%) | 19 (0.75%) | 5 (0.47%) | 0 (0.00%) | 0 (0.00%)
clubmtg_attendance
  1 to 4 times per year | 48 (0.17%) | 4 (0.16%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  More than 4 times per year | 144 (0.50%) | 14 (0.55%) | 5 (0.47%) | 0 (0.00%) | 0 (0.00%)
  Never | 211 (0.73%) | 19 (0.75%) | 6 (0.57%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,518 (98.35%) | 2,490 (98.38%) | 1,043 (98.86%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 76 (0.26%) | 4 (0.16%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
CONDOM_YN
  N | 28,199 (97.25%) | 2,409 (95.18%) | 1,001 (94.88%) | 145 (90.06%) | 492 (94.98%)
  Y | 798 (2.75%) | 122 (4.82%) | 54 (5.12%) | 16 (9.94%) | 26 (5.02%)
daily_stress
  Not asked | 28,478 (98.21%) | 2,493 (98.50%) | 1,043 (98.86%) | 161 (100.00%) | 518 (100.00%)
  Not at all | 161 (0.56%) | 10 (0.40%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Only a little | 178 (0.61%) | 9 (0.36%) | 7 (0.66%) | 0 (0.00%) | 0 (0.00%)
  Rather much | 41 (0.14%) | 6 (0.24%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  To some extent | 113 (0.39%) | 9 (0.36%) | 4 (0.38%) | 0 (0.00%) | 0 (0.00%)
  Very much | 26 (0.09%) | 4 (0.16%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
DIAPHRAGM_YN
  N | 28,992 (99.98%) | 2,531 (100.00%) | 1,054 (99.91%) | 161 (100.00%) | 518 (100.00%)
  Y | 5 (0.02%) | 0 (0.00%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
education_level
  11th grade | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  12th grade | 24 (0.08%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  2nd grade | 0 (0.00%) | 0 (0.00%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  4th grade | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  7th grade | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  8th grade | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  9th grade | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Associate degree: academic program | 22 (0.08%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Associate degree: occupational, technical, or vocational program | 17 (0.06%) | 5 (0.20%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
  Bachelor's degree (e.g., BA, AB, BS) | 221 (0.76%) | 16 (0.63%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
  Doctorate | 38 (0.13%) | 3 (0.12%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  GED or equivalent | 14 (0.05%) | 1 (0.04%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
  High school graduate | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Master's degree (e.g., MA, MS, MEng, MEd, MSW, MBA) | 74 (0.26%) | 6 (0.24%) | 5 (0.47%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,465 (98.17%) | 2,488 (98.30%) | 1,043 (98.86%) | 161 (100.00%) | 518 (100.00%)
  Professional school degree (e.g., MD, DDS, DVM, JD) | 92 (0.32%) | 7 (0.28%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Some college, no degree | 25 (0.09%) | 4 (0.16%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
FEMALE_PARTNER_YN
  N | 27,933 (96.33%) | 2,343 (92.57%) | 1,008 (95.55%) | 138 (85.71%) | 491 (94.79%)
  Y | 1,064 (3.67%) | 188 (7.43%) | 47 (4.45%) | 23 (14.29%) | 27 (5.21%)
food_insecurity_scarce
  Never true | 521 (1.80%) | 36 (1.42%) | 12 (1.14%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,459 (98.14%) | 2,491 (98.42%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Often true | 2 (0.01%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Patient refused | 2 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Sometimes true | 13 (0.04%) | 3 (0.12%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
food_insecurity_worry
  Never true | 512 (1.77%) | 37 (1.46%) | 12 (1.14%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,456 (98.13%) | 2,490 (98.38%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Often true | 2 (0.01%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Patient refused | 3 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Sometimes true | 24 (0.08%) | 3 (0.12%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
Gender
  Female | 15,455 (53.30%) | 1,528 (60.37%) | 885 (83.89%) | 75 (46.58%) | 371 (71.62%)
  Male | 13,540 (46.69%) | 1,003 (39.63%) | 170 (16.11%) | 86 (53.42%) | 147 (28.38%)
  Not recorded | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Uncertain | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
ill_drug_user
  Defer | 38 (0.13%) | 3 (0.12%) | 3 (0.28%) | 2 (1.24%) | 2 (0.39%)
  Never | 375 (1.29%) | 19 (0.75%) | 12 (1.14%) | 1 (0.62%) | 1 (0.19%)
  No | 5,461 (18.83%) | 915 (36.15%) | 612 (58.01%) | 89 (55.28%) | 261 (50.39%)
  Not asked | 22,906 (78.99%) | 1,538 (60.77%) | 405 (38.39%) | 57 (35.40%) | 236 (45.56%)
  Not currently | 38 (0.13%) | 5 (0.20%) | 1 (0.09%) | 1 (0.62%) | 1 (0.19%)
  Yes | 179 (0.62%) | 51 (2.02%) | 22 (2.09%) | 11 (6.83%) | 17 (3.28%)
ILLICIT_DRUG_FREQ_PER_WEEK | 275.2 (0.95%) | 63 (2.49%) | 24 (2.27%) | 28 (17.39%) | 14 (2.70%)
IMPLANT_YN
  N | 28,884 (99.61%) | 2,511 (99.21%) | 1,042 (98.77%) | 159 (98.76%) | 514 (99.23%)
  Y | 113 (0.39%) | 20 (0.79%) | 13 (1.23%) | 2 (1.24%) | 4 (0.77%)
INJECTION_YN
  N | 28,974 (99.92%) | 2,527 (99.84%) | 1,053 (99.81%) | 159 (98.76%) | 513 (99.03%)
  Y | 23 (0.08%) | 4 (0.16%) | 2 (0.19%) | 2 (1.24%) | 5 (0.97%)
INSERTS_YN
  N | 28,997 (100.00%) | 2,531 (100.00%) | 1,055 (100.00%) | 161 (100.00%) | 518 (100.00%)
  Y | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
ipv_emotional_abuse
  No | 504 (1.74%) | 38 (1.50%) | 11 (1.04%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,482 (98.22%) | 2,493 (98.50%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 2 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Yes | 9 (0.03%) | 0 (0.00%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
ipv_fear
  No | 507 (1.75%) | 38 (1.50%) | 12 (1.14%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,484 (98.23%) | 2,493 (98.50%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Yes | 5 (0.02%) | 0 (0.00%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
ipv_physical_abuse
  No | 512 (1.77%) | 38 (1.50%) | 12 (1.14%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,483 (98.23%) | 2,493 (98.50%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Yes | 1 (0.00%) | 0 (0.00%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
ipv_sexual_abuse
  No | 476 (1.64%) | 34 (1.34%) | 10 (0.95%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,520 (98.36%) | 2,496 (98.62%) | 1,044 (98.96%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 1 (0.00%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Yes | 0 (0.00%) | 0 (0.00%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
IUD_YN
  N | 28,522 (98.36%) | 2,438 (96.33%) | 1,013 (96.02%) | 149 (92.55%) | 509 (98.26%)
  Y | 475 (1.64%) | 93 (3.67%) | 42 (3.98%) | 12 (7.45%) | 9 (1.74%)
IV_DRUG_USER_YN
  N | 28,995 (99.99%) | 2,531 (100.00%) | 1,055 (100.00%) | 161 (100.00%) | 518 (100.00%)
  Y | 2 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
living_w_spouse
  Divorced | 10 (0.03%) | 1 (0.04%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  Living with partner | 69 (0.24%) | 2 (0.08%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Married | 225 (0.78%) | 22 (0.87%) | 9 (0.85%) | 0 (0.00%) | 0 (0.00%)
  Never married | 176 (0.61%) | 13 (0.51%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,498 (98.28%) | 2,491 (98.42%) | 1,044 (98.96%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 14 (0.05%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Separated | 3 (0.01%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Widowed | 2 (0.01%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
MALE_PARTNER_YN
  N | 26,827 (92.52%) | 2,109 (83.33%) | 630 (59.72%) | 125 (77.64%) | 351 (67.76%)
  Y | 2,170 (7.48%) | 422 (16.67%) | 425 (40.28%) | 36 (22.36%) | 167 (32.24%)
med_transport_needs
  No | 533 (1.84%) | 40 (1.58%) | 11 (1.04%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,456 (98.13%) | 2,489 (98.34%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 2 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Yes | 6 (0.02%) | 2 (0.08%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
other_transport_needs
  No | 519 (1.79%) | 39 (1.54%) | 11 (1.04%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,475 (98.20%) | 2,490 (98.38%) | 1,043 (98.86%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Yes | 2 (0.01%) | 2 (0.08%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
phone_communication
  More than three times a week | 254 (0.88%) | 17 (0.67%) | 6 (0.57%) | 0 (0.00%) | 0 (0.00%)
  Never | 14 (0.05%) | 1 (0.04%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,492 (98.26%) | 2,489 (98.34%) | 1,045 (99.05%) | 161 (100.00%) | 518 (100.00%)
  Once a week | 95 (0.33%) | 7 (0.28%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
  Patient refused | 9 (0.03%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Three times a week | 47 (0.16%) | 11 (0.43%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  Twice a week | 86 (0.30%) | 5 (0.20%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
phys_act_days_per_week
  0 days | 52 (0.18%) | 9 (0.36%) | 5 (0.47%) | 0 (0.00%) | 0 (0.00%)
  1 day | 53 (0.18%) | 3 (0.12%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  2 days | 100 (0.34%) | 6 (0.24%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
  3 days | 125 (0.43%) | 10 (0.40%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  4 days | 94 (0.32%) | 6 (0.24%) | 4 (0.38%) | 0 (0.00%) | 0 (0.00%)
  5 days | 69 (0.24%) | 5 (0.20%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
  6 days | 39 (0.13%) | 3 (0.12%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  7 days | 27 (0.09%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,437 (98.07%) | 2,488 (98.30%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
phys_act_min_per_sess
  0 minutes | 44 (0.15%) | 7 (0.28%) | 3 (0.28%) | 0 (0.00%) | 0 (0.00%)
  10 minutes | 19 (0.07%) | 1 (0.04%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  100 minutes | 2 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  120 minutes | 7 (0.02%) | 2 (0.08%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  130 minutes | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  150+ minutes | 4 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  20 minutes | 61 (0.21%) | 1 (0.04%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
  30 minutes | 155 (0.53%) | 9 (0.36%) | 3 (0.28%) | 0 (0.00%) | 0 (0.00%)
  40 minutes | 91 (0.31%) | 7 (0.28%) | 3 (0.28%) | 0 (0.00%) | 0 (0.00%)
  50 minutes | 36 (0.12%) | 4 (0.16%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  60 minutes | 99 (0.34%) | 7 (0.28%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  70 minutes | 6 (0.02%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  80 minutes | 6 (0.02%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  90 minutes | 10 (0.03%) | 3 (0.12%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,451 (98.12%) | 2,490 (98.38%) | 1,043 (98.86%) | 161 (100.00%) | 518 (100.00%)
  Patient refused | 5 (0.02%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
PILL_YN
  N | 28,355 (97.79%) | 2,428 (95.93%) | 1,009 (95.64%) | 153 (95.03%) | 503 (97.10%)
  Y | 642 (2.21%) | 103 (4.07%) | 46 (4.36%) | 8 (4.97%) | 15 (2.90%)
PIPES_YN
  N | 28,985 (99.96%) | 2,530 (99.96%) | 1,055 (100.00%) | 159 (98.76%) | 516 (99.61%)
  Y | 12 (0.04%) | 1 (0.04%) | 0 (0.00%) | 2 (1.24%) | 2 (0.39%)
RHYTHM_YN
  N | 28,973 (99.92%) | 2,525 (99.76%) | 1,054 (99.91%) | 161 (100.00%) | 516 (99.61%)
  Y | 24 (0.08%) | 6 (0.24%) | 1 (0.09%) | 0 (0.00%) | 2 (0.39%)
Sex_At_Birth
  CHOOSE NOT TO DISCLOSE | 44 (0.15%) | 8 (0.32%) | 6 (0.57%) | 2 (1.24%) | 1 (0.19%)
  FEMALE | 7,345 (25.33%) | 1,153 (45.56%) | 792 (75.07%) | 68 (42.24%) | 297 (57.34%)
  MALE | 4,794 (16.53%) | 618 (24.42%) | 123 (11.66%) | 66 (40.99%) | 82 (15.83%)
  Not asked | 16,804 (57.95%) | 751 (29.67%) | 133 (12.61%) | 25 (15.53%) | 138 (26.64%)
  NOT RECORDED ON BIRTH CERTIFICATE | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  UNCERTAIN | 5 (0.02%) | 1 (0.04%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  UNKNOWN | 4 (0.01%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
sexually_active
  Defer | 388 (1.34%) | 80 (3.16%) | 35 (3.32%) | 14 (8.70%) | 29 (5.60%)
  Never | 1,572 (5.42%) | 140 (5.53%) | 43 (4.08%) | 15 (9.32%) | 26 (5.02%)
  Not asked | 23,637 (81.52%) | 1,673 (66.10%) | 485 (45.97%) | 75 (46.58%) | 266 (51.35%)
  Not currently | 510 (1.76%) | 112 (4.43%) | 42 (3.98%) | 17 (10.56%) | 22 (4.25%)
  Yes | 2,890 (9.97%) | 526 (20.78%) | 450 (42.65%) | 40 (24.84%) | 175 (33.78%)
smokeless_tob_use
  Current | 112 (0.39%) | 31 (1.22%) | 5 (0.47%) | 5 (3.11%) | 3 (0.58%)
  Former | 192 (0.66%) | 42 (1.66%) | 13 (1.23%) | 6 (3.73%) | 8 (1.54%)
  Never | 10,373 (35.77%) | 1,478 (58.40%) | 813 (77.06%) | 118 (73.29%) | 336 (64.86%)
  Unknown | 18,320 (63.18%) | 980 (38.72%) | 224 (21.23%) | 32 (19.88%) | 171 (33.01%)
smoking_tob_use
  Every day | 700 (2.41%) | 137 (5.41%) | 55 (5.21%) | 25 (15.53%) | 31 (5.98%)
  Former | 1,441 (4.97%) | 283 (11.18%) | 127 (12.04%) | 31 (19.25%) | 65 (12.55%)
  Heavy smoker | 14 (0.05%) | 2 (0.08%) | 2 (0.19%) | 0 (0.00%) | 4 (0.77%)
  Light smoker | 88 (0.30%) | 21 (0.83%) | 5 (0.47%) | 3 (1.86%) | 2 (0.39%)
  Never | 19,751 (68.11%) | 1,668 (65.90%) | 827 (78.39%) | 92 (57.14%) | 290 (55.98%)
  Never assessed | 1,536 (5.30%) | 258 (10.19%) | 19 (1.80%) | 2 (1.24%) | 106 (20.46%)
  Passive smoke exposure (never smoker) | 103 (0.36%) | 9 (0.36%) | 1 (0.09%) | 0 (0.00%) | 1 (0.19%)
  Smoker, current status unknown | 14 (0.05%) | 0 (0.00%) | 1 (0.09%) | 0 (0.00%) | 1 (0.19%)
  Some days | 300 (1.03%) | 31 (1.22%) | 7 (0.66%) | 7 (4.35%) | 6 (1.16%)
  Unknown | 5,050 (17.42%) | 122 (4.82%) | 11 (1.04%) | 1 (0.62%) | 12 (2.32%)
SNUFF_YN
  N | 28,974 (99.92%) | 2,526 (99.80%) | 1,050 (99.53%) | 161 (100.00%) | 516 (99.61%)
  Y | 23 (0.08%) | 5 (0.20%) | 5 (0.47%) | 0 (0.00%) | 2 (0.39%)
socialization_freq
  More than three times a week | 97 (0.33%) | 7 (0.28%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
  Never | 27 (0.09%) | 5 (0.20%) | 3 (0.28%) | 0 (0.00%) | 0 (0.00%)
  Not asked | 28,464 (98.16%) | 2,487 (98.26%) | 1,042 (98.77%) | 161 (100.00%) | 518 (100.00%)
  Once a week | 229 (0.79%) | 13 (0.51%) | 7 (0.66%) | 0 (0.00%) | 0 (0.00%)
  Patient refused | 26 (0.09%) | 5 (0.20%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
  Three times a week | 47 (0.16%) | 2 (0.08%) | 1 (0.09%) | 0 (0.00%) | 0 (0.00%)
  Twice a week | 107 (0.37%) | 12 (0.47%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
SPERMICIDE_YN
  N | 28,989 (99.97%) | 2,531 (100.00%) | 1,055 (100.00%) | 161 (100.00%) | 518 (100.00%)
  Y | 8 (0.03%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
SPONGE_YN
  N | 28,996 (100.00%) | 2,531 (100.00%) | 1,055 (100.00%) | 161 (100.00%) | 518 (100.00%)
  Y | 1 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
SURGICAL_YN
  N | 28,992 (99.98%) | 2,527 (99.84%) | 1,055 (100.00%) | 161 (100.00%) | 518 (100.00%)
  Y | 5 (0.02%) | 4 (0.16%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
TOBACCO_CURR_PACK_PER_DAY | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00)
TOBACCO_PACK_YEAR | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00)
tobacco_user
  Never | 19,665 (67.82%) | 1,652 (65.27%) | 825 (78.20%) | 90 (55.90%) | 289 (55.79%)
  Not asked | 1,437 (4.96%) | 249 (9.84%) | 18 (1.71%) | 2 (1.24%) | 106 (20.46%)
  No response | 5,149 (17.76%) | 131 (5.18%) | 12 (1.14%) | 1 (0.62%) | 12 (2.32%)
  Passive | 103 (0.36%) | 9 (0.36%) | 1 (0.09%) | 0 (0.00%) | 1 (0.19%)
  Quit | 1,485 (5.12%) | 290 (11.46%) | 128 (12.13%) | 32 (19.88%) | 66 (12.74%)
  Yes | 1,158 (3.99%) | 200 (7.90%) | 71 (6.73%) | 36 (22.36%) | 44 (8.49%)
UNKNOWN_FAM_HX_YN
  N | 7,225 (24.92%) | 924 (36.51%) | 579 (54.88%) | 76 (47.20%) | 260 (50.19%)
  Not asked | 21,759 (75.04%) | 1,601 (63.26%) | 474 (44.93%) | 85 (52.80%) | 258 (49.81%)
  Y | 13 (0.04%) | 6 (0.24%) | 2 (0.19%) | 0 (0.00%) | 0 (0.00%)
Abbreviation: ACG, Adjusted Clinical Group.
Fig. 2 Distributions of the number of appointments and non–visit care interactions.
Table 3
ACG score category
ACG score | ACG score category | Count (%) | Risk description
<1 | 1 | 28,997 (87.18%) | Very low risk
≥1 and <2 | 2 | 2,531 (8.73%) | Low risk
≥2 and <3 | 3 | 1,055 (3.64%) | Moderate risk
≥3 and <4 | 4 | 161 (0.56%) | High risk
≥4 and ≤5 | 5 | 518 (1.79%) | Very high risk
Abbreviation: ACG, Adjusted Clinical Group.
To address missing values, we adopted specific strategies based on variable type.
For numeric variables, we filled missing values with zeros. For categorical variables
with missing values, we treated them as a distinct category, enabling us to retain
valuable information while accounting for the absence of data in those instances.
These approaches aimed to maintain the integrity of the dataset and ensure comprehensive
analysis for the predictive modeling tasks.
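The imputation rule described above can be sketched as follows. This is a minimal illustration using pandas, not the study's actual code; the label `"Missing"` for the extra category and the toy values are assumptions, while the column names echo variables listed in the characteristics table.

```python
import pandas as pd

def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with 0 and turn categorical gaps into their own level."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(0)
        else:
            # "Missing" is a hypothetical label for the distinct category.
            out[col] = out[col].astype(object).fillna("Missing")
    return out

# Toy data for illustration only (not study data).
toy = pd.DataFrame({
    "TOBACCO_PACK_YEAR": [1.5, None, 0.0],
    "tobacco_user": ["Never", None, "Quit"],
})
clean = impute_missing(toy)
```

Note that zero-filling makes a missing numeric value indistinguishable from a true zero, which is one way this strategy can introduce bias.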
Model Training and Specification Details
The dataset was randomly partitioned into training and test sets in proportions
of 80 and 20%, respectively. The refinement of hyperparameters for all machine learning
models and the selection of features were exclusively performed within the training
set. Subsequently, final assessments were executed on the independent test set. To
fortify the study against potential biases and enhance robustness, the experimental
procedure was iterated 10 times, employing distinct random seeds for the generation
of train-test datasets.
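A minimal sketch of the repeated partitioning, assuming scikit-learn's `train_test_split` and seeds 0 through 9 (the actual seeds are not reported). The data here are synthetic, and no stratification is applied because the text does not mention it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def repeated_splits(X, y, seeds=range(10), test_size=0.2):
    """Yield one 80/20 train-test partition per random seed."""
    for seed in seeds:
        yield train_test_split(X, y, test_size=test_size, random_state=seed)

# Synthetic stand-in data (not the study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(1, 6, size=100)  # ACG categories 1 to 5
splits = list(repeated_splits(X, y))
```

Each element of `splits` holds `X_train, X_test, y_train, y_test` for one seed, so tuning and evaluation can be repeated per partition and the results averaged.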
Within the training set, a fivefold cross-validation methodology was adopted to determine
optimal hyperparameters and identify significant features. The specific hyperparameters
under consideration are detailed in [Table 4]. A systematic exploration was conducted to identify the top k important features, where k assumes values of 20, 50, 100, 150, and the total number of features. Implementation
of all models was performed using the scikit-learn package in Python.[23]
Table 4
Hyperparameter tuning
Machine learning model | Hyperparameters and candidate values
Random forest | n_estimators: [200, 500, 1000], max_depth: [8, 16, 24, 32], criterion: ['entropy', 'gini']
SVM | C: [0.005, 0.01, 0.05, 0.1, 0.2], penalty: ['l1', 'l2']
Multinomial logistic regression | C: [0.005, 0.01, 0.05, 0.1, 0.2], penalty: ['l1', 'l2']
Decision tree | max_depth: [8, 16, 24, 32, 64], criterion: ['entropy', 'gini']
Gradient boosting | learning_rate: [0.1, 0.2, 0.5, 1.0], max_depth: [2, 4, 8, 16], n_estimators: [100, 200, 500]
Abbreviation: SVM, support vector machine.
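To illustrate the tuning procedure, the sketch below runs a fivefold cross-validated grid search over the decision tree grid from Table 4 using scikit-learn's `GridSearchCV`. The data are synthetic, and selecting on accuracy is an assumption, since the selection metric is not stated; the top-k feature selection step is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Decision tree grid copied from Table 4.
param_grid = {
    "max_depth": [8, 16, 24, 32, 64],
    "criterion": ["entropy", "gini"],
}

# Synthetic five-class data standing in for the real training set.
X_train, y_train = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=5, random_state=0
)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,                # fivefold cross-validation within the training set
    scoring="accuracy",  # assumption: selection metric not stated in the text
)
search.fit(X_train, y_train)
best = search.best_params_
```

The same pattern would apply to the other models by swapping the estimator and its grid; only the training portion of each split is passed to `fit`, so the test set stays untouched until final evaluation.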
Evaluation Metrics
For the prediction of ACG scores, a multi-class classification problem, we employed
the widely used classification metrics of accuracy and area under the receiver operating
characteristic curve (AUC). Accuracy measures the proportion of correctly classified
instances among all instances in the dataset, providing an overall assessment of the
classifier's performance. AUC quantifies the classifier's ability to discriminate
between classes as the area under the receiver operating characteristic curve. However,
traditional AUC is designed for binary classification tasks. To adapt it to multi-class
scenarios, modified versions known as AUC, One-vs-Rest (AUC_ovr) and AUC, One-vs-One
(AUC_ovo) are used. In AUC_ovr, each class in turn is treated as the positive class
while the rest are grouped as negatives, effectively transforming the task into multiple
binary classification problems; the AUC is computed for each binary problem and the
scores are averaged across all classes. Conversely, AUC_ovo considers every pair of
classes: for each pair, the AUC is computed using only the instances belonging to those
two classes, and the scores are averaged over all pairwise comparisons. Both metrics offer
valuable insights into the performance of classifiers in multi-class classification
tasks.
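As an illustration, the three classification metrics can be computed with scikit-learn as sketched below. The data and the logistic regression model are placeholders, and the default macro averaging of `roc_auc_score` is an assumption about how the per-class and pairwise scores were combined.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic five-class problem (not the study data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6, n_classes=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)  # one probability column per class

acc = accuracy_score(y_te, clf.predict(X_te))
auc_ovr = roc_auc_score(y_te, proba, multi_class="ovr")  # each class vs. the rest
auc_ovo = roc_auc_score(y_te, proba, multi_class="ovo")  # every pair of classes
```

Both AUC variants are computed from the same predicted probabilities; no additional classifiers are trained for the pairwise version.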
For the prediction of the number of appointments and non–visit care interactions,
regression techniques were employed. The evaluation metric used is R², also known
as the coefficient of determination. R² quantifies the proportion of the variance in
the dependent variable that is explained by the independent variables. Higher R² values
indicate a better fit of the regression model to the data, reflecting its
predictability.
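For reference, the standard definitions of R² and of the adjusted R² reported alongside it are given below. The notation is ours rather than the source's: y_i are observed values, ŷ_i are model predictions, ȳ is the mean of the observed values, n is the number of observations, and p is the number of predictors.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right) \frac{n - 1}{n - p - 1}
```

Adjusted R² penalizes additional predictors, so it is the more appropriate figure when comparing models with different numbers of features.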
Results
Prediction of Adjusted Clinical Group Score
The results from different machine learning algorithms were evaluated and compared
([Table 5]). We conducted a comparison between the machine learning models and a baseline approach.
The baseline approach solely relied on age and gender information to predict ACG scores,
mirroring the conventional practice within our institution before our exploration
into leveraging SDOH for enhanced performance. In this baseline method, for each new
patient, the ACG was estimated based on the ACG scores from the existing patients
with the same gender and the closest age. The gradient boosting algorithm performed
better compared with other machine learning models in terms of AUC_ovr (82.8%) and
AUC_ovo (70.1%), while random forest achieved the highest accuracy (87.3%). Moreover,
random forest, gradient boosting, SVM, and multinomial logistic regression each
exceeded the accuracy of the baseline approach by at least 9 percentage points,
demonstrating the effectiveness and superiority of incorporating SDOH and machine
learning techniques in predicting ACG scores for new patients. The decision tree model,
however, exhibited only a slightly better performance than the baseline.
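The baseline rule described above can be sketched as a nearest-age lookup among existing patients of the same gender. This is an illustrative reading, not the institution's implementation; in particular, breaking ties among equally close patients by majority vote is an assumption, and the patient tuples are made up.

```python
from collections import Counter

def baseline_acg(new_age, new_gender, existing):
    """Predict an ACG category from same-gender patients of the closest age.

    `existing` is a list of (age, gender, acg_category) tuples. Ties among
    equally close patients are resolved by majority vote (an assumption; the
    text does not say how ties are handled).
    """
    same_gender = [(age, acg) for age, gender, acg in existing if gender == new_gender]
    if not same_gender:
        raise ValueError("no existing patient with this gender")
    closest_gap = min(abs(age - new_age) for age, _ in same_gender)
    votes = Counter(acg for age, acg in same_gender if abs(age - new_age) == closest_gap)
    return votes.most_common(1)[0][0]

# Made-up patients: (age, gender, ACG category).
patients = [(34, "F", 1), (36, "F", 1), (36, "F", 2), (36, "F", 1), (35, "M", 3), (70, "F", 4)]
pred = baseline_acg(37, "F", patients)
```

Because most patients fall in category 1, a rule of this kind tends to predict the majority class, which helps explain why its accuracy is high in absolute terms despite using only two inputs.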
Table 5
Machine learning model results for ACG score prediction
Model | Parameters | AUC_ovr (%) | AUC_ovo (%) | Accuracy (%)
Random forest | n_estimators = 200, max_depth = 16, criterion = "gini" | 80.77 | 68.15 | 87.31
SVM | C = 0.005, penalty = "l2" | 81.11 | 68.67 | 87.27
Multinomial logistic regression | C = 0.2, penalty = "l2" | 81.30 | 68.48 | 87.18
Decision tree | max_depth = 32, criterion = "entropy" | 56.42 | 55.28 | 79.72
Gradient boosting | learning_rate = 0.1, max_depth = 2, n_estimators = 100 | 82.82 | 70.10 | 87.18
Baseline | – | – | – | 78.05
Abbreviations: ACG, Adjusted Clinical Group; SVM, support vector machine.
The findings suggest that the machine learning models, especially random forest and
SVM, offer valuable insights for predicting ACG, surpassing the performance of the
baseline approach. To gain deeper insights into the classification performance of
each class, we constructed receiver operating characteristic curves for individual
classes using the one-versus-rest approach with the random forest model ([Fig. 3]). Most of the AUC values are approximately 0.8, indicating
robust classification ability across classes. However, when discerning between ACG
score category 2 and the rest, the corresponding AUC value is relatively lower. This
outcome implies an elevated challenge in accurately distinguishing low-risk cases
from others within the dataset.
Fig. 3 ROC curve for each ACG score class. ACG, Adjusted Clinical Group; ROC, receiver operating
characteristic.
Prediction of Number of Appointments and Non–Visit Care Interactions
Having achieved satisfactory prediction results for ACG scores, we proceeded to enhance
our prediction models by incorporating the predicted ACG score along with age, gender,
and SDOH information to predict the number of appointments and non–visit care interactions.
Among the 182 features examined, 28 features exhibited significance with p-values smaller than 0.05 for predicting the number of appointments, while 49 features
showed significance for predicting the number of non–visit care interactions. Only
these significant features were incorporated into the final models. The integration
of SDOH yielded notable improvements in model performance, as evident from the considerable
increase in R² values ([Table 6]). Specifically, the inclusion of SDOH led to a 71.3%
increase in the R² value for predicting the number of appointments and a 65.6% increase for the model
predicting the number of non–visit care interactions. Additionally, the integration
of the predicted ACG scores also made a meaningful contribution to enhancing model
performance, albeit to a lesser extent. This observation can be attributed to the
fact that age, gender, and SDOH information are already considered when predicting
ACG scores. Nonetheless, incorporating the predicted ACG scores resulted in an additional
increase of 13.9 and 17.0% for predicting appointments and non–visit care interactions,
respectively.
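The relative increases quoted above follow from the R² values in Table 6 as (new − old) / old. A short check, using the published three-decimal values:

```python
def relative_increase(new, old):
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100

# R² values as reported in Table 6 (rounded to three decimals).
r2 = {
    "appointments": {"base": 0.227, "sdoh": 0.389, "sdoh_acg": 0.443},
    "non_visit": {"base": 0.227, "sdoh": 0.376, "sdoh_acg": 0.440},
}

summary = {
    outcome: (
        round(relative_increase(v["sdoh"], v["base"]), 1),      # SDOH vs. age and gender
        round(relative_increase(v["sdoh_acg"], v["base"]), 1),  # full model vs. age and gender
        round(relative_increase(v["sdoh_acg"], v["sdoh"]), 1),  # adding predicted ACG to the SDOH model
    )
    for outcome, v in r2.items()
}
```

Recomputing from the rounded values reproduces the reported percentages to within 0.1 point; the small difference for appointments (about 71.4% here versus the reported 71.3%) presumably reflects rounding of the published R² values.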
Table 6
Linear regression result for providers' workload prediction
Outcome | Feature types | R² | Adjusted R²
Appointments | Age, gender | 0.227 | 0.227
Appointments | Age, gender, SDOH | 0.389 (↑71.3%) | 0.387
Appointments | Age, gender, SDOH, predicted ACG | 0.443 (↑95.2%, ↑13.9%) | 0.441
Non–visit care interactions | Age, gender | 0.227 | 0.227
Non–visit care interactions | Age, gender, SDOH | 0.376 (↑65.6%) | 0.373
Non–visit care interactions | Age, gender, SDOH, predicted ACG | 0.440 (↑93.8%, ↑17.0%) | 0.438
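Abbreviations: ACG, Adjusted Clinical Group; SDOH, social determinants of health.
Note: The first percentage in parentheses represents the increase compared with the baseline
model using only age and gender as inputs. Where a second percentage is given, it represents
the increase compared with the model using age, gender, and SDOH.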
Overall, our findings underscore the significant impact of incorporating SDOH and
predicted ACG scores in the prediction models, resulting in substantial improvements
in their predictability for both the number of appointments and non–visit care interactions.
These results offer valuable insights into optimizing health care resource allocation,
improving patient care, and reducing clinician burnout.
Feature Importance
To gain deeper insights into the most influential factors affecting the prediction
tasks, we plotted the 10 largest absolute values of log(p-value) for predicting the number of appointments ([Fig. 4]) and non–visit care interactions ([Fig. 5]), with all features as input. Larger values correspond to smaller p-values, and the logarithmic scale makes the plots easier to read,
considering that p-values may vary significantly, spanning several orders of magnitude.
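The ranking behind these plots can be sketched as follows. The base-10 logarithm is an assumption (the text says only "log"), and the p-values are invented for illustration.

```python
import math

def top_features_by_p(p_values, k=10):
    """Rank features by |log10(p)|, largest first, and keep the top k."""
    scored = {name: abs(math.log10(p)) for name, p in p_values.items() if p > 0}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:k]

# Made-up p-values for illustration only.
example = {"age": 1e-40, "tobacco_user": 1e-12, "gender": 3e-5, "SPONGE_YN": 0.8}
ranking = top_features_by_p(example, k=3)
```

A p-value of exactly zero (numerical underflow) has no finite logarithm, so the sketch skips such entries; in practice they would need separate handling, for example by capping at the smallest representable value.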
Fig. 4 Top 10 important features for predicting the number of appointments.
Fig. 5 Top 10 important features for predicting the number of non–visit care interactions.
For predicting the number of appointments, age emerged as the most influential factor,
followed by variables related to tobacco smoking, patients' awareness of their family
history, gender, use of injectable birth control, and predicted ACG scores. The impact
of age on the prediction was expected, given its widely recognized significance as
a determinant of overall health conditions.[24] Additionally, tobacco use is a well-established risk factor,[25] further reinforcing its importance in the prediction. Literature also supports that
certain health conditions disproportionately affect men and women,[26] contributing to the relevance of gender in the prediction. The current literature
offers evidence indicating a strong association between injectable birth control and
a decline in bone density, a critical factor influencing the health condition of patients.[27] Additionally, the estimated ACG score holds significant importance in predicting
the number of appointments, indicating its relevance as a predictor in the prediction
model.
For predicting the number of non–visit care interactions, it is noteworthy that the
top 10 significant features were exclusively composed of SDOH variables. Factors such
as tobacco usage, engagement in physical exercise, level of education, and participation
in group activities appear strongly correlated with predictions of non–visit care
interactions. Moreover, while not among the top 10 features, the ACG score categories
of 1 and 2 also hold significance for providers' workload prediction in this case.
Patients with severe illnesses typically have fewer appointments with primary care
providers, often being directed toward specialists. Conversely, individuals with mild
health concerns are more likely to seek care from primary care providers, thus influencing
the number of non–visit care interactions. This correlation aligns with the commonly
observed patterns of health care utilization based on illness severity. Overall, these
findings offer valuable insights into the key factors influencing the provider workload
prediction task, providing a basis for further understanding to optimize health care
resource allocation and patient panel assignment.
Discussion
The inclusion of SDOH variables has, first, demonstrated a substantial enhancement
in the accuracy of ACG score prediction, indicating the relevance of social and environmental
factors in predicting patients' health status and risk. Second, the predicted ACG
scores have proven to be instrumental in forecasting the number of appointments as
well as exhibiting a high level of accuracy in predicting the number of non–visit
care interactions. This suggests that the ACG scores, in combination with other patient-specific
information, offer valuable insights into patients' expected health care resource
utilization. Lastly, the research underscores the pivotal role of SDOH variables in
predicting the numbers of appointments and non–visit care interactions, with
a particularly strong impact observed for non–visit care interactions.
Despite the promising findings, this study has several limitations that should be
acknowledged. The prediction models heavily rely on the accuracy and completeness
of the data used. Missing values in the dataset were addressed by filling numeric
variables with zeros and treating missing values in categorical variables as a new
category. While these strategies were adopted to maintain data integrity, they may
introduce bias and impact the model performance. Future studies should aim to incorporate
more sophisticated imputation methods to handle missing data effectively.
Our organization has successfully implemented several automatic triage tools in
specialty departments such as Cardiovascular and Pain Medicine. In primary
care settings, SDOH information is already collected in the form of a digital questionnaire
at the beginning of each visit. Because of time constraints, some patients are not
able to complete the questionnaire before they are called into the exam room, which
accounts for many of the missing values. Our approach requires the SDOH information to
be collected between the time an appointment is requested and the time it is scheduled,
so that a PCP can be assigned. A questionnaire link will be sent to patients, who will be
required to complete it in a timely manner before an appointment can be granted. When the SDOH information
is read into our electronic medical record, it will be sent, along with age and gender,
to the server where our models will be hosted. This triggers the models to run
in sequence: the ACG predictive model first, followed by the appointment and non–visit care
interaction predictive models. The prediction results are then published back to
our electronic systems via the Application Programming Interface (API) and temporarily
stored on a platform. Subsequently, when a scheduler selects a new patient to schedule,
the scheduling system retrieves the predicted result from the platform and assigns
an available provider with the lowest predicted workload.
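The final assignment step might look like the sketch below. It is a hypothetical illustration of one reading of the rule (lowest predicted workload among providers with availability); the `Provider` class, its fields, and the workload figures are all assumptions and do not describe the production system.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    predicted_workload: float  # predicted appointments plus non-visit interactions for the panel
    has_opening: bool          # whether the provider can accept a new patient

def assign_provider(providers, new_patient_load):
    """Assign the new patient to the available provider with the lowest predicted workload."""
    available = [p for p in providers if p.has_opening]
    if not available:
        return None
    chosen = min(available, key=lambda p: p.predicted_workload)
    chosen.predicted_workload += new_patient_load  # keep the running total current
    return chosen

# Hypothetical panel; numbers are illustrative only.
panel = [Provider("A", 120.0, True), Provider("B", 95.5, False), Provider("C", 101.0, True)]
picked = assign_provider(panel, new_patient_load=6.5)
```

Here provider B has the lowest workload but no availability, so the patient goes to provider C, whose running total is then updated so that subsequent assignments see the new balance.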
Conclusion
This study offers significant insights into the realm of health care analytics, underscoring
the crucial role of SDOH and predicted ACG score in enhancing prediction for health
care utilization. By leveraging SDOH and predicted ACG scores, our study presents
a valuable tool for health care systems to balance provider workload through equitable
new patient assignments and mitigate burnout effectively. This research holds the
potential to guide health care systems in refining patient panel assignments, thereby
elevating the quality of care provided, and fostering a more sustainable environment
for providers. The findings contribute to the broader landscape of health care analytics,
offering practical implications to address workload disparities and alleviate the
pressing concern of provider burnout, ultimately working toward the advancement of
patient-centric and resilient health care practices.
Clinical Relevance Statement
This study provided evidence on the impact of social determinants of health for predicting
patient risk and medical needs to further assist practice with resource planning.
The appropriate planning could significantly improve provider workload distribution
and reduce burnout.
Multiple-Choice Questions
1. When considering age, gender, and SDOH, which ACG score category proves to be the
most challenging to predict?
a. ACG category 1
b. ACG category 2
c. ACG category 3
d. ACG category 4
Correct Answer: The correct answer is option b.
Explanation: The results outlined in the “Prediction of ACG Score” section reveal
that when distinguishing between ACG score category 2 and the remaining categories,
the associated area under the curve (AUC) value is relatively lower. This suggests
an increased difficulty in accurately discerning low-risk cases from others within
the dataset. From a medical standpoint, this observation aligns with intuition. Severely
ill or entirely healthy patients are typically straightforward to identify, whereas
patients experiencing mild discomfort pose a challenge in classification.
2. When predicting the number of appointments, which social determinant of health below
is deemed the most crucial?
a. Alcohol use
b. Tobacco use
c. Exercise
d. Education
Correct Answer: The correct answer is option b.
Explanation: The findings are detailed in the “Feature Importance” section, where
it is evident that the remaining three options do not rank within the top 10 important
features. Tobacco use, supported by well-established medical knowledge, emerges as
a significant risk factor for patients' health.