Keywords
Machine learning - radioiodine ablation - restratification - thyroglobulin - thyroid
cancer
Introduction
Thyroid cancer is one of the most common endocrine malignancies, and differentiated
thyroid cancer (DTC) comprises more than 90% of all thyroid carcinomas.[1] More than 85% of DTC cases are papillary thyroid cancer (PTC), making
it by far the most common type.[1] Standard treatment of DTC generally includes thyroidectomy followed by radioiodine
(RAI) ablation of remnants[2],[3] with iodine-131 (I-131) and subsequent initiation of thyroid hormone replacement
therapy. This general treatment approach usually results in successful disease remission
and most patients have an excellent prognosis. Treatment success is indicated by undetectable
or very low serum levels of the thyroid tumor marker thyroglobulin (Tg) and of anti-Tg antibodies.
In addition, disease remission usually includes an absence of iodine-concentrating
tissue on follow-up imaging.[1] The initial radioiodine dose is selected based on the intent of therapy, which is generally
complete ablation of any remnant thyroid tissue.[2]
Yet, despite the generally high success rate in DTC treatment following thyroidectomy
and initial RAI, treatment failure can arise. A novel restratification of patients
in the management of PTC has been proposed based on the response to initial therapy
and this has been reported to have a better correlation with long-term outcomes.[3] The American Thyroid Association (ATA) initial Risk Stratification System was recommended
in 2009 based on its utility in predicting the risk of disease recurrence. Nevertheless,
some studies have found poor agreement between initial risk stratification and the
actual outcome after evaluation of the response to treatment.[2],[3] There is a relative paucity of published data on risk factors that predict
failure of initial therapy and the likelihood of disease recurrence.[4] Given the high prevalence of DTC[5] and the possibility of negative outcomes even after recommended treatment,
a study to evaluate the clinical factors predictive of failure of initial radioiodine
therapy in thyroid cancer patients is warranted. We report here our experience with
RAI therapy for DTC in patients treated over an approximately 10-year period. We performed
a retrospective review of patients with DTC who underwent surgical resection and initial
dosing with I-131 for radioiodine ablation in our clinic. Patients who had persistent
disease either indicated by biochemical serum markers or by evidence of disease on
follow-up imaging after the initial dose of radioiodine were considered to have failed
initial treatment. By pooling the electronic medical-record data, we looked at patient
clinical features and examined their relationship to initial RAI treatment failure.
Materials and Methods
This is a retrospective single-center study conducted at our institution. The protocol was
submitted to the Institutional Review Board (IRB submission 1556801-1), reviewed, and found
exempt from IRB review (exemption category #4 [ii]); it was subsequently approved for use of
de-identified data. We performed a search in our electronic medical record system (Epic)
for all patients with a diagnosis of DTC established by histopathology who underwent
near-total or total thyroidectomy and subsequent I-131 therapy for the first time
from November 2009 to January 2020 in the Nuclear Medicine Department at our institution.
We excluded those patients who did not have follow-up diagnostic imaging (radioiodine
whole-body scan, positron emission tomography-computed tomography [PET/CT]) and/or
Tg and anti-Tg antibody serum markers as we would not be able to determine their response
to initial therapy. As per our institutional protocol, patients scheduled for RAI
were placed on a low-iodine diet 2 weeks before the treatment date. On each of
the 2 days preceding treatment, an intramuscular injection of 0.9 mg recombinant
thyroid-stimulating hormone (TSH; Thyrogen) was administered. In these cases,
the serum TSH level was not measured explicitly, as it was assumed to be adequately
elevated for treatment. On rare occasions, the referring physician requested that
the patient instead have their replacement thyroid hormone withheld for 4–6 weeks before
treatment; a serum TSH level was then measured to confirm that it was at least
30 µIU/mL before RAI dosing and treatment. Both Tg and anti-Tg antibodies were
measured at the time of RAI or at follow-up, most often following
stimulation with recombinant TSH.
After review of the medical records, we identified 107 adult patients who met the
inclusion criteria. Patients ranged in age from 18 to 79 years; 73 were female and 34 were male [[Table 1]]. Sixteen patient variables were included in our study: age at the time of diagnosis,
gender, previous thyroid disease, type of thyroidectomy, extent of disease and lymph
node involvement at surgery, histopathological variant, size of primary tumor, multifocality,
I-131 dose administered, time from surgery to RAI therapy, pre-RAI and post-RAI Tg
and anti-Tg antibody serum levels respectively, and time from RAI to follow-up. The
outcome was clinical response to RAI therapy. Treatment failure was defined as persistently
elevated Tg values (biochemical incomplete response to therapy) or persistent
loco-regional or distant metastasis identified on follow-up imaging, such as a radioiodine
scan or an 18F-fluorodeoxyglucose (FDG) PET/CT scan (anatomic incomplete response). Of the 107
patients included in our study, 46 had treatment failure following initial RAI. "Incomplete
response to therapy," the nomenclature used in the new ATA guidelines, is used interchangeably
with treatment failure.
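As a minimal sketch of this outcome definition, the Python functions below label a patient as an initial treatment failure when either the biochemical or the imaging criterion is met. The function and argument names are hypothetical, and the boolean inputs stand in for the laboratory and imaging assessments, whose numeric cutoffs are not specified in this article.

```python
def failed_initial_rai(biochemical_incomplete: bool, anatomic_incomplete: bool) -> bool:
    """Return True when either incomplete-response criterion is met."""
    return biochemical_incomplete or anatomic_incomplete


def response_category(biochemical_incomplete: bool, anatomic_incomplete: bool) -> str:
    """Assign a coarse response label (hypothetical labels for illustration).

    Imaging evidence takes precedence, so a patient with imaging evidence
    is counted as an anatomic failure with or without biochemical evidence.
    """
    if anatomic_incomplete:
        return "anatomic incomplete response"
    if biochemical_incomplete:
        return "biochemical incomplete response"
    return "no evidence of treatment failure"


# Example: elevated Tg with negative follow-up imaging.
print(failed_initial_rai(True, False), response_category(True, False))
```

Giving imaging evidence precedence is an assumption made here so that each patient falls into exactly one category.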
Table 1: Patient demographics
We performed a multivariable logistic regression (machine learning) analysis
in Python (v 2.7). Using the scikit-learn (Sklearn) machine learning
library, we looked for features in our tabulated clinical dataset
(Excel, Microsoft Office 2019) that were significantly associated with resistance
to RAI treatment.[6],[7],[8] We used a random forest classifier and then performed a test for significance to
obtain P values for the contribution each variable made to whether or not
the patient failed therapy [[Table 2]]. Our model was cross-validated and checked by data-shuffling. For the categorical
data, a random forest classifier was utilized and, given the noncontinuous nature of
the data, we used ANOVA F-values to assess the importance of each feature
within the overall classification scheme.[9] For clinical features associated with treatment failure, a P
value cutoff of 0.05 was used for statistical significance.
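The analysis code itself is not reproduced in this article. As a hedged sketch of the kind of workflow described, the block below runs ANOVA F-tests, a random forest classifier, and shuffled cross-validation with scikit-learn on synthetic data (not the study dataset). It is written for Python 3 and a current scikit-learn release rather than Python 2.7, and the number of trees, the number of folds, the random seeds, and the synthetic feature matrix are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data: 107 "patients", 46 labeled as treatment failures.
rng = np.random.default_rng(0)
n_patients, n_features = 107, 4
y = np.zeros(n_patients, dtype=int)
y[:46] = 1
X = rng.normal(size=(n_patients, n_features))
X[:, 0] += 2.0 * y  # make feature 0 informative; the others are pure noise

# ANOVA F-test: one F statistic and one P value per feature.
f_values, p_values = f_classif(X, y)

# Random forest evaluated with shuffled, stratified 5-fold cross-validation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Impurity-based feature importances from a fit on the full synthetic set.
model.fit(X, y)
importances = model.feature_importances_

for i in range(n_features):
    print(f"feature {i}: F = {f_values[i]:.2f}, P = {p_values[i]:.4f}, "
          f"importance = {importances[i]:.3f}")
print(f"mean cross-validated accuracy = {scores.mean():.3f}")
```

In this sketch the P values come from the F-test rather than from the random forest itself, since scikit-learn's random forest reports feature importances but not P values.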
Table 2: Clinical features and associations with biochemical treatment failure post-RAI
Results
A total of 107 patients met the criteria to be included in this study, with a mean
age of 49.8 (±17.2) years (range 18 to 79 years). Of these, 46 patients
had treatment failure after the first dose of radioiodine as defined by biochemical
serum markers and/or imaging [[Table 1]]. The mean time from RAI to follow-up was 15.2 (±5.5) months and the mean dose of
radioiodine prescribed was 104.9 (±53.4) mCi [[Table 1]]. Using a machine learning algorithm from Sklearn, we analyzed the clinical dataset
to discern which factors were associated with treatment resistance.
First, elevated serum levels of Tg and anti-Tg antibodies after surgery but before RAI
were associated
with treatment failure (P = 0.011), with a relative risk (RR) of 1.82 [[Table 2]] and [Figure 1]. Multifocal
disease within the gland (P = 0.026, RR = 1.73) and advanced stage at surgical resection
(lymph node involvement, P = 0.0135, RR = 1.91) were also significantly associated with
treatment failure [[Table 2]] and [Figure 1]. Finally, a prescribed I-131 dose over 160 mCi was associated with treatment
failure (P = 0.0147, RR = 2.12) [[Table 2]] and [Figure 1]. Since the prescribed dose usually follows from the clinical stage or presentation
at surgery, I-131 doses higher than 50–100 mCi usually reflect a more advanced disease
state.[2] Taken together, these features reflect a more aggressive subset
of tumors and thus a higher likelihood of treatment failure. Interestingly,
among patients who received radioiodine within 2 months
of primary surgery, slightly more
went on to treatment failure than to successful treatment
(n = 25 versus n = 24). This relationship, however, was not statistically significant [P = 0.064, [Figure 1] and [Figure 2]]. It contrasts with a similar (but not identical) study that demonstrated
increased rates of successful RAI in patients who waited 1 month or less between surgery
and RAI.[10]
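For readers who want to see how a relative risk and its 95% confidence interval can be derived from a 2×2 table, a minimal Python sketch follows. It uses the log-normal (Katz) approximation, which is an assumption here, since the article does not state how its intervals were computed. The worked example takes the counts quoted above for RAI within 2 months (25 failures, 24 successes) and assumes that the remaining 21 of the 46 failures occurred among the other 58 patients. That split is inferred for illustration and is not a reported result.

```python
import math


def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total, z=1.96):
    """Relative risk with a log-normal (Katz) confidence interval."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Illustrative counts (assumed split): RAI within 2 months of surgery
# (25 failures of 49) versus later RAI (21 failures of 58).
rr, lower, upper = relative_risk(25, 49, 21, 58)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under these assumed counts the interval spans 1, which is in keeping with the non-significant P value reported for the timing variable.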
Figure 1: Relative risk of initial radioiodine treatment failure associated with specific
patient clinical features. Whiskers denote the 95% confidence interval
Figure 2: The number of cases of initial radioiodine treatment failure (dark gray
bars) and successful RAI treatment (light gray bars) following thyroidectomy. Number
of cases of each clinical response are displayed according to the number of months
between surgical resection and RAI treatment
We looked at treatment failure determined by imaging
as well as by biochemical evidence. Fewer patients had metastasis seen
on whole-body imaging at follow-up (planar whole-body postradioiodine scans or
18F-FDG PET/CT), with or without biochemical evidence of post-RAI treatment failure,
than had primarily biochemical evidence of treatment failure, as shown
in [[Table 1]] (n = 11 with treatment failure evidence primarily by imaging versus n = 35 with primarily biochemical evidence of treatment resistance, out of the total of
107 patients).
Discussion
A recent article outlining the importance of patient risk restratification and the
utility it has for patient management describes a risk stratification system based
on the 8th Edition of the American Joint Committee on Cancer tumor node metastases
(TNM) staging system.[11] This system would better reflect the biological nature of thyroid cancer with the
most important prognostic risk factors being the age at diagnosis, presence of distant
metastasis, and extrathyroid extension. There is increasing clinical support for a
patient risk restratification scheme with RAI treatment which better captures the
patient-specific disease state and is reflected in improved outcomes.[11] In light of these proposed patient restratification systems, the factors we identified
as associated with treatment resistance may be useful clinically.
The results of our study shed light on important clinical features when stratifying
patients to receive therapy with RAI. Some of the clinical variables that we found to be
associated with initial treatment failure agree with the previous literature on
patient factors in RAI treatment failure.[1],[3],[12] In these studies, disease multifocality (diagnosed by surgical pathology) and large
tumor size (>1 cm) at surgical resection were statistically significantly associated
with treatment resistance, as was seen in a recent retrospective study
of RAI treatment outcomes within a Filipino patient population.[3],[13] That study, however, only looked at patients in a specific ethnic cohort. Although
our patient cohort was smaller than the one in that study, it was not limited to a
specific ethnic group. However, similar to this prior work,
we also found that tumors which had spread to involve lymph nodes initially at the
time of surgery were more likely to be implicated in RAI treatment resistance. These
cases likely involve disease with a more aggressive initial profile and thus would
be harder to treat using only a single dose of I-131. Interestingly, in a previous
study, there was no reduction in disease recurrence when RAI was added to the treatment
protocol in treating patients who had initial multifocal disease.[14]
Another similar study, closer in method to ours, looked at treatment outcomes
in DTC following RAI, using multivariate analysis to assess independent risk factors
for treatment resistance.[10] The authors found that age ≥45 years, tumor size ≥2 cm, multiple
disease foci, and more
advanced TNM stage (III-IV) were all associated with statistically significant
resistance to RAI ablation (P < 0.05).[10] That study differs from ours, however, in that the authors grouped patients into those who
received 2, 3, or 4 consecutive doses of I-131, whereas we looked only at first-time
treatment failure with RAI. The risk factors for treatment resistance identified in that
study that accord with our results were multifocality
of tumor at surgical resection and local tumor invasion (reflected in a higher TNM
stage). Unlike the study by Cao et al., our study did not find age to be statistically significantly
associated with resistance. However, nearly all cases of resistance in our patient
cohort showed a tumor size >0.6 cm (0.6–5 cm, n = 104/107), similar to findings from these other previous studies.
In addition, similar to these studies, we found that a greater initial tumor size and
a higher initial RAI dose (prescribed for a higher clinical
stage) were both associated with treatment failure [[Table 2]], likely also reflecting a more aggressive biological disease profile. Specifically,
patients given a larger initial dose of I-131 had a more aggressive
or advanced disease profile and showed a propensity towards treatment failure [P =
0.0147, RR = 2.12 [[Table 2]] and [Figure 1]]. Along these lines, it is not surprising that initial treatment failure was associated
with elevated serum anti-Tg antibodies, which denote
residual disease [[Table 2]].
Of the patients with recurrent disease primarily by imaging at follow-up, a slightly
greater number had elevated postsurgery (pre-RAI) Tg antibody levels (n = 6 elevated
Tg levels vs. n = 4 normal serum levels). Of these patients with treatment failure assessed only
by imaging, two patients had features of initially aggressive disease as depicted
in the 2015 ATA guidelines.[2] One patient had treatment failure with normal post-RAI serum Tg levels; however,
the post-RAI Tg antibody levels were elevated and, on follow-up imaging, there were
supraclavicular, axillary, and mediastinal lymph node metastases which showed activity
on 18F-FDG PET/CT. The primary tumor was large (4 cm), and multifocal involvement,
extracapsular extension, and metastatic lymph node involvement
were seen at surgery. Many of the FDG-avid lymph nodes that were not seen on post-RAI whole-body
planar images were nevertheless confirmed as metastases on biopsy. In addition, this patient
had a scapular osseous metastasis which was not seen on the whole-body post-RAI scans
but was identified first on plain radiographs. This also turned out to be metastatic
disease on biopsy. The second patient also showed a large multifocal primary tumor
(4 cm) with angiovascular invasion at surgery. Post-RAI Tg and Tg antibody levels
were normal; however, a follow-up 18F-FDG PET/CT showed indeterminate esophageal activity. No biopsy was performed
in this case, as it was felt to be unnecessary.
A finding that was surprising and, to our knowledge, novel was that a shorter time
(<8 weeks) from surgery to RAI treatment was weakly associated with treatment failure
[Figure 2]. Although the association was not statistically significant, it may reflect a suboptimal
postoperative state for administering radioiodine therapy despite stimulation with
rhTSH. An inflammatory environment or state of initial postsurgical healing may hamper
the effective utilization of radioiodine therapy within the thyroid bed. These results
are in contrast to those from a study by Cao et al. which found more patients had
successful ablation at 1 month or less following surgery. Instead, we found a slightly
higher number of patients with first-time treatment failure at 1 month or less following
surgery. Successful remnant ablation was defined by the authors of that previous study
as undetectable Tg or absence of disease evidence on follow-up radioiodine scans.[10] One possible explanation for this discrepancy between their study and ours is that
in the previous study patients were given two or more doses of RAI, and repeated
I-131 doses given in rapid succession within a month may have produced cumulative
cell damage over a short interval of time. Finally,
patient selection bias may have
affected our results, in that more advanced or aggressive cases may have been scheduled for
RAI sooner following surgery.
We compared our results from the machine learning algorithm with the commercially
available statistics software package SPSS Statistics from IBM (version 27, 2020).
We found similar results in terms of which clinical features were significantly associated
with resistance to treatment [[Table 3]]. These included tumor extent (P = 0.037) and tumor focality (multifocal, P = 0.042) at surgery, both of which were associated
with resistance to RAI. There were also differences between the features that SPSS and
the machine learning program identified as associated with treatment resistance.
The pre-RAI Tg and Tg antibody levels as well as the I-131 dose administered were
found to be associated with treatment resistance only with the machine learning analysis
[[Table 3]]. Part of the difference in the output of these two programs may be attributable
to the methodologies used by each one. Although a full comparison is beyond the scope of this article,
the methods used by Sklearn involve random forest tree classifiers, whereas the multivariate
regression analysis by SPSS uses a method of ordinary least squares to find the contribution
of each variable to the overall model fit of the data. In our dataset, the relationships
between variables may be better depicted by one method than by the other.
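To illustrate how two model families can disagree on the same data, the hedged sketch below fits a linear model (scikit-learn's logistic regression, standing in for a regression-style analysis) and a random forest to synthetic data in which the outcome depends on a feature in a non-monotonic way. The data, the threshold rule, and all hyperparameters are assumptions made for illustration. None of this reproduces the SPSS analysis or the study dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: the outcome is positive when feature 0 is far from zero
# in either direction, a pattern a single linear boundary cannot capture.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (np.abs(X[:, 0]) > 1.0).astype(int)

# Mean 5-fold cross-validated accuracy for each model family.
lr_acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
rf_acc = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5
).mean()

print(f"logistic regression accuracy = {lr_acc:.3f}")
print(f"random forest accuracy       = {rf_acc:.3f}")
```

On data like these, a tree-based model can pick up a threshold effect that a linear model misses, so the two approaches may flag different variables as important.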
Table 3: Comparison of clinical features associated with post-RAI treatment failure
using multivariate logistic regression with IBM SPSS Statistics 27 versus a machine
learning algorithm implemented in Sklearn from Python v2.7.
Our study has several limitations. It is a retrospective study without
matched controls. Only patients with follow-up serum Tg and anti-Tg levels and/or
radioiodine imaging were included, which may have introduced selection bias in our cohort.
Our sample size was modest (n = 107) since this was a single-center study, and the results
could be subject to sample-size limitations as well as institution-specific protocol
effects such as selection and/or recall bias. The sample size could also affect
the machine learning algorithm, as the training set would be smaller and more prone
to overfitting. We tried to use the simplest possible decision tree model
for our smaller dataset.[9] To help further address these issues, we implemented data-shuffling and cross-validation
so that the algorithm was trained and tested in a robust fashion.[9] Finally, the machine learning algorithm we used from Sklearn has not been validated
on a clinical dataset taken from the medical record like the one used in this study.
Conclusion
Identifying factors associated with reduced treatment efficacy is paramount in improving
the delivery of clinical care to patients for radioiodine therapy. This type of study
is important in addressing some of the shortcomings in the management of patients
with DTC following surgical resection. By utilizing a machine learning multivariate
data analysis technique, we found relevant clinical variables which may help restratify
patients who are more resistant to initial RAI therapy. With this in mind, better
management of these patients in the postoperative state may be possible. Applying
these results may help improve clinical outcomes and quality of life for patients
treated with RAI.