Introduction
Barrett’s-related early esophageal adenocarcinoma (T1 EAC), which is localized to
the mucosa (T1a) or submucosa (T1b), constitutes approximately 20 % of all diagnosed
EAC [1]. While esophagectomy has been the standard of care, achieving a 5-year survival
of up to 90 % for T1a disease [2], it carries significant perioperative mortality (2 %) and morbidity (10 %), even
in high-volume centers [3]. Moreover, many patients suffer from long-term irreversible digestive dysfunction
that significantly impairs quality of life, including medically refractory reflux
(60 %), dumping syndromes (50 %), and dysphagia to solid food (25 %) [4].
In recent years, endoscopic resection (ER) for Barrett’s neoplasia has yielded excellent
long-term outcomes that are comparable to surgical resection [3] [5] [6] [7] [8]. Endoscopic mucosal resection (EMR) is firmly established as the treatment of choice
for nodular high grade dysplasia (HGD) and T1a disease [8] [9]. Similarly, endoscopic submucosal dissection (ESD) and en bloc EMR are favored for
T1b disease because an en bloc R0 excision may facilitate a cure due to the resection
plane being at the level of the muscularis propria. In addition, en bloc excision
enables the preservation of important biologic features in the resected specimen that
predict lymph node metastasis, such as tumor differentiation and lymphovascular invasion [10] [11] [12] [13]. Furthermore, it does not preclude or compromise subsequent surgery should more
advanced histopathology be detected. There is also emerging evidence to suggest that
high risk T1b cases (deep submucosal [SM2 +] and/or poorly differentiated and/or with
lymphovascular invasion) may be suitable for close surveillance following an en bloc
R0 excision without compromising the oncologic outcomes of further treatment such
as surgery in the event of disease progression [14].
Thus, distinguishing between T1a and T1b disease is imperative under current treatment
paradigms. Current assessment is limited to optical evaluation, as biopsies and endoscopic
ultrasound are inaccurate for T staging in this setting [15] [16]. We therefore sought to ascertain whether expert Barrett’s endoscopists could distinguish
between T1a and T1b EAC based on optical evaluation.
Methods
Study design and case selection
We retrospectively obtained high quality endoscopic images and pathology reports from
patients who underwent either EMR or ESD for Barrett’s neoplasia over 36 months until
October 2021 at Westmead Hospital in Sydney, Australia. All patients had previously
provided informed consent for research purposes as part of a prospective registry.
Approval for the current study was granted by the Human Research Ethics Committee
at Westmead Hospital (2021/ETH01154). Exclusion criteria included age < 18 years.
All ER procedures were performed using an Olympus HQ190 adult gastroscope with optical
capabilities including high definition white light, narrow-band imaging (NBI), and
near-focus magnification (Olympus Medical System Corp., Tokyo, Japan).
A total of 60 sets of endoscopic images of histologically confirmed HGD, T1a, and
T1b disease (20 sets for each) were compiled from consecutive patients over the previously
defined time period. Each set contained four images and was standardized to include
an overview, a close-up in high definition white light, an NBI, and a near-focus magnification
image. The Paris classification of each lesion and Prague classification of the extent
of Barrett’s esophagus were provided in each case. Using this information, an online
survey was created on Research Electronic Data Capture (REDCap), hosted by the University
of Sydney [17] [18]. REDCap is a secure, web-based application designed to support data capture for
research studies, providing: 1) an intuitive interface for validated data capture;
2) audit trails for tracking data manipulation and export procedures; 3) automated
export procedures for seamless data
downloads to common statistical packages; and 4) procedures for data integration and
interoperability with external sources.
International Working Group
A total of 24 expert Barrett’s endoscopists from around the world were invited to
participate, with 19 completing the survey. Experts were defined as individuals who had performed
Barrett’s endotherapy for > 5 years and had a high ER case volume for Barrett’s neoplasia
(> 50 per year). The initial invitation was sent to experts
by email by the Principal Investigator through REDCap. A reminder was
sent after 4 weeks to those who had not completed the survey. Experts were required to assess each
of the 60 sets of endoscopic images using the REDCap-based survey (see Fig. 1s and Fig. 2s in the online-only Supplementary material). Each participant was asked to predict
the histology of Barrett’s neoplasia for each lesion: HGD, intramucosal adenocarcinoma
(T1a), or submucosal invasive adenocarcinoma (T1b). Participants were also asked to
indicate their level of confidence in the diagnosis (high or low confidence).
Outcomes
The primary outcomes of the study were the sensitivity of optical evaluation in identifying
T1b disease compared with T1a disease, and the interobserver agreement between experts.
The secondary outcome was identification of potential endoscopist-related factors
contributing to accurate optical evaluation.
Statistical analysis
Given that multiple experts rated the same endoscopic data sets, observations within
each set may be correlated (clustered data). To account for this, variance adjustment
using a variance inflation factor was performed [19]. The variance inflation factor was calculated from: 1) the overall cluster
size; and 2) the intraclass correlation coefficient, which represents the resemblance between any
two observations within a cluster.
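The variance adjustment described above can be illustrated with a minimal sketch. It assumes the standard design-effect form, 1 + (m − 1) × ICC, where m is the cluster size, and a simple Wald interval; the source does not state which interval method was used, and the ICC and proportion below are hypothetical values chosen only for illustration.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation factor (design effect) for clustered ratings.

    Assumed form: 1 + (m - 1) * ICC, with m the number of ratings per cluster.
    """
    return 1.0 + (cluster_size - 1.0) * icc


def adjusted_ci(p: float, n: int, cluster_size: float, icc: float, z: float = 1.96):
    """Wald confidence interval for a proportion, with the standard error
    inflated by sqrt(design effect). The Wald form is an assumption."""
    vif = design_effect(cluster_size, icc)
    se = math.sqrt(p * (1.0 - p) / n) * math.sqrt(vif)
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical inputs: 20 image sets, each rated by 19 experts (380 ratings),
# an observed proportion of 0.44, and an assumed ICC of 0.4.
naive = adjusted_ci(0.44, 380, cluster_size=19, icc=0.0)     # ignores clustering
adjusted = adjusted_ci(0.44, 380, cluster_size=19, icc=0.4)  # accounts for clustering
print("naive 95% CI:   ", naive)
print("adjusted 95% CI:", adjusted)
```

With any positive ICC the adjusted interval is wider than the naive one, which is why ratings of the same image set by multiple experts cannot be treated as independent observations.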
Sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio
(NLR), and accuracy were calculated, along with their variance-adjusted 95 %CIs. Positive
and negative predictive values were not calculated as these are dependent upon disease
prevalence, which was artificially determined in this study by selecting 20 cases
with HGD, T1a disease, and T1b disease. Chi-squared tests were used to test for pairwise
association between categorical variables. Interobserver agreement between experts
was calculated using Fleiss’ kappa statistics and their 95 %CIs, along with respective
P values. A modified Likert scale developed by Landis and Koch was used to interpret
kappa values (< 0.20 = poor; 0.21–0.40 = fair; 0.41–0.60 = moderate; 0.61–0.80 = substantial;
0.81–1.00 = very good) [20]. Statistical significance was defined as P < 0.05 using two-tailed P values. All statistical analyses were
performed using SPSS software version 29 (IBM Corp., Armonk, New York, USA).
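For readers unfamiliar with Fleiss’ kappa, the following is a minimal sketch of the statistic and of the interpretation scale quoted above. It is not the SPSS procedure used in the study; the function names and the toy rating table are hypothetical, and treating a value of exactly 0.20 as "poor" is an assumption, since the quoted scale leaves that boundary undefined.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j. Every subject must be rated by
    the same number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all assignments that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject, then averaged.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1.0 - p_e)


def interpret_kappa(kappa):
    """Map kappa to the labels quoted in the text (0.20 treated as 'poor')."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "very good"


# Toy table (hypothetical): 4 lesions, 5 raters, categories HGD / T1a / T1b.
toy = [
    [5, 0, 0],
    [1, 3, 1],
    [0, 2, 3],
    [0, 1, 4],
]
k = fleiss_kappa(toy)
print(k, interpret_kappa(k))
```

In the study design, each row would correspond to one of the 60 image sets and each row total to the 19 experts.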
Results
A total of 19 expert Barrett’s endoscopists from 8 countries (Australia, USA, Italy,
Netherlands, Germany, Canada, Belgium, and Portugal) participated. The majority had
been practicing for more than 20 years (n = 12, 63.2 %), and actively reviewed over
100 cases of Barrett’s esophagus each year (n = 10, 52.6 %). The median annual case
volume was 50 (interquartile range [IQR] 28–90) for Barrett’s radiofrequency ablation,
50 (IQR 18–75) for Barrett’s EMR, and 25 (IQR 10–45) for Barrett’s ESD. The overall
quality of images was graded by experts as excellent (n = 6), good (n = 10), and average
(n = 3); none considered the images to be of below average or poor quality. All experts
rated each of the 60 sets of images. The pooled results of the responses provided
by the experts are shown in Table 1s and Table 2s.
EAC (T1a/b) could be distinguished from HGD with a pooled sensitivity of 89.1 % (95 %CI
84.7–93.4) ([Fig. 1]). Pooled specificity was 48.4 % (95 %CI 35.3–61.5), PLR was 1.73 (95 %CI 1.31–2.42),
NLR was 0.23 (95 %CI 0.11–0.43), and accuracy was 75.5 % (95 %CI 68.2–82.8). Responses
to individual cases are presented in [Fig. 2]. When predicting the T stage for T1b vs. T1a disease, the pooled sensitivity was
43.8 % (95 %CI 29.9–57.7). Pooled specificity was 59.4 % (95 %CI 46.7–72.1), PLR was
1.08 (95 %CI 0.56–2.07), NLR was 0.95 (95 %CI 0.59–1.50), and accuracy was 51.4 %
(95 %CI 38.3–64.9). Responses to individual cases are presented in [Fig. 3]. Examples of cases where T1a and T1b lesions were identified correctly and incorrectly
by most experts are presented in [Fig. 4] and [Fig. 5].
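The relationship between the pooled estimates above can be checked with a short sketch. It uses the standard definitions PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity; the accuracy calculation assumes that every case received the same number of ratings, so that accuracy is a case-weighted average of sensitivity and specificity (40 cancer cases and 20 HGD cases). The function names are illustrative only.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


def accuracy(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy as a case-weighted average (assumes equal ratings per case)."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


# EAC (T1a/T1b) versus HGD: pooled sensitivity 89.1 %, pooled specificity 48.4 %.
plr, nlr = likelihood_ratios(0.891, 0.484)
acc = accuracy(0.891, 0.484, n_pos=40, n_neg=20)
print(round(plr, 2), round(nlr, 2), round(100 * acc, 1))

# T1b versus T1a: pooled sensitivity 43.8 %, pooled specificity 59.4 %.
plr_t1b, nlr_t1b = likelihood_ratios(0.438, 0.594)
print(round(plr_t1b, 2), round(nlr_t1b, 2))
```

Under these assumptions the point estimates agree with the reported PLR and NLR for both comparisons (1.73 and 0.23; 1.08 and 0.95) and with the reported accuracy of 75.5 % for EAC versus HGD. A PLR and NLR both close to 1 for T1b versus T1a indicate that an expert’s prediction barely shifts the probability of submucosal invasion.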
Fig. 1 Sensitivity and specificity of optical evaluation for Barrett’s neoplasia and esophageal
adenocarcinoma (19 expert Barrett’s endoscopists).
Fig. 2 Predicted histology (T1 cancer versus high grade dysplasia) by 19 expert Barrett’s
endoscopists, for each case of confirmed T1 cancer, yielding a sensitivity of 89.1 %.
Fig. 3 Predicted histology (T1b versus T1a cancer) by 19 expert Barrett’s endoscopists, for
each case of confirmed T1a/T1b cancer, yielding a sensitivity of 43.8 %.
Fig. 4 Two examples of T1a disease: (a) identified correctly by 13/19 experts; (b) identified correctly by only 4/19 experts.
Fig. 5 Two examples of T1b disease: (a) identified correctly by 14/19 experts; (b) identified correctly by only 1/19 experts.
Of the 20 T1b cases, 12 were SM1, 3 were SM2, and 5 were SM3. When stratifying disease
as potentially low risk (T1a/T1b-SM1) vs. high risk (T1b-SM2/3) (Table 3s), predicting the T stage for high risk disease carried a pooled sensitivity of 48.2 %
(95 %CI 24.1–72.4). Pooled specificity was 59.3 % (95 %CI 49.4–69.3), PLR was 1.19
(95 %CI 0.48–2.35), NLR was 0.87 (95 %CI 0.40–1.54), and accuracy was 57.0 % (95 %CI
44.3–69.9).
Overall interobserver agreement by Fleiss’ kappa was 0.326 (95 %CI 0.311–0.342; P < 0.001). When comparing T1a/T1b disease with HGD, Fleiss’ kappa was 0.394 (95 %CI
0.373–0.416; P < 0.001). When comparing T1b with T1a disease, Fleiss’ kappa was 0.421 (95 %CI 0.399–0.442;
P < 0.001). There was no association between the sensitivity of optical evaluation
and Barrett’s case volume (P = 0.39) or individual confidence (high or low) in predicting histology (P = 0.68).
Discussion
In our study, expert Barrett’s endoscopists reliably detected and distinguished early
EAC (T1a/T1b disease) from HGD, with a pooled sensitivity of 89.1 %. However, determining
the presence of submucosal invasion (i.e. distinguishing T1b from T1a adenocarcinoma)
was challenging. Although there was fair-to-moderate interobserver agreement, pooled
sensitivity was only 43.8 %. Thus, there remains a significant risk of understaging
T1b disease based on optical evaluation. Given that current guidelines consider piecemeal
EMR as an acceptable treatment modality for T1a disease but suggest that low risk
T1b disease may be endoscopically cured following an en bloc R0 excision, our study
has implications for clinical decision making and the selection of endoscopic treatment
methods.
For early EAC, it is well reported that ER offers a disease-specific survival rate
comparable to surgery, but with fewer adverse events, shorter hospital stays, fewer
readmissions, and a lower 90-day mortality [21] [22]. However, the choice between EMR and ESD is less clear. The 2022 European Society
of Gastrointestinal Endoscopy guidelines recommend ESD for suspected T1b lesions and
for “malignant lesions” > 20 mm, and EMR for lesions ≤ 20 mm with a low probability
of submucosal invasion [23]. Similarly, the 2023 American Society for Gastrointestinal Endoscopy guidelines
recommend that for suspected nonulcerated T1 cases, ESD should be performed if the
lesion is “bulky” or > 20 mm, and either EMR or ESD for lesions ≤ 20 mm [24]. These definitions rely heavily upon the endoscopist’s interpretation of a lesion.
Furthermore, factors such as “bulkiness” (indicating a nodular Paris 0-Is component) and ulceration (indicating
a possible Paris 0-IIc component) allude to the presence of submucosal invasion. Thus,
at its core, choosing the optimal resection modality hinges upon prior knowledge
of histology (i.e. T1a vs. T1b disease). However, as we have shown in the present
study, even among expert Barrett’s endoscopists, optical evaluation cannot facilitate
such a distinction, with 56.2 % of T1b cases predicted as T1a.
This study also confirms that pre-resection staging of Barrett’s-related lesions is
complex. A few studies have previously assessed lesion morphology in relation to the
risk of T1b disease. In a retrospective study of 293 consecutive ERs at a Dutch tertiary
center, lesions classified as Paris 0-Is and 0-IIc morphology were more commonly associated
with T1b than T1a disease (26 % and 25 %, respectively) [25]. Similarly, in a study of 141 pathologically confirmed cases of early EAC from the
Cancer Institute Hospital in Tokyo, a complex-type morphology (Paris 0-Is + 0-IIa/IIc/IIb)
was associated with a higher incidence of T1b disease than a simple morphology (0-Is,
0-IIa, 0-IIb, or 0-IIc) (59.6 % vs. 22.5 %) [26]. As these studies demonstrate, lesion morphology remains imperfect and inadequate
for directing the ER treatment algorithm [27]. NBI is one of the most extensively studied
tools for characterizing early neoplasia in patients undergoing surveillance for Barrett’s
esophagus [28] [29]. Recently, the BING Working Group, comprising experts from Europe, the USA, and
Japan, validated a consensus-driven NBI classification system for identifying dysplasia
and cancer in Barrett’s esophagus, with high accuracy and specificity [30]. However, no validated NBI or equivalent virtual chromoendoscopy classification
on other endoscopy platforms exists for distinguishing between T1a and T1b disease.
It is therefore unsurprising that despite providing experts with NBI images and information
on lesion morphology, our study yielded a low sensitivity for identifying T1b disease.
Data from older surgical series report the risk of lymph node metastasis in T1b cases
as between 27 % and 44 % [31] [32] [33] [34]. In contrast, recent endoscopy studies have yielded rates of 2 %–4 % in low risk
T1b disease (well-to-moderately differentiated, without lymphovascular invasion, and
< 500 µm invasion into the submucosa) [35] [36]. This may readily avoid the need for subsequent chemotherapy, radiotherapy, or surgery
in the majority of cases. Furthermore, there is emerging evidence from an ongoing
international trial that the risk of lymph node metastasis following an R0 endoscopic
excision of high risk T1b lesions (poorly differentiated, and/or lymphovascular invasion,
and/or > 500 µm invasion into the submucosa) may be as low as 5 % at a median of 19
months [36]. These incidences are comparable to the mortality risk after esophagectomy (2 %)
in expert centers [37]. Therefore, a less invasive and organ-preserving approach may be appropriate not
only for the frail and elderly, but for many patients with low or high risk T1b disease.
The caveat to these outcomes is that an en bloc R0 excision is mandated, rendering
it crucial to carefully select between piecemeal and en bloc resection techniques.
When stratifying disease as potentially low risk (T1a/T1b-SM1) vs. high risk (T1b-SM2/3),
sensitivity of optical evaluation in our study only increased from 43.8 % to 48.2 %,
again indicating that T staging in early EAC is challenging. Despite fair-to-moderate
interobserver agreement among experts in identifying T1b disease, sensitivity was
poor. Therefore, to prevent the inadvertent piecemeal resection of T1b disease that
may otherwise have been cured or surveilled following an R0 excision, we
suggest that an en bloc resection strategy (ESD or en bloc EMR) be considered for
all suspected T1a lesions and T1b lesions.
We found no correlation between the individual confidence (high or low) of experts
in predicting histology and the sensitivity of optical evaluation (P = 0.68), suggesting that confidence may reflect subjective perception
more closely than objective diagnostic ability. Moreover, there was no correlation between
individual sensitivity of T1b detection and the annual case volume (P = 0.39). It naturally follows that additional advanced technologies may be required
to improve endoscopic assessment. Short-wavelength endoscopy is a technology that
is able to visualize mucosal architecture to a depth of 200 µm and has shown promise
in dysplasia characterization, with higher sensitivity compared with high definition
white light (88.1 % vs. 73.4 %) [38]. However, its ability to distinguish T1a from T1b disease remains unclear. Recently,
Ebigbo et al. reported on the innovative application of artificial intelligence (AI)
for differentiating between T1a and T1b
Barrett’s-related cancers. The authors developed, trained, and tested a convolutional
neural network to estimate the risk of submucosal invasion using high definition white
light imaging. The AI system achieved a sensitivity, specificity, and accuracy of
77 %, 64 %, and 71 %, respectively. Although sensitivity appeared to be higher than
in our study, when AI was compared with the optical performance of five international
experts on the same dataset, there was no significant difference [39]. Nonetheless, AI is promising and has the potential to become a key tool for guiding
the ER algorithm. Until such a time, it may be prudent to consider en bloc resection
as the default strategy for all suspected T1 lesions.
We recognize that our study has some limitations. Optical evaluation relied on still
images, which inherently lack the depth and detail obtained during live procedures.
Although perceived image quality varied, most experts graded the overall quality as good
or excellent. Furthermore, the study did not include images
obtained through retroflexion, nor could it account for dynamic changes that might
occur during live endoscopy, such as those induced by suction. This is the primary
reason why experts were not asked to suggest a resection modality for each case. Moreover,
such responses would have been biased by prior responses pertaining to predicting
histology.
Optical evaluation remains a critical step in determining the appropriate ER strategy
for Barrett’s-related neoplasia. In our study, international experts were proficient
in differentiating cancers from HGD, yet predicting submucosal invasion remained a
challenge. Based on these results, and the potential to obtain a cure with an R0 excision
of T1b disease, en bloc ER (ESD or en bloc EMR) should be considered for any suspected
T1a or T1b lesion. Future studies will be crucial in assessing the potential role
of advanced technologies, such as AI, in submucosal invasion prediction. Additionally,
new data on the outcomes of T1a and T1b resections will further enrich the post-ER
management algorithm in the future.