Keywords
breast cancer - digital breast tomosynthesis - independent double reading - screening
Introduction
The aim of breast cancer screening is to reduce the number of cases that progress
to an advanced tumor stage through earlier diagnosis, thus enabling therapeutic benefits
and reducing breast cancer-specific mortality [1]. Mammography is an evidence-based method of systematic screening which has been
proven to lower the rate of breast cancer mortality [2] [3]. In Germany, a mammography screening program (MSP) based on the European guidelines
was introduced nationwide starting in 2005. The recommendations set out in the European
guidelines include independent double reading of the mammograms, performed at different
times and different locations, so as to increase the sensitivity by 5–15 % [4]; this is mandatory in the German MSP [5].
Digital breast tomosynthesis (DBT) reduces tissue overlap by moving the X-ray tube
in an arc over the breast and reconstructing a pseudo 3D examination from the captured
parallel layers; this results in higher breast cancer detection rates than digital
mammography (DM), which is the current standard in population-based screening [6]. The randomized controlled TOSYMA study, conducted as part of the ongoing German
mammography screening program, showed that the test arm with DBT plus synthetic mammography
(SM) had a statistically significantly higher rate of detecting invasive breast cancer
compared to the control arm with DM [7] [8]. An independent, i. e., blinded, double reading of the mammograms was performed
by the same qualified examiners in both study arms. Performing an independent double
reading requires an investment of medical resources, especially for screening with
DBT which has a higher median time per reading than screening with DM, at 109 seconds
compared to 54 seconds [8]; the integration in screening therefore needs to be justified.
Due to the high structural equality of both study arms, the randomized TOSYMA study
provides a valid basis for assessing the influence of an independent double reading
on breast cancer detection with digital mammography compared to digital breast tomosynthesis
[8].
The aim of this TOSYMA subanalysis is to compare the two study arms with regard to
the proportion of discordant readings, i. e., cases in which only one of the two independent
double readings led to a true-positive finding, and to characterize the breast cancers
that are detected in this way.
Materials and Methods
Study Design
Phase 1 of the multicentric TOSYMA study was conducted from July 2018 to December
2020 at 17 screening units in the federal states of North Rhine-Westphalia and Lower
Saxony. In this study, 99,689 women were randomized 1:1 to either the test arm (DBT+SM)
or the control arm (DM). The study protocol was approved by the responsible ethics
committee (2016–132-f-S) and reviewed by two other ethics committees. All of the study
participants gave their written consent. The study protocol, the results of the first
primary endpoint with secondary endpoints, and two subanalyses have already been published
[7] [8] [9] [10].
Study Subjects
All women aged 50 to 69 receive a written invitation every two years to participate
in the German MSP. In the catchment areas of the TOSYMA study sites, in addition to
the regular invitation letter, women also received a personal invitation to take part
in the study, together with the study information. Women who had been diagnosed with
breast cancer up to 5 years previously or who had undergone a mammography within the
past 12 months were not eligible to participate in the MSP. Specific exclusion criteria
for the TOSYMA study included having breast implants or having already previously
participated in the study [7] [8].
Screening Examination Setup
Participation in the study was offered at 17 screening units in 21 locations (North
West Lower Saxony (Wilhelmshaven), Hannover, North Lower Saxony (Stade), Central Lower
Saxony (Vechta), North East Lower Saxony (Lüneburg), Duisburg, Krefeld/Mönchengladbach/Viersen,
Wuppertal/Solingen (Bergisches Land/Mettmann District), Aachen-Düren-Heinsberg, Cologne
Right Rhine (Bergisch Gladbach), Münster-South/Coesfeld, Bottrop, Gelsenkirchen, Recklinghausen,
Minden-Lübbecke/Herford, Bielefeld/Gütersloh, Hamm/Unna/Märkischer District (Schwerte),
Höxter, Paderborn, Soest (Lippstadt), and Münster North/Warendorf).
Mammography devices from five different manufacturers were used to perform the DBT+SM
or DM examinations: Amulet Innovality (Fujifilm Corporation, Tokyo, Japan; n = 10,075),
Class Tomo (IMS Giotto, Sasso Marconi, Italy; n = 7,970), Lorad Selenia 3Dimensions
(Hologic, Marlborough, US; n = 10,955), Lorad Selenia Dimensions (Hologic, Marlborough,
US; n = 40,645), MAMMOMAT Inspiration (Siemens Healthineers, Erlangen, Germany; n = 6,759),
MAMMOMAT Revelation (Siemens Healthineers, Erlangen, Germany; n = 12,917), Senographe
Essential (GE Healthcare, Chicago, US; n = 10,237).
In both study arms, the examination included cranio-caudal and medio-lateral-oblique
projections for each breast. In the test arm, stacked layers of ≤ 1 mm thickness were
reconstructed to create the images for reading (DBT), in addition to the synthesized
two-dimensional mammogram (SM) [7] [8] [9].
Independent Double Reading
As in the current MSP, independent double readings were performed by the same certified
physicians in both study arms. The screening study involved a total of 83 experienced
readers who had at least two years of previous screening experience, performing at
least 5,000 screening readings per year. DBT training was provided prior to the start
of the TOSYMA study. There were four to eight readers per study site. They received
their list of study examinations with both study arms mixed in a random order, and
it was not possible to identify the study arm in the screening software before reading.
If there were any abnormalities, the results were discussed at the consensus conference
with the responsible physician of the program so as to decide whether further diagnostics
were indicated. The protocol for further diagnostics after the study examination did
not differ from the established protocol of the MSP; guided by the screening findings,
it included, besides a clinical examination, additional mammogram projections where
appropriate (e. g., magnification mammography or DBT), ultrasound, MRI, or invasive
diagnostic procedures.
All of the screening data were saved in the screening documentation system MaSc (KV-IT
GmbH, Dortmund, Germany) [9].
Study Data
The body of data included all of the results from the double readings; this made it
possible to determine the number and proportion of concordant results (two true-positives)
and discordant results (one true-positive and one false-negative finding) for the
breast cancers detected in each study arm (invasive breast carcinomas and ductal carcinoma
in situ (DCIS)).
A finding was considered true-positive if a subsequently diagnosed carcinoma was presented
at the consensus conference due to at least one mammographic abnormality (category
4a, 4b, and 5), and false-negative if the radiological finding for this carcinoma
did not result in a presentation at the consensus conference (category 1, 2) [4] [11].
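The classification rule described above can be illustrated with a minimal Python sketch. The string codes for the categories, the function names, and the label used for two negative readings are assumptions made for illustration; they do not reflect the data model of the screening documentation system.

```python
# Categories that lead to presentation at the consensus conference (true-positive
# reading of a subsequently diagnosed carcinoma) and those that do not (false-negative).
# The string encoding of the categories is an assumption for this sketch.
POSITIVE_CATEGORIES = {"4a", "4b", "5"}
NEGATIVE_CATEGORIES = {"1", "2"}


def is_true_positive(category: str) -> bool:
    """Return True if a single reading of a diagnosed carcinoma counts as true-positive."""
    if category in POSITIVE_CATEGORIES:
        return True
    if category in NEGATIVE_CATEGORIES:
        return False
    raise ValueError(f"unexpected category: {category!r}")


def classify_double_reading(reader_1: str, reader_2: str) -> str:
    """Classify the pair of independent readings for one diagnosed carcinoma."""
    hits = int(is_true_positive(reader_1)) + int(is_true_positive(reader_2))
    if hits == 2:
        return "concordant"   # double true-positive
    if hits == 1:
        return "discordant"   # single true-positive, one false-negative
    # Two negative readings would not lead to a consensus conference,
    # so such a carcinoma is outside the screen-detected set (assumed label).
    return "not screen-detected"


print(classify_double_reading("4a", "1"))  # discordant
```
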
Based on the DM or SM images, breast density was visually assigned to categories A
(fatty), B (fibroglandular), C (heterogeneously dense), or D (extremely dense) [12]. If the two breasts differed in density, the higher category was documented [12]; in the case of discordant density categorization in the independent double reading,
the highest density category was used [9]. A and B were grouped together as non-dense parenchyma, and C and D were grouped
together as dense parenchyma.
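As an illustration of the density rules above, the following Python sketch takes the higher category of the two breasts, then the higher category of the two readers, and finally groups the result. The function names and the letter encoding are assumptions for this sketch only.

```python
# Ordered density categories from fatty (A) to extremely dense (D).
DENSITY_ORDER = ["A", "B", "C", "D"]


def higher_density(first: str, second: str) -> str:
    """Return the higher of two density categories (raises ValueError if unknown)."""
    return max(first, second, key=DENSITY_ORDER.index)


def documented_density(left_breast: str, right_breast: str) -> str:
    """One reader's documented density: the higher category of the two breasts."""
    return higher_density(left_breast, right_breast)


def analysis_density(reader_1: str, reader_2: str) -> str:
    """Density used for analysis: the highest category across both readers."""
    return higher_density(reader_1, reader_2)


def density_group(category: str) -> str:
    """Group A and B as non-dense parenchyma, C and D as dense parenchyma."""
    if category not in DENSITY_ORDER:
        raise ValueError(f"unexpected density category: {category!r}")
    return "non-dense" if category in ("A", "B") else "dense"


reader_1 = documented_density("B", "C")   # "C"
reader_2 = documented_density("B", "B")   # "B"
print(density_group(analysis_density(reader_1, reader_2)))  # dense
```
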
The proportions of breast carcinomas detected based on concordant or discordant findings
were stratified according to T categories (Tis, T1, >T1). In the case of multiple
manifestations, the more advanced diagnosis was used, determined by histological size
(pT), or by imaging (cT) in the case of neoadjuvant therapy. Further stratification
included the histological subtype (invasive breast carcinoma of no special type,
invasive lobular breast carcinoma, other subtypes), the mammographic degree of suspicion
(category 4a: suspicious abnormality, probably benign; 4b: suspicious abnormality,
probably malignant; 5: high suspicion of malignancy), and the mammographic morphology
(mass, microcalcification, architectural distortion, asymmetry, and density) according
to the consensus conference.
Statistical Analysis
The modified full analysis set included 49,762 women from the test arm (DBT+SM) and
49,796 women from the control arm (DM) who received a screening examination after
randomization. The descriptive sub-analysis included all women in whom breast cancer
was detected through screening, comprising 416 women from the test arm, and 306 women
from the control arm ([Fig. 1]). Absolute and relative frequencies were calculated for the categorical variables.
In addition, we calculated the detection rates for single and double true-positive
breast carcinomas per 1,000 women screened.
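The descriptive measures can be reproduced from the counts reported in this article. The Python sketch below is illustrative only; the function and variable names are assumptions, while the counts are those given in the Statistical Analysis and Results sections.

```python
def per_mille(cases: int, screened: int) -> float:
    """Detection rate per 1,000 women screened."""
    return 1000 * cases / screened


def percent(part: int, whole: int) -> float:
    """Relative frequency in percent."""
    return 100 * part / whole


# Counts as reported in this article for each study arm.
ARMS = {
    "DBT+SM": {"screened": 49_762, "cancers": 416, "single_tp": 112},
    "DM": {"screened": 49_796, "cancers": 306, "single_tp": 68},
}


def summarize(arm: dict) -> dict:
    """Detection rates (per mille) and share of single true-positive cancers (percent)."""
    return {
        "detection_rate": round(per_mille(arm["cancers"], arm["screened"]), 1),
        "single_tp_rate": round(per_mille(arm["single_tp"], arm["screened"]), 1),
        "single_tp_share": round(percent(arm["single_tp"], arm["cancers"]), 1),
    }


for name, arm in ARMS.items():
    print(name, summarize(arm))
```

For the DBT+SM arm this yields 8.4 ‰, 2.3 ‰, and 26.9 %; for the DM arm, 6.1 ‰, 1.4 ‰, and 22.2 %.
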
Fig. 1 Randomized allocation of the TOSYMA trial participants. DBT+SM = digital breast tomosynthesis
plus synthetic mammography; DM = digital mammography.
Results
In the DBT+SM arm, breast cancer was detected in 416 out of 49,762 women (8.4 ‰).
Of these, the diagnosis resulted from discordant radiology findings with only one
true-positive result in 112 women (26.9 %), corresponding to a detection rate of 2.3 ‰
(112/49,762).
At 6.1 ‰ (306/49,796), the breast cancer detection rate in the DM arm was lower than
in the DBT+SM arm, and the proportion of discordant findings was 22.2 %; the resulting
detection rate was 1.4 ‰ (68/49,796) ([Table 1]). Stratification according to non-dense and dense parenchyma showed comparable proportions
of single true-positive breast carcinomas in both study arms (DBT+SM: 29.6 % and 24.7 %
respectively; DM: 20.5 % and 23.8 % respectively).
Table 1
Number (n) and proportion (%) of single and double true-positive detected breast cancers
(invasive and DCIS), based on independent double reading in the DBT+SM and DM trial
arms.
| Results from the independent double reading | DBT+SM, n (%) | DBT+SM A+B, n (%) | DBT+SM C+D, n (%) | DM, n (%) | DM A+B, n (%) | DM C+D, n (%) |
|---|---|---|---|---|---|---|
| Single true-positive | 112 (26.9 %) | 56 (29.6 %) | 56 (24.7 %) | 68 (22.2 %) | 30 (20.5 %) | 38 (23.8 %) |
| Double true-positive | 304 (73.1 %) | 133 (70.4 %) | 171 (75.3 %) | 238 (77.8 %) | 116 (79.5 %) | 122 (76.2 %) |
| Total (invasive breast carcinoma plus DCIS) | 416 (100 %) | 189 (100 %) | 227 (100 %) | 306 (100 %) | 146 (100 %) | 160 (100 %) |
DBT+SM: digital breast tomosynthesis plus synthetic mammography
DM: digital mammography
DCIS: ductal carcinoma in situ
Visually determined breast density categories A+B (BI-RADS 5th ed. [12]): Non-dense parenchyma
Visually determined breast density categories C+D (BI-RADS 5th ed. [12]): Dense parenchyma
Of the breast carcinomas in the DBT+SM arm that were detected through only a single
true-positive reading, 24.1 % (27/112) were DCIS, 67.9 % (76/112) were invasive breast carcinomas
up to 20 mm in size, and 8.0 % (9/112) were invasive breast carcinomas larger
than 20 mm. The corresponding proportions in the DM arm were 32.4 % (22/68), 55.9 %
(38/68), and 11.8 % (8/68) respectively ([Table 2]).
Table 2
Number (n) and proportion (%) of single and double true-positive detected breast cancers
(invasive and DCIS), differentiated according to tumor characteristics and histological
subtype, based on independent double reading in the DBT+SM and DM trial arms.
| Tumor characteristics | DBT+SM, single true-positive, n (%) | DM, single true-positive, n (%) | DBT+SM, double true-positive, n (%) | DM, double true-positive, n (%) |
|---|---|---|---|---|
| pTis | 27 + 0 (24.1 %) | 22 + 0 (32.4 %) | 35 + 0 (11.5 %) | 44 + 0 (18.5 %) |
| pT1 + cT1 | 68 + 8 (67.9 %) | 34 + 4 (55.9 %) | 187 + 33 (72.4 %) | 114 + 30 (60.5 %) |
| > pT1 + > cT1 | 8 + 1 (8.0 %) | 8 + 0 (11.8 %) | 34 + 15 (16.1 %) | 38 + 12 (21.0 %) |
| No special type | 56 (65.9 %) | 31 (67.4 %) | 210 (78.1 %) | 157 (80.9 %) |
| Lobular subtype | 20 (23.5 %) | 12 (26.1 %) | 42 (15.6 %) | 29 (14.9 %) |
| Other subtypes | 9 (10.6 %) | 3 (6.5 %) | 17 (6.3 %) | 8 (4.1 %) |
All histologies are based on the final post-operative evaluation.
pTis: Ductal carcinoma in situ
pT1: Histological tumor size up to 20 mm, > pT1: Histological tumor size greater than
20 mm
cT: In the case of histological confirmation of invasive breast cancer with indication
for neoadjuvant therapy, tumor size was estimated using imaging.
Among the invasive breast carcinomas detected through a single true-positive (discordant
readings) or double true-positive finding (concordant readings), the no special type
was predominant in both study arms. However, the proportion of invasive lobular
carcinomas was higher among cancers detected through a single true-positive finding than
among those detected through a double true-positive finding (DBT+SM: 23.5 % (20/85) vs. 15.6 %
(42/269); DM: 26.1 % (12/46) vs. 14.9 % (29/194)) ([Table 2]).
High suspicion of malignancy (category 5) was rare in both study arms, accounting
for less than 10 % of carcinomas with discordant readings. Here, suspicious abnormalities
classified as probably benign (category 4a) were predominant, accounting for 67.6 % of
cases (73/108) in the DBT+SM arm and 84.6 % (55/65) in the DM arm, excluding cases with missing data ([Table 3]).
Table 3
Number (n) and proportion (%) of single or double true-positive detected breast cancers,
differentiated according to the degree of mammographic suspicion, based on independent
double reading in the DBT+SM and DM trial arms.
| Finding level at consensus conference | DBT+SM, single true-positive, n (%) | DM, single true-positive, n (%) | DBT+SM, double true-positive, n (%) | DM, double true-positive, n (%) |
|---|---|---|---|---|
| 4a – Suspicious abnormality, probably benign | 73 (67.6 %) | 55 (84.6 %) | 101 (33.6 %) | 101 (43.7 %) |
| 4b – Suspicious abnormality, probably malignant | 26 (24.1 %) | 6 (9.2 %) | 83 (27.6 %) | 65 (28.1 %) |
| 5 – High suspicion of malignancy | 9 (8.3 %) | 4 (6.2 %) | 117 (38.9 %) | 65 (28.1 %) |
| Missing data | 4 | 3 | 3 | 7 |
| Total (invasive carcinomas plus DCIS) | 112 (100 %) | 68 (100 %) | 304 (100 %) | 238 (100 %) |
Mammographic suspicion documented during the consensus conference, based on a single
or double true-positive independent double reading of screening-detected breast cancers
of both trial arms based on the BI-RADS 4th ed. [11]
DCIS: Ductal carcinoma in situ
Among the examinations that only resulted in a single true-positive finding, the proportion
of masses and architectural distortions was higher in the DBT+SM arm than in the DM arm,
while the proportion of microcalcifications was lower ([Table 4], [Fig. 2]).
Table 4
Number (n) and proportion (%) of single or double true-positive detected breast cancers,
differentiated according to mammographic morphology, based on independent double reading
in the DBT+SM and DM trial arms.
| Morphology at consensus conference | DBT+SM, single true-positive, n (%) | DM, single true-positive, n (%) | DBT+SM, double true-positive, n (%) | DM, double true-positive, n (%) |
|---|---|---|---|---|
| Masses | 29 (26.9 %) | 13 (20.0 %) | 115 (38.3 %) | 106 (45.9 %) |
| Microcalcifications | 26 (24.1 %) | 24 (36.9 %) | 50 (16.7 %) | 51 (22.1 %) |
| Architectural distortion | 23 (21.3 %) | 7 (10.8 %) | 29 (9.6 %) | 11 (4.8 %) |
| Asymmetry | 0 (0.0 %) | 3 (4.6 %) | 0 (0.0 %) | 4 (1.7 %) |
| Density | 0 (0.0 %) | 5 (7.7 %) | 2 (0.7 %) | 9 (3.9 %) |
| Combination | 30 (27.8 %) | 13 (20.0 %) | 105 (34.9 %) | 50 (21.6 %) |
| Missing data | 4 | 3 | 3 | 7 |
| Total (invasive carcinomas plus DCIS) | 112 (100 %) | 68 (100 %) | 304 (100 %) | 238 (100 %) |
Mammographic morphology documented during the consensus conference, based on a single
or double true-positive independent double reading of screening-detected breast cancers
of both trial arms. DCIS: Ductal carcinoma in situ
Fig. 2 Screening-detected breast cancer. a Single true-positive reading with depiction of an architectural distortion in digital
breast tomosynthesis (cranio-caudal) of the left breast in the lateral quadrants.
Histology: Invasive lobular carcinoma, pT1c (11 mm), pN0, cM0, G2. b Lesion-depicting magnification.
The median reading time for single true-positive readings was 238.0 seconds for DBT+SM
and 121.5 seconds for DM, and for single false-negative readings it was 100.0 seconds
(DBT+SM) and 40.0 seconds (DM). Breast carcinomas detected through a double true-positive
reading had a median reading time of 194.0 seconds in the DBT+SM arm and 99.5 seconds
in the DM arm.
Discussion
The large, multicentric, randomized, controlled TOSYMA study conducted in the context
of the German MSP shows that the independent double readings performed in screening
with both DM (22.2 %) and DBT+SM (26.9 %) resulted in a relevant proportion of carcinomas
being detected based on only a single true-positive reading. Comparable proportions
of discordant findings have already been described in routine mammography screening
programs. Of the screening-detected cancers, 23.6 % were diagnosed in women who were
recalled because of screenings with discordant interpretation [13], and 23 % of breast carcinomas diagnosed through screening were evaluated negatively
by one of the two radiologists [14]. Other reports in the literature also conclude that the double readings can help
to increase the sensitivity of mammography [15] [16] [17]. Our results are consistent with reports that describe a decrease in sensitivity
for all density categories associated with a single reading of a mammogram compared
to a double reading [18].
In the TOSYMA study, a higher total rate of breast cancer detection by DBT+SM versus
DM also results in a higher rate of breast cancer detection with one true-positive
and one false-negative finding (DBT+SM arm: 2.3 ‰, DM arm: 1.4 ‰). The cancers detected
by a single true-positive reading in the DBT arm include in particular invasive breast
carcinomas up to 20 mm in diameter with a low degree of mammographic suspicion (category
4a). The predominant subtype here is breast carcinoma of no special type, while the
mammographic morphologies vary. Screening aims to detect T1 carcinomas; however, this
can be challenging, even for radiologists experienced in both mammography techniques.
The time taken for the reading could have an influence on breast cancer detection,
as the median reading times for the single false-negative findings are considerably
lower than those of the single true-positive findings, and are slightly lower than
the total median reading time for each study arm [8]. In addition, the single true-positive breast carcinomas may have more subtle abnormalities
than those with double true-positive findings, consistent with a longer median reading
time for each study arm.
This study does not assess sensitivity at the level of the carcinoma lesion; instead,
it is based on the radiological assessment of the screening examination. Since presentation
at the consensus conference is not the same as an indication for a mandatory patient
recall, but also involves, for example, requesting external mammograms or other examination
results, we did not calculate a specificity parameter in relation to the individual
readers. Overall, the recall rate did not differ between the two study arms (DBT+SM:
4.9 %, DM: 5.1 %), while the positive predictive value of recall for further diagnostics
(PPV1) was higher in the test arm than in the control arm (DBT+SM: 17.2 %, DM: 12.3 %)
[8].
Among the single true-positive breast carcinomas, the largest difference in proportions
was observed for microcalcifications, which occurred more frequently in the DM arm
(DM: 36.9 %, DBT+SM: 24.1 %). This is consistent with results from 2D mammography
screening, which show a significantly higher proportion of microcalcifications among
breast carcinomas diagnosed based on discordant readings than those based on concordant
readings [13]. Since the DCIS detection rate did not differ between the two study arms [8] and DCIS detection has a strong association with microcalcifications [19], a true-positive finding of microcalcification with DBT+SM appears to be less dependent
on the independent double reading than is the case with other mammographic morphologies.
In some cases, contrast enhancement of microcalcifications may lead to more obvious
visualization in the test arm than in the control arm [6]. Architectural distortion accounted for the second largest difference in proportions,
with a higher proportion in the DBT+SM arm (DBT+SM: 21.3 %, DM: 10.8 %). The literature
describes the superiority of DBT in detecting spiculations and architectural distortions
[6]. This study shows that the independent double reading has a positive influence on
the frequency of these diagnoses.
Especially in the context of the longer reading times with DBT compared to DM due
to the greater extent of the imaging material with reconstructed layers measuring
1 mm and a median breast compression thickness of 59 mm [8], the prospect of using artificial intelligence (AI) based systems seems promising
as an alternative to performing independent double readings. Implementation of AI
solutions is favored by standardized mammography setting techniques; in the future
it could potentially support human reading, relieving the workload through stratified
preselection. The retrospective AI evaluations conducted in the Malmö and Córdoba
studies show the potential uses of this technology [20] [21]: Using DBT, the second reading was replaced by AI, resulting in the detection of
95 % of breast carcinomas that were diagnosed through a double-reading process; this
cancer detection rate was 26 % higher than for DM screening with an independent double
reading – but at the expense of increasing the recall rate by 53 %. AI alone in the
DBT arm had a sensitivity comparable to that of the DM arm with double readings [20]. Compared to DBT examinations with an independent double reading, AI could thus
contribute to a relevant reduction in workload without loss of sensitivity [21]. Results from a randomized mammography trial evaluating AI-supported mammography
reading compared to the established double reading support the assumption that a comparable
breast cancer detection rate can be achieved with a much lower workload using AI [22].
The parameter we used, i. e., breast carcinomas detected through a single true-positive
finding, reflects the combined performance of the readers, rather than that of individual
readers. This parameter was measured in the same way for both study arms, within a
randomized study that had a very low potential for bias due to selective choice of
screening participants, readers, or devices. The DBT arm contained
a substantial proportion of breast carcinomas that were detected through a single
true-positive reading, which suggests that a double reading is still necessary
in DBT screening.
TOSYMA is the largest randomized controlled study to date investigating DBT+SM versus
DM screening, comprising almost 100,000 study participants. It allows for complementary
exploratory evaluations based on successful randomization. The pragmatic approach
of this study has a high degree of external validity and also proves its practical
feasibility, due in particular to the involvement of a high number of screening units
and device technologies. Radiographers, readers, and pathologists were trained prior
to the start of the study. All of the physicians were experienced, and the same physicians
read examinations of both study arms, with no differences between the study examinations
and routine screening.
The TOSYMA study has some limitations. It only investigated one round of screening;
this means that the differences between the study arms may have been influenced by
an initial prevalence-screening effect with DBT+SM. In addition, there may be a learning
curve required for reading tomosynthesis images, meaning that the reading time may
decrease with experience. In this sub-analysis, the “true-positive” reading refers
to the level of the examination, not the level of the lesion.
Clinical Relevance
As in digital mammography screening, there is a relevant proportion of breast carcinomas
that are only detected through one true-positive reading out of the two readings;
this applies especially to tumors up to 20 mm in diameter or to lesions that do
not give rise to high suspicion of malignancy. The mandatory independent double reading
still seems necessary, even with DBT screening. In future, this could be a field for
the development of artificial intelligence applications.
Funding
Deutsche Forschungsgemeinschaft (DFG)
HE 1646/5-1, HE 1646/5-2