Volumetric Breast Density Assessment: Reproducibility in Serial Examinations and Comparison with Visual Assessment

J. M. Singh; E. M. Fallenberg; F. Diekmann; D. M. Renz; R. Witlandt; U. Bick; F. Engelken

doi:10.1055/s-0033-1335981

RöFo - Fortschritte auf dem Gebiet der Röntgenstrahlen und der bildgebenden Verfahren

Rofo 2013; 185(9): 844-848

Breast


J. M. Singh¹, E. M. Fallenberg¹, F. Diekmann², D. M. Renz², R. Witlandt¹, U. Bick¹, F. Engelken¹

¹Department of Radiology, Charité – Universitätsmedizin Berlin, Campus Mitte, Berlin
²Department of Radiology, Charité – Universitätsmedizin Berlin, Campus Virchow, Berlin


Key words

mammography - breast cancer - breast density - volumetric measurement

Introduction

Mammographic breast density has been shown to be one of the strongest known markers of breast cancer risk and has been proposed as a variable for individual risk assessment [1] [2] [3] [4]. Some investigators have used breast density as an intermediate end point for interventional studies [5] [6]. An assessment of radiographic breast density is required in every mammography report and is an important variable in research studies. Breast density may in the future become a factor for individualizing breast cancer screening regimens according to each woman’s risk profile and the expected sensitivity of mammography given her individual breast density [7] [8]. A number of different reporting schemes have been developed, with the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) perhaps being the most widely used. However, visual assessment of breast density has limited intra- and interobserver reproducibility [9] [10] [11]. A variety of approaches have been tested to make breast density assessment more objective [12] [13] [14] [15]. A drawback of most approaches is the demand on reader time, which limits their use in the clinical setting and in population studies alike. Given the importance of breast density for risk stratification, an accurate, fast, and reproducible method for assessing breast density is needed [4] [16]. Volumetric breast density measurement provides an estimate of breast percent density (PD) without reader interaction. The method uses a model of the imaging chain to estimate total breast volume and the amount of glandular tissue present. An important advantage of this approach is that it avoids the subjectivity introduced whenever different readers rate breast density on the same study, as this software-based method always produces the same result when presented with identical image input.
However, while the algorithm has been calibrated to volume measurements of sample breasts, it is not clear to what extent the produced result deviates from reality. Also, there are no study data on the reproducibility of measurements when the data input varies, for example in repeated examinations of the same patient, where differences in projection angle, breast compression, and image acquisition parameters may affect the apparent breast density. Given that a possible application of this algorithm is its use in longitudinal studies of breast density, it requires testing in a sample of consecutive mammograms. This information is necessary for estimating the magnitude of error resulting from variations in the imaging chain and provides a measure of the reproducibility of the process as a whole. The aim of this study was to assess the reproducibility of breast density assessment using the R2 Quantra software in serial mammography examinations and to compare its performance with that of human readers.

Materials and Methods

Patients

We searched our records from June 2002 to December 2006 for patients satisfying the following inclusion criteria: two consecutive examinations performed on the same mammography unit no more than 24 months apart, raw image data stored in the picture archiving and communication system (PACS), unremarkable mammography reports for at least one breast, and a minimum of 18 months of normal follow-up of the eligible breast(s). The exclusion criteria were: previous surgery on the eligible breast(s), change in hormone status such as starting or stopping hormone-replacement therapy or menopause, and technical deficits of the mammogram such as inadequate positioning or presence of large skin folds.

A total of 170 patients were identified. Raw image data of the two consecutive mammography examinations were sent to an R2 Cenova™ server for analysis by the R2 Quantra™ breast density assessment algorithm. In 29 patients, the algorithm failed to produce results for one or both examinations. These patients were excluded from the analysis, leaving 141 patients in the study. In 21 patients, the algorithm produced results but marked them as potentially inaccurate. This occurs when there is a discrepancy between the measurements in the CC and MLO projections. This was recorded for subgroup analysis of the reproducibility.

Only one breast per patient was chosen for analysis to avoid linkage of data points. If both breasts were eligible for analysis, one side was chosen at random. Institutional review board approval was obtained.

Image acquisition

All patients underwent digital mammography using the same full-field digital mammography system with a flat-panel detector and a cesium iodide absorber, field size 19 × 23 cm, pixel size 100 μm, image matrix size 1914 × 2294 (Senographe 2000 D, General Electric Healthcare, Chalfont St. Giles). All mammograms were acquired in standard craniocaudal and mediolateral oblique projections using automatic optimization of acquisition parameters and standard supplier presets.

Image analysis

For all patients included in the study, breast density was assessed both visually and with the automatic software tool.

Visually, breast density was assessed by three independent, board-certified radiologists of our hospital using the BI-RADS lexicon. Reading was performed on a diagnostic mammography workstation (syngo MammoReport, Siemens Medical, Erlangen, Germany) in a blinded manner without knowledge of the woman’s age, the original mammography interpretation, and risk profile for breast cancer. The three observers independently assessed the mammograms for breast density, assigning one of the BI-RADS breast density categories on a standardized form. The first mammogram of each patient was read first, followed by another reading session for the second mammogram after an interval of 4 weeks or more. In a third reading session, again after an interval of at least 4 weeks, the first set of mammograms was read a second time to estimate the intra-rater reproducibility.

The BI-RADS scheme of breast densities, developed by the American College of Radiology (ACR), is intended to provide a standardized classification system for mammographic studies. The ACR classification identifies four categories of breast composition: (1) the breast is almost entirely fat (< 25 % glandular); (2) there are scattered fibroglandular densities (25 – 50 % glandular); (3) the breast tissue is heterogeneously dense (approximately 51 – 75 % glandular); and (4) the breast tissue is extremely dense (> 75 % glandular).

For the software-based analysis, raw image data were sent to a dedicated server running the R2 Quantra software. Briefly, R2 Quantra™ is a software tool that automatically calculates volumetric breast density as the ratio of the fibroglandular tissue volume to the estimated total breast volume. The algorithm uses a physical model of the imaging process to deduce the density and composition of breast tissue from the degree of X-ray attenuation on mammograms. To achieve this, the algorithm estimates the amount of fibroglandular tissue an X-ray beam must have passed through to deposit the amount of energy measured at the detector. Images are processed within minutes. The output of the R2 Quantra software includes the estimated total breast volume and fibroglandular tissue volume in ml (cm³) and the calculated breast PD ([Fig. 1]).
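The attenuation-based reasoning above can be illustrated with a deliberately simplified two-compartment model. The sketch below is not the R2 Quantra algorithm, whose internals are not described here. It assumes a monoenergetic beam, no scatter, a uniform compressed thickness, and hypothetical attenuation coefficients (`MU_FAT`, `MU_GLAND`); all function names are illustrative.

```python
import numpy as np

# Hypothetical linear attenuation coefficients in 1/cm (illustrative values only).
MU_FAT = 0.5
MU_GLAND = 0.8


def simulate_intensity(h_gland_cm, i0, thickness_cm, mu_fat=MU_FAT, mu_gland=MU_GLAND):
    """Forward model: detector signal behind fat plus glandular tissue (Beer-Lambert)."""
    h_gland = np.asarray(h_gland_cm, dtype=float)
    h_fat = thickness_cm - h_gland
    return i0 * np.exp(-(mu_fat * h_fat + mu_gland * h_gland))


def glandular_thickness(intensity, i0, thickness_cm, mu_fat=MU_FAT, mu_gland=MU_GLAND):
    """Invert the forward model per pixel, using h_fat + h_gland = thickness."""
    attenuation = np.log(i0 / np.asarray(intensity, dtype=float))
    h_gland = (attenuation - mu_fat * thickness_cm) / (mu_gland - mu_fat)
    # Clip to the physically possible range [0, thickness].
    return np.clip(h_gland, 0.0, thickness_cm)


def percent_density(h_gland_cm, thickness_cm, pixel_area_cm2):
    """Return fibroglandular volume (cm^3), total volume (cm^3), and PD in percent."""
    h_gland = np.asarray(h_gland_cm, dtype=float)
    fg_volume = float(h_gland.sum() * pixel_area_cm2)
    total_volume = float(thickness_cm * h_gland.size * pixel_area_cm2)
    return fg_volume, total_volume, 100.0 * fg_volume / total_volume


# Toy 2 x 2 "image": made-up glandular thicknesses in a 5 cm compressed breast.
true_h = np.array([[1.0, 2.0], [0.0, 1.0]])
signal = simulate_intensity(true_h, i0=1000.0, thickness_cm=5.0)
recovered = glandular_thickness(signal, i0=1000.0, thickness_cm=5.0)
# Pixel size of 100 um (as in the acquisition section) gives 1e-4 cm^2 per pixel.
fg_ml, total_ml, pd_percent = percent_density(recovered, 5.0, pixel_area_cm2=1e-4)
```

A real implementation would additionally have to handle the polyenergetic spectrum, scatter, the uncompressed breast edge, and detector calibration, which is why the sketch should be read as an illustration of the principle only.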

Fig. 1 Representative mammogram of the right breast in craniocaudal and mediolateral oblique projection and the corresponding datasheet provided by the R2 Quantra software.


Statistical analysis

Data analysis was performed using statistical software packages (SPSS, version 18.0; SPSS, Chicago, Illinois; MedCalc 12.3.0). The intra- and inter-rater reproducibility as well as the inter-examination reproducibility of the visual and software-based analysis were assessed by calculating the intraclass correlation coefficient (ICC). For comparison with other studies of visual density assessment, quadratic-weighted kappa values were also calculated for the intra- and inter-rater reproducibility. For the correlation of categorical BI-RADS density levels of examinations 1 and 2 with the continuous volumetric breast density values, BI-RADS classes 1 – 4 were replaced with the midpoint PD value of the respective category (1 = 12.5 %; 2 = 37.5 %; 3 = 62.5 %; 4 = 87.5 %), and the ICC was calculated.
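The agreement statistics named above can be sketched in a few lines of NumPy. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form ICC(2,1) is an assumption here, as are the function names and the toy ratings; the category midpoints are the ones given in the text.

```python
import numpy as np

# Category midpoints as given in the text (percent glandular tissue).
BIRADS_MIDPOINT_PD = {1: 12.5, 2: 37.5, 3: 62.5, 4: 87.5}


def icc_2_1(ratings):
    """ICC(2,1) for an array of shape (n_subjects, k_raters_or_methods)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def quadratic_weighted_kappa(a, b, n_categories=4):
    """Cohen's kappa with quadratic weights for categories coded 1..n_categories."""
    a = np.asarray(a, dtype=int) - 1
    b = np.asarray(b, dtype=int) - 1
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    # Expected proportions under independence of the two raters' marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1.0 - np.sum(weights * observed) / np.sum(weights * expected)


# Made-up example data (not study data): BI-RADS categories and software PD values.
rater_a = [1, 2, 2, 3, 4, 3]
rater_b = [1, 2, 3, 3, 4, 2]
software_pd = [9.0, 21.0, 30.0, 48.0, 70.0, 41.0]

kappa_ab = quadratic_weighted_kappa(rater_a, rater_b)
icc_ab = icc_2_1(np.column_stack([rater_a, rater_b]))
# Visual vs. volumetric: replace each category by its midpoint before computing the ICC.
rater_a_pd = [BIRADS_MIDPOINT_PD[c] for c in rater_a]
icc_visual_vs_software = icc_2_1(np.column_stack([rater_a_pd, software_pd]))
```

Because absolute-agreement ICCs penalize systematic offsets between methods, the midpoint substitution matters: a consistency-type ICC or a rank correlation would give different numbers for the same data.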

To investigate the effects of different compression forces on breast density estimates by volumetric assessment, we assigned the patients to one of four subgroups based on the magnitude of the difference in compression force applied for the first and the second mammogram in each patient. For each subgroup, the inter-examination agreement of the measured breast density was determined. Differences in correlation coefficients were tested for statistical significance using the Fisher r-to-z transformation.
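As a rough illustration of the Fisher r-to-z comparison, the sketch below tests the difference between two correlation coefficients. It treats the two coefficients as if they came from independent samples, which is a simplifying assumption: in this study both coefficients come from the same patients, and the exact procedure used by the statistics packages is not described. The function name is illustrative.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference of two correlations (independent samples)."""
    z1 = math.atanh(r1)  # Fisher r-to-z transformation
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Inter-examination ICCs reported in Table 2: volumetric analysis vs. rater A, n = 141 each.
z_stat, p_value = compare_correlations(0.91, 141, 0.75, 141)
```

Applying the transformation to ICCs rather than Pearson coefficients is itself an approximation, so the resulting p value should be read as indicative only.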

Results

The patients had a mean age of 62 years (range, 45 – 78 years). 61 patients underwent mammography in the setting of surveillance after breast surgery and had one unaffected breast. The remaining 80 patients had workup of a palpable lump or unclear ultrasound findings. The median interval between the first and the second examination was 13.2 months with a range of 9 – 24 months. 29 patients were premenopausal, 112 patients were postmenopausal. Of the premenopausal patients, 6 patients took oral contraceptive agents. Of the postmenopausal patients, 12 received hormone replacement therapy and 17 received antihormonal therapy.

The results for inter-rater agreement in visual breast density assessment between pairs of observers for examinations 1 and 2 are summarized in [Table 1]. The inter-rater agreement ranged from 0.71 to 0.77 (ICC).

Table 1
Inter-rater variability. Intraclass correlation coefficients (ICC) and quadratic-weighted kappa values were calculated. Numbers in parentheses represent 95 % confidence intervals.
rater pair	measure	examination 1	examination 2
A vs. B	ICC	0.71 (0.62 – 0.78)	0.74 (0.65 – 0.80)
A vs. B	κ	0.69 (0.61 – 0.77)	0.73 (0.66 – 0.81)
A vs. C	ICC	0.77 (0.71 – 0.86)	0.74 (0.66 – 0.81)
A vs. C	κ	0.76 (0.79 – 0.82)	0.75 (0.66 – 0.83)
B vs. C	ICC	0.77 (0.69 – 0.83)	0.76 (0.68 – 0.82)
B vs. C	κ	0.69 (0.58 – 0.78)	0.72 (0.62 – 0.82)

[Table 2] summarizes the results for intra-rater agreement for examination 1, the inter-examination variability for raters and volumetric measurements, and the comparison between visual breast density assessment and volumetric analysis. The intra-rater agreement ranged from 0.81 to 0.84 (ICC). The inter-examination agreement of examinations 1 and 2 for individual readers ranged from 0.75 to 0.81, compared with 0.91 for volumetric analysis. The difference in the strength of correlation between volumetric and visual assessment was statistically significant for all readers and constellations (p ≤ 0.01). In patients whose breast density was marked as potentially inaccurate by the R2 Quantra software, the inter-examination agreement was 0.90 (95 % confidence interval, 0.77 – 0.96).

Table 2
Intra-rater agreement, agreement of R2 Quantra and visual assessment, and inter-examination agreement for visual and software-based breast density assessment. Numbers in parentheses represent 95 % confidence intervals.
comparison	measure	rater A	rater B	rater C	Quantra PD
intra-rater agreement	ICC	0.83 (0.78 – 0.88)	0.81 (0.74 – 0.86)	0.84 (0.77 – 0.88)	n/a
intra-rater agreement	κ	0.81 (0.75 – 0.87)	0.80 (0.72 – 0.87)	0.82 (0.75 – 0.89)	n/a
agreement Quantra vs. visual assessment, examination 1	ICC	0.68 (0.58 – 0.76)	0.68 (0.58 – 0.76)	0.65 (0.55 – 0.74)	n/a
agreement Quantra vs. visual assessment, examination 2	ICC	0.69 (0.59 – 0.77)	0.63 (0.51 – 0.83)	0.73 (0.64 – 0.80)	n/a
inter-examination agreement	ICC	0.75 (0.67 – 0.83)	0.81 (0.74 – 0.86)	0.76 (0.67 – 0.84)	0.91* (0.87 – 0.93)

* Indicates statistical significance of the difference in ICC compared with all other ICC values (p ≤ 0.01).

[Table 3] shows the inter-examination correlation of the volumetric analysis of the whole group and for the four subgroups based on magnitude of difference in compression forces. The inter-examination correlation of the volumetric analysis was similar in all groups, regardless of the differences in mean compression forces.

Table 3
Inter-examination reproducibility of software-based analysis by magnitude of difference in compression forces between the two mammography examinations. Numbers in parentheses represent 95 % confidence intervals.
difference in compression force	N (%)	inter-examination reproducibility (ICC)
0 – 39 N	49 (35 %)	0.89 (0.82 – 0.94)
40 – 79 N	51 (36 %)	0.92 (0.86 – 0.95)
80 – 119 N	27 (19 %)	0.92 (0.83 – 0.96)
≥ 120 N	14 (10 %)	0.91 (0.73 – 0.97)
total	141 (100 %)	0.90 (0.87 – 0.93)
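The subgroup assignment behind Table 3 can be sketched as a simple binning of the absolute difference in compression force. The function name and the handling of non-integer differences (bins closed on the left) are assumptions; the subgroup counts are those reported in Table 3.

```python
def compression_subgroup(force_exam1_n, force_exam2_n):
    """Assign a patient to a subgroup by the absolute force difference in newtons."""
    difference = abs(force_exam1_n - force_exam2_n)
    if difference < 40:
        return "0-39 N"
    if difference < 80:
        return "40-79 N"
    if difference < 120:
        return "80-119 N"
    return ">=120 N"


# Subgroup sizes as reported in Table 3.
reported_counts = {"0-39 N": 49, "40-79 N": 51, "80-119 N": 27, ">=120 N": 14}
total_patients = sum(reported_counts.values())
# Rounded shares of each subgroup in percent, for comparison with the N (%) column.
shares = {name: round(100 * n / total_patients) for name, n in reported_counts.items()}
```

With only 14 patients in the largest-difference subgroup, the confidence interval for that ICC is correspondingly wide, which limits how strongly the absence of a compression effect can be stated.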

Discussion

The aim of our study was to assess the reproducibility of breast density measurement in consecutive examinations using volumetric breast density analysis software and to compare the results with the performance of human readers.

We found substantial, but not excellent, intra- and interobserver reproducibility of the visual density classification, comparable to the results reported by other studies. The inter-examination reproducibility of visual assessment was equal to or slightly less than the intra-examination reproducibility, depending on the reader. In comparison, breast density measurement by volumetric analysis showed an excellent inter-examination reproducibility, which was significantly higher than that of human readers. There was good agreement of the readers’ results with the volumetric analysis. We found no influence of differences in breast compression on the reproducibility of volumetric breast density analysis. Results that were marked as discrepant in CC and MLO views and therefore potentially inaccurate by the software were as reproducible as results that were not marked as potentially inaccurate.

Breast density has been shown to be one of the strongest known risk factors for breast cancer [1] [2] [3] [4] [17]. There is some evidence that breast density may reflect changes in breast cancer risk associated with interventions such as tamoxifen treatment [18]. From a clinical perspective, breast density has a strong effect on mammographic sensitivity [19] [20]. Future breast cancer screening programs may employ individualized screening regimens for women according to their personal breast cancer risk as well as their chance of benefiting from additional procedures like breast ultrasound or digital breast tomosynthesis [21] [22]. Therefore, accurate and reproducible measurement of breast density is desirable in both the clinical and the research setting. The results of our study show that volumetric analysis provides highly reproducible measurements of breast density in consecutive examinations and clearly exceeds the performance of human readers. The method appears to be robust with respect to differences in breast compression as well as the small differences in breast orientation and projection angle that may occur in consecutive examinations. Volumetric analysis is therefore preferable to visual assessment in the setting of longitudinal studies of breast density.

Most studies investigating the reproducibility of breast density assessment have looked at intra- and inter-rater reproducibility. Software-based volumetric analysis always yields the same result when confronted with the same mammogram, thereby eliminating intra- and interobserver variability. As immediate acquisition of a second mammogram after a satisfactory mammogram has been obtained is not possible for ethical reasons, we used serial mammograms for estimating the reproducibility of the method. The reproducibility of visual breast density assessment has been shown to be substantial but not perfect [9] [10] [11]. Interactive thresholding in one study of digitized film mammograms improved both the inter- and intra-rater reproducibility, with an increase in the intraclass correlation coefficients to 0.84 – 0.94 and 0.93 – 0.99, respectively [12]. Another study showed better correlation of the Cumulus method with another automated density assessment algorithm than with the four-category BI-RADS scale on digitized mammograms [13]. However, to our knowledge, ours is the first study to investigate the reproducibility of breast density assessment in serial examinations.

Three-dimensional imaging techniques, such as MR volumetry and digital breast tomosynthesis, may yield similar information and potentially provide more accurate volume measurements. However, the strength of quantifying breast tissue density from digital mammograms is that these are inexpensive and widely available. A current limitation of the software is the failure rate of around 8.5 % observed in this study, which may be improved with future developments.

The results of our study are relevant both to the use of this method in longitudinal studies and to the comparison of results obtained in different imaging centers, where variations in imaging technique cannot be fully avoided. The lack of reader interaction and the avoidance of intra-rater variability represent notable advantages over alternative breast density assessment approaches. It should be noted that the high reproducibility (precision) of this method does not allow assumptions about its accuracy, i.e., the closeness of the software result to the true breast composition. While a highly accurate measurement would be highly reproducible, high reproducibility does not prove high accuracy. However, the high reproducibility of this algorithm means that changes in breast density over time will be detected with much higher precision by volumetric assessment than by visual assessment.

The major limitation of our study is the long interval between consecutive mammography examinations in the same patients. While 1 – 2 years is the minimum interval for performing serial mammography after an initial unremarkable mammogram, this is long enough for changes in weight to occur and changes in hormone levels to manifest. The reproducibility found in this study, therefore, very likely represents an underestimate.

In conclusion, volumetric breast density measurement is highly reproducible in serial mammograms in a routine clinical setting. The performance significantly exceeds the reproducibility of visual assessment by human readers. The method appears robust with respect to variations in breast compression. Given the lack of reader interaction and the avoidance of intra- and inter-rater variability, this method is a useful tool for longitudinal studies of breast density and for the quantification of breast density for breast cancer risk stratification.

References
1 Boyd NF, Guo H, Martin LJ et al. Mammographic density and the risk and detection of breast cancer. N Engl J Med 2007; 356: 227-236
2 Duffy SW, Nagtegaal ID, Astley SM et al. Visually assessed breast density, breast cancer risk and the importance of the craniocaudal view. Breast Cancer Res 2008; 10: R64
3 McCormack VA, dos Santos Silva I. Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis. Cancer Epidemiol Biomarkers Prev 2006; 15: 1159-1169
4 Assi V, Warwick J, Cuzick J et al. Clinical and epidemiological issues in mammographic density. Nat Rev Clin Oncol 2012; 9: 33-40
5 Cuzick J, Warwick J, Pinney E et al. Tamoxifen and breast density in women at increased risk of breast cancer. J Natl Cancer Inst 2004; 96: 621-628
6 Weitzel JN, Buys SS, Sherman WH et al. Reduced mammographic density with use of a gonadotropin-releasing hormone agonist-based chemoprevention regimen in BRCA1 carriers. Clin Cancer Res 2007; 13: 654-658
7 Lee CH, Dershaw DD, Kopans D et al. Breast cancer screening with imaging: recommendations from the Society of Breast Imaging and the ACR on the use of mammography, breast MRI, breast ultrasound, and other technologies for the detection of clinically occult breast cancer. J Am Coll Radiol 2010; 7: 18-27
8 Hofvind S, Skaane P. Stage distribution of breast cancer diagnosed before and after implementation of population-based mammographic screening. Fortschr Röntgenstr 2012; 184: 437-442
9 Ciatto S, Houssami N, Apruzzese A et al. Categorizing breast mammographic density: intra- and interobserver reproducibility of BI-RADS density categories. Breast 2005; 14: 269-275
10 Ooms EA, Zonderland HM, Eijkemans MJ et al. Mammography: interobserver variability in breast density assessment. Breast 2007; 16: 568-576
11 Perez-Gomez B, Ruiz F, Martinez I et al. Women's features and inter-/intra-rater agreement on mammographic density assessment in full-field digital mammograms (DDM-SPAIN). Breast Cancer Res Treat 2012; 132: 287-295
12 Byng JW, Boyd NF, Fishell E et al. The quantitative analysis of mammographic densities. Phys Med Biol 1994; 39: 1629-1638
13 Heine JJ, Carston MJ, Scott CG et al. An automated approach for estimation of breast density. Cancer Epidemiol Biomarkers Prev 2008; 17: 3090-3097
14 Ciatto S, Bernardi D, Calabrese M et al. A first evaluation of breast radiological density assessment by QUANTRA software as compared to visual classification. Breast 2012;
15 van Engeland S, Snoeren PR, Huisman H et al. Volumetric breast density estimation from full-field digital mammograms. IEEE Trans Med Imaging 2006; 25: 273-282
16 Panner J, Schuetz GM, Hamm B et al. A systematic guide for reading and interpreting diagnostic accuracy studies. Fortschr Röntgenstr 2011; 183: 909-912
17 Chiu SY, Duffy S, Yen AM et al. Effect of baseline breast density on breast cancer incidence, stage, mortality, and screening parameters: 25-year follow-up of a Swedish mammographic screening. Cancer Epidemiol Biomarkers Prev 2010; 19: 1219-1228
18 Cuzick J, Warwick J, Pinney E et al. Tamoxifen-induced reduction in mammographic density and breast cancer risk reduction: a nested case-control study. J Natl Cancer Inst 2011; 103: 744-752
19 Bird RE, Wallace TW, Yankaskas BC. Analysis of cancers missed at screening mammography. Radiology 1992; 184: 613-617
20 Harvey JA, Fajardo LL, Innis CA. Previous mammograms in patients with impalpable breast carcinoma: retrospective vs blinded interpretation. 1993 ARRS President's Award. Am J Roentgenol 1993; 161: 1167-1172
21 Schousboe JT, Kerlikowske K, Loh A et al. Personalizing mammography by breast density and other risk factors for breast cancer: analysis of health benefits and cost-effectiveness. Ann Intern Med 2011; 155: 10-20
22 Olgar T, Kahn T, Gosch D. Average glandular dose in digital mammography and breast tomosynthesis. Fortschr Röntgenstr 2012; 184: 911-918
