Key words: visual perception - radiologists - health care quality - computerized tomography - stroke - acute
Introduction
One of the essential skills of a radiologist is to identify imaging patterns and their
corresponding pathologies. The expertise of radiologists grows as their professional
experience broadens – not only due to increased specialist knowledge, but also due
to continuously improving proficiency in pattern recognition. While specialist knowledge
can be expanded as a result of clinical practice and appropriate reading matter, the
processes that help to continuously improve such pattern recognition have not yet
been fully understood. The higher speed at which an experienced radiologist identifies
findings is probably attributable to familiarity with potential patterns, which permits
a scan to be reviewed much faster. The lack of experience of younger radiologists
frequently leads to less focused approaches with respect to differentiating major from
minor diagnostic findings. In the present study, major findings predominantly included
occlusions of brain-supplying arteries, one cerebral venous sinus thrombosis and one
incidentally depicted pulmonary embolism. Minor diagnostic findings included vascular
aberrations, preexisting postischemic defects, pulmonary findings such as pulmonary nodules,
infiltrates or a pneumothorax, and a struma. The precise extent to which the approach
of experienced versus less experienced radiologists differs remains unclear. Recommendations
on best practices for examining radiographic images as well as for optimizing the
detection of findings could be defined based on such deliberations.
Eye tracking is a method that is already widespread in various research fields,
especially economics, and has been used, for example, in studies analyzing marketing
campaigns [1] [2] [3] [4]. The method automatically tracks the movement of the eyeball and position of the
pupil while an image or screen is being viewed. Furthermore, it records the duration
that the eye focuses on the relevant areas of the image, without disturbing the assessor.
This method can therefore be used to determine which areas of an image were viewed
and for how long, as well as which areas were potentially ignored completely [5].
Awareness of efficient techniques for examining scans is of high importance in order
to provide younger colleagues with specific training and to optimize observation
patterns [6]. Thus, the aim of this study was to record observation patterns of radiologists
when reviewing stroke CTs, depending on their professional experience, and to evaluate
the detection/recognition of findings. In addition, the influence of correct or incorrect
clinical information on the reader’s focus was observed.
Materials and Methods
Subjects
A total of 12 radiologists, with 6 of them working as residents (1 – 5 years of working
experience) and 6 as radiologic specialists (3 were general radiologic senior physicians,
the other 3 were additionally trained as neuroradiologic specialists), were included
for the analysis of image data sets. The average length of working experience was
8 years.
For evaluation, the radiologic assessors used a regular workstation equipped with
an eye tracker mounted at the bottom of the regular radiologic computer screen. The blinded
CT data sets were presented to the assessors by a study supervisor. The assessors
were asked to analyze the images according to their normal working pattern while
their eye movements were recorded with the eye tracker.
In terms of potential interference to eye tracking records, 4 participants had normal
vision or contact lenses and 8 of them had corrected-to-normal vision via glasses.
Each reader had to analyze 20 blinded and randomly presented stroke CT data sets including
non-enhanced CT (NECT), CT angiography (CTA) and CT perfusion (CTP). Diagnostic CT
imaging examinations were indicated by suspected stroke. To simulate clinical reporting
conditions as closely as possible, clinical information was given prior
to image analysis and the time span per image analysis was limited to a maximum of
10 minutes. To avoid artifacts during fixation measurements and to precisely measure
the time needed for appraisal, the assessors reported pathologies aloud (think-aloud
approach) and the study supervisor recorded them.
Measurement of the diagnostic accuracy
The image diagnoses on which treatment was based served as the "gold standard"
of the available findings. They were used to subsequently evaluate how many of these findings
were actually found by the assessors and to compare the diagnostic accuracy of experienced
and less experienced assessors.
Measurement of duration of image reading
The maximum time span permitted for image analysis was restricted to 10 minutes to
simulate the clinical setting of CT image reading. Gaze estimation was based on
real-time screen videos that recorded the reader’s fixations. This allows
retrospective analysis of the time needed by the radiologists to read data sets.
Order of evaluation of orientations and reconstructions
Additionally, by analyzing these real-time image processing videos, the order of the
image evaluation preferred by the individual assessors for diagnosis was analyzed
retrospectively.
Regions of interest
Furthermore, video analysis allows for investigation of regions of interest (ROIs)
that were defined manually for all patients by encircling every pathologic area in
every cross-sectional view. For 6 randomly chosen patients, the proportionate time
of focusing the ROIs was measured, as well as the time until the assessor’s first
fixation of an ROI, depending on the examiner’s working experience.
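The two ROI measures described above can be illustrated with a minimal Python sketch. It is not the evaluation software used in the study. The names `GazeSample`, `CircleROI` and `roi_metrics` are hypothetical, ROIs are simplified to circles on a single slice, and raw gaze samples at the stated 60 Hz rate stand in for classified fixations.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SAMPLE_INTERVAL_S = 1.0 / 60.0  # 60 Hz sampling rate, as stated for the eye tracker


@dataclass
class GazeSample:
    t: float        # seconds since the data set was opened
    x: float        # gaze position on screen (pixels)
    y: float
    slice_id: int   # hypothetical: identifier of the slice shown at time t


@dataclass
class CircleROI:
    slice_id: int   # an ROI is only valid on the slice it was drawn on
    cx: float
    cy: float
    radius: float

    def contains(self, s: GazeSample) -> bool:
        same_slice = s.slice_id == self.slice_id
        inside = (s.x - self.cx) ** 2 + (s.y - self.cy) ** 2 <= self.radius ** 2
        return same_slice and inside


def roi_metrics(samples: List[GazeSample],
                rois: List[CircleROI]) -> Tuple[float, Optional[float]]:
    """Return (share of inspection time spent on any ROI, time of first ROI hit)."""
    hits = [s for s in samples if any(r.contains(s) for r in rois)]
    total_time = len(samples) * SAMPLE_INTERVAL_S
    dwell_time = len(hits) * SAMPLE_INTERVAL_S
    ratio = dwell_time / total_time if samples else 0.0
    first_hit = hits[0].t if hits else None
    return ratio, first_hit
```

With six samples of which the last three fall inside an ROI, the function returns a ratio of 0.5 and the timestamp of the fourth sample as the time until first fixation.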
Institutional review board approval was granted to retrospectively analyze anonymized
CT data using eye tracking and to record the data of the radiologic assessor.
Eye tracking device
The utilized device (Tobii X2–60 Eye Tracker, Tobii Technology AB, Danderyd, Sweden)
was based on corneal reflection recorded unobtrusively by a tracker located below
the screen. Prior to the recording, a calibration was performed for each participant
to correlate the fixation area on the screen to the coordinates. The distance between
the participant and the monitor was kept constant at 65 cm by means of a mark on the
floor. Gaze data were recorded for both eyes with a sampling rate of 60 Hz. Eye gaze
coordinates were acquired every 16.7 ms. The fixation areas and durations were determined
using a predefined computerized classification algorithm, working with overlays. Consequently,
heat maps of the assessor’s fixation areas, scan paths and the fixation of dedicated
ROIs were generated. Due to the volumetric imaging, ROIs had to be defined manually
for every cross-sectional view containing a pathology.
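The text does not name the fixation classification algorithm. As an illustration only, the following Python sketch shows a simple velocity-threshold classification, which is one common approach and not necessarily the one used here. The sampling rate (60 Hz) and viewing distance (65 cm) come from the text; the pixel pitch and the velocity threshold are assumptions chosen for the example.

```python
import math
from typing import List, Tuple

SAMPLING_RATE_HZ = 60.0                      # from the text
SAMPLE_INTERVAL_S = 1.0 / SAMPLING_RATE_HZ   # about 16.7 ms between samples
VIEWING_DISTANCE_MM = 650.0                  # 65 cm, from the text
PIXEL_PITCH_MM = 0.25                        # assumption: size of one screen pixel
VELOCITY_THRESHOLD_DEG_S = 30.0              # assumption: fixation/saccade boundary


def angular_velocity(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Approximate gaze velocity in degrees per second between two consecutive samples."""
    dist_mm = math.hypot(q[0] - p[0], q[1] - p[1]) * PIXEL_PITCH_MM
    angle_deg = math.degrees(2.0 * math.atan2(dist_mm / 2.0, VIEWING_DISTANCE_MM))
    return angle_deg / SAMPLE_INTERVAL_S


def classify_fixations(points: List[Tuple[float, float]]) -> List[Tuple[int, int, float]]:
    """Group consecutive slow samples; return (first index, last index, duration in s)."""
    fixations = []
    start = 0
    for i in range(1, len(points) + 1):
        at_end = i == len(points)
        if at_end or angular_velocity(points[i - 1], points[i]) > VELOCITY_THRESHOLD_DEG_S:
            n_samples = i - start
            if n_samples >= 2:  # ignore isolated single samples
                fixations.append((start, i - 1, n_samples * SAMPLE_INTERVAL_S))
            start = i
    return fixations
```

Six samples at one screen position followed by six at a distant position yield two fixations of about 0.1 s each, separated by one fast gaze shift.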
MSCT acquisition
The examinations were performed using a 256-slice CT scanner (Somatom® Definition FLASH,
Siemens AG, Medical Solutions, Forchheim, Germany). For i.v. contrast
administration (Ultravist®-370, Bayer Schering Pharma, Leverkusen, Germany), according to
the local standardized scanning protocols, 40–60 mL were injected followed by a 25-mL
saline bolus chaser, using a constant injection rate of > 4 mL/s.
Images were obtained at a tube voltage of 120 kV and 200 mAs, using a special dose-modulation
template (CARE Dose4D™, Siemens AG, Medical Solutions, Forchheim, Germany) to reduce
the radiation exposure [7]. Data sets were reconstructed at a slice thickness of 1.5 mm
with a reconstruction increment of 0.6 mm. For CTP, dynamic scanning was performed as a
66-second biphasic cine series using the following standardized parameters: 80 kVp, 200 mA,
1-second rotation time, 1 image per second for 40 seconds with 9 additional images at a rate
of one every third second, as used in the clinical routine [8]. For evaluation, different
color-coded maps (cerebral blood volume (CBV), cerebral
blood flow (CBF), mean transit time (MTT), and time-to-peak (TTP)) were generated.
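The stated CTP timing can be checked with a short Python sketch. It assumes that the first image is acquired at t = 0 s and that the second phase continues at 3-second intervals from the last image of the first phase; neither detail is given in the text, and the function name is hypothetical.

```python
from typing import List


def ctp_acquisition_times() -> List[float]:
    """Acquisition time points (s) of the biphasic cine series described in the text."""
    # Phase 1: 1 image per second for 40 seconds (t = 0 ... 39 s, assumed start at 0)
    first_phase = [float(t) for t in range(40)]
    # Phase 2: 9 additional images, one every third second (t = 42 ... 66 s)
    second_phase = [first_phase[-1] + 3.0 * k for k in range(1, 10)]
    return first_phase + second_phase


times = ctp_acquisition_times()
print(len(times), times[-1])  # 49 images, last one at 66.0 s
```

Under these assumptions the series comprises 49 images and ends at 66 s, which agrees with the 66-second duration given above.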
Statistical analysis
Statistical analysis was performed using SAS (Version 9.3 for Windows, SAS Institute
Inc., Cary, NC, USA). Inferential statistics were intended to be exploratory (generating
hypotheses) instead of confirmatory and were interpreted accordingly. Hence, p-values
were interpreted according to Fisher, by assessing the metric weight of evidence against
the respective null hypothesis of no effect. P-values were considered significant
if ≤ 0.05 and highly significant if ≤ 0.01. Standard descriptive statistical analyses
were performed.
Categorical variables are described as absolute and relative frequencies.
Results
For every case, reconstructions of the acquired CT data were technically feasible
and assessable. Neuroradiologic reports of diagnostic findings, which were the basis
for the treatment, were defined as the “gold standard” of pathologic findings. Objects
to be detected were categorized as main or secondary findings. Main findings are those
causing the neurologic symptoms, e.g. early signs of cerebral ischemia, acute arterial
occlusions, dissection of the cervical arteries, cerebral venous sinus thrombosis or
findings suspicious for tumor. Secondary findings are those without relevance with respect
to the suspected stroke. They include preexisting cerebral infarctions or preexisting vessel
pathologies like aneurysms or atherosclerosis, cervical pathologies that are recorded in
the cervical angiography of brain-supplying vessels, like struma or in one case a cervical
hematoma, and thoracic pathologies such as pneumothorax, pulmonary infiltrates
or pleural effusions, a cardiomegaly in one patient and subcutaneous emphysema.
Overall and itemized diagnostic accuracy
Overall and itemized diagnostic concordance with consensus with respect to main or
secondary findings, broken down by working experience, is detailed in [Table 1]. Secondary
or incidental findings in this context are all findings that have no
connection to the symptoms or primary disease (examples are aneurysms, pleural effusions,
struma nodosa, etc.). Diagnostic concordance varied in relation to professional experience,
while no relevant differences in the detection rates of the main findings as a function
of working experience could be observed: residents detected 203 of 252 pathologies
and senior physicians 196 of 249 pathologies ([Table 1]). Here and in the following
comparisons, the totals differ between groups because one participant did not evaluate
two data sets. Among the senior physicians, the general radiologic specialists observed
102 of 126 pathologies and the neuroradiologic specialists detected 94 of 123 findings
([Table 1]). However, with respect to secondary findings, greater differences
between experienced and less experienced assessors were observed. Of the 282 secondary
findings, the resident physicians noticed 158 (56 %; [Table 1]), while the senior
physicians noticed only 122 of 278 (44 %; [Table 1]). With regard to their professional
focus, general radiologic specialists observed 60 of 141 secondary findings and
neuroradiologic specialists mentioned 62 of 137 secondary pathologies ([Table 1]).
Table 1
Detection rates.
detection rates in relation to available findings
pathologies identified | a: residents | b: senior physicians | c: general radiologists | d: neuro-radiologists
overall pathologies | 0.68 | 0.60 | 0.61 | 0.60
main findings | 0.81 | 0.79 | 0.81 | 0.76
secondary findings | 0.56 | 0.44 | 0.43 | 0.45
Listed is the ratio of detected findings to all available findings (overall
pathologies), additionally subdivided into main and secondary findings. The ratio
for senior physicians (b) is further broken down into senior physicians with
(d: neuroradiologists) and without a neuroradiological focus (c: general radiologists).
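The ratios in Table 1 follow from the counts reported in the text. The short Python sketch below recomputes them. Pooling main and secondary findings for the overall rate is an assumption about how that row was derived; it reproduces the tabulated values. The variable and function names are illustrative.

```python
# (detected, available) counts as reported in the text
counts = {
    "residents":            {"main": (203, 252), "secondary": (158, 282)},
    "senior physicians":    {"main": (196, 249), "secondary": (122, 278)},
    "general radiologists": {"main": (102, 126), "secondary": (60, 141)},
    "neuroradiologists":    {"main": (94, 123),  "secondary": (62, 137)},
}


def detection_rates(group):
    """Return overall, main and secondary detection rates rounded to two decimals."""
    main_found, main_total = group["main"]
    sec_found, sec_total = group["secondary"]
    return {
        # assumption: overall rate pools main and secondary findings
        "overall": round((main_found + sec_found) / (main_total + sec_total), 2),
        "main": round(main_found / main_total, 2),
        "secondary": round(sec_found / sec_total, 2),
    }


rates = {name: detection_rates(group) for name, group in counts.items()}
for name, r in rates.items():
    print(f"{name}: overall {r['overall']:.2f}, main {r['main']:.2f}, "
          f"secondary {r['secondary']:.2f}")
```

For residents this gives 0.68, 0.81 and 0.56, and for senior physicians 0.60, 0.79 and 0.44, matching columns a and b of Table 1.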
Overall and itemized duration of image reading
The overall time needed to read all available images showed a highly significant difference
between resident physicians and senior physicians, as shown in [Table 2, Fig. 1]. The
average time resident physicians spent reading the available images was
5:41.5 minutes:seconds, whereas senior physicians averaged 4:12 minutes:seconds (neuroradiologic
specialists needed 4:06.7 minutes:seconds and general radiologists needed an average
time of 4:17.1 minutes:seconds to read the available radiologic images). A more detailed
consideration is presented in the three right-hand columns in [Table 2]. However, it
reveals no significant differences between the proportionate times
spent on the CT angiographies, which amounted to 58 % of
the overall inspection time for residents and 53 % for senior physicians (52 % for
neuroradiologic specialists and 53 % for general radiologic specialists). Considering
the proportionate times of reading different orientations of the CT angiographies,
transverse image orientations were used most by all groups. Nevertheless, additional
viewing of coronal orientations of CT angiographies could be observed slightly more
often in neuroradiologists, compared to less specialized readers ([Table 2]).
Table 2
Reading times.
time to read available images in seconds
group | average time per data set (s) | standard deviation (s) | reading time in transverse orientation (share of total time) | reading time in sagittal orientation (share of total time) | reading time in coronal orientation (share of total time)
all assessors | 297.1 | 113.2 | 0.48 | 0.03 | 0.04
resident physicians | 341.5 | 109.8 | 0.51 | 0.04 | 0.03
senior physicians | 252.0 | 98.2 | 0.45 | 0.02 | 0.05
general radiologic senior physicians | 246.7 | 98.2 | 0.42 | 0.03 | 0.07
neuroradiologic senior physicians | 257.1 | 98.7 | 0.48 | 0.01 | 0.04
The two columns on the left show the average time in seconds required to analyze a case
and its standard deviation, for all radiologists and for the subgroups
of resident physicians and senior physicians (with and without neuroradiological focus).
The three columns on the right show the share of the total observation time spent on the
different section orientations (transverse, sagittal and coronal).
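The text reports reading times as minutes:seconds, whereas the tables list seconds. A small Python helper such as the following (the name `format_min_sec` is hypothetical) converts between the two notations and can be used to cross-check the values.

```python
def format_min_sec(seconds: float) -> str:
    """Format a duration in seconds as minutes:seconds with one decimal place."""
    tenths = round(seconds * 10)          # work in tenths of a second to avoid "60.0"
    minutes, rest = divmod(tenths, 600)
    return f"{minutes}:{rest / 10:04.1f}"


print(format_min_sec(341.5))  # 5:41.5
print(format_min_sec(252.0))  # 4:12.0
```

Rounding to tenths before splitting keeps the seconds part below 60 even for values such as 119.96 s.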
Fig. 1 Average diagnosis duration per case for all assessors. Shown is the average
total diagnostic time in milliseconds per case for all subjects (1), resident physicians
(2), senior physicians (3), senior physicians with (4) and without a neuroradiological
focus (5). A significant difference (marked by ****) between resident and senior
physicians (p < 0.0001) is shown, with both values being within the standard deviation
of the respective other value. There was no significant difference between the subgroups
of senior physicians.
Order of evaluation of orientations and reconstructions
By means of “screen recordings”, the order in which the individual assessors preferred
to evaluate the images for their reporting could be analyzed retrospectively. Mostly,
participants tended to start reporting by inspecting the brain in NECT before switching
to the bone window setting, the CT perfusion or CT angiography in different orientations.
Less experienced assessors changed more often between the different reconstructions
and image series, with an average number of alternations of 8.1 (43.1 in relation to
total inspection time), compared to 6.2 average alternations for senior physicians,
resulting in 40.6 in relation to their total inspection time (6.1 with and 6.4 without
neuroradiologic specialization).
Regions of interest
The final investigation included the separate evaluation of the main pathologies in
6 cases. The observed pathologies included a thrombosis of the basilar artery, an
occlusion of the internal carotid artery (ICA), a bilateral occlusion of the middle
cerebral artery (MCA), a dissection of the aorta, an occlusion of ipsilateral ICA and MCA
and an occlusion of vertebral and basilar artery. For all patients, regions of interest
(ROIs) were defined manually by encircling the area around the pathology in every
cross-sectional view containing it ([Fig. 2]). For recording and evaluation of the data,
the Studio Capture software (Tobii Technology) was used. The proportionate time of
focusing the ROIs was measured and a quotient to the total inspection time was calculated
([Table 3]). Furthermore, we evaluated the duration until the reader’s first fixation of the
ROI, depending on the working experience ([Table 4]). No significant differences in the
relative time spent focusing the region of interest could be observed between the
experience groups. However, the time until the first fixation of the ROI was shorter
particularly for neuroradiologic specialists, who detected the predominantly neurologic
pathologies most rapidly (average 1:28.2 minutes:seconds;
[Table 4]). General radiologic specialists on average required 1:54.7 minutes:seconds
and residents 1:54.8 minutes:seconds until first fixation of the ROI (54). Nonetheless,
these results do not achieve the predefined level of significance (p = 0.06).
Fig. 2 ROI labeling. In the present CT angiography scan, a thrombus within the lumen of
the basilar artery can be seen (upper row). In the lower row this pathology is labelled
with an ROI in all 3 planes, shown as an example in one slice each. The labeling
must be carried out in all slices of all planes in which the pathology can be seen
in order to analyze visual fixation of the ROI.
Table 3
Duration of the fixation of the ROI.
duration of the fixation of the ROI
group | ratio of duration of the fixation to total inspection time
all assessors | 0.128
resident physicians | 0.127
senior physicians | 0.128
general radiologic senior physicians | 0.138
neuroradiologic senior physicians | 0.119
In 6 patients out of the 20 patients whose CT data sets were included, ROI-specific
evaluations were also generated to specifically analyze the duration of the fixation
of the diagnostic findings marked with ROI (in seconds), pro rata to the duration
of the overall inspection time.
Table 4
Time until first fixation of the ROI.
time until first fixation of the ROI
group | time until first fixation of the ROI (median in seconds)
all assessors | 108.6
resident physicians | 114.8
senior physicians | 93.9
general radiologic senior physicians | 114.7
neuroradiologic senior physicians | 88.2
Additionally, in these 6 patients out of the 20 patients whose CT data sets were included,
ROI-specific evaluations were generated in order to specifically analyze the time
until the first fixation of the ROI-labelled findings (in seconds), compared to the
reference time (therefore the average duration of senior physicians in seconds is
set as the reference).
Comparing the time spent inspecting the different contrast phases (non-contrast
CT, CT angiography and CT perfusion), the ratios indicate that neuroradiologic specialists
prefer to survey the pathology on non-enhanced CTs (42 % of total time used to survey
the ROI in NECT versus 32 % on CT perfusion and 25 % on CTA images; [Table 5]), whereas
the residents, like general radiologists, prefer to observe the
pathologies on CTA images (general radiologists: CTA with
51 % of total time versus 30 % on CT perfusion and 19 % on NECT; residents:
CTA with 47 % of total time compared to 21 % on CT perfusion and 32 %
on NECT; [Table 5]).
Table 5
Duration of the fixation of the ROI in different contrast administrations.
duration of the fixation of the ROI itemized to different contrast administrations in relation to total inspection time
group | fixation of the ROI in CT perfusion | fixation of the ROI in non-contrast CT | fixation of the ROI in CT angiography
all assessors | 0.26 | 0.31 | 0.43
resident physicians | 0.21 | 0.32 | 0.47
senior physicians | 0.31 | 0.31 | 0.38
general radiologic senior physicians | 0.30 | 0.19 | 0.51
neuroradiologic senior physicians | 0.32 | 0.42 | 0.25
Moreover, in these 6 of the 20 included CT datasets, ROI-specific evaluations were
generated in order to specifically analyze the duration of the ROI-labelled findings
in the different contrast administrations (non-enhanced CT, CT angiography and CT
perfusion) in seconds, related to the overall time needed to inspect each dataset.
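The preferences described for Table 5 can be read off programmatically. The following Python sketch uses the tabulated shares to determine, for each group, the contrast phase with the largest share of ROI fixation time and to check that the shares of each row add up to approximately 1. The dictionary layout and names are illustrative.

```python
# Shares of ROI fixation time per contrast phase, taken from Table 5
roi_share = {
    "all assessors":                        {"CTP": 0.26, "NECT": 0.31, "CTA": 0.43},
    "resident physicians":                  {"CTP": 0.21, "NECT": 0.32, "CTA": 0.47},
    "senior physicians":                    {"CTP": 0.31, "NECT": 0.31, "CTA": 0.38},
    "general radiologic senior physicians": {"CTP": 0.30, "NECT": 0.19, "CTA": 0.51},
    "neuroradiologic senior physicians":    {"CTP": 0.32, "NECT": 0.42, "CTA": 0.25},
}

# Phase with the largest share of ROI fixation time per group
preferred = {group: max(shares, key=shares.get) for group, shares in roi_share.items()}

# Row sums; small deviations from 1 are due to rounding in the table
totals = {group: round(sum(shares.values()), 2) for group, shares in roi_share.items()}

for group in roi_share:
    print(f"{group}: preferred phase {preferred[group]}, sum of shares {totals[group]:.2f}")
```

Neuroradiologic senior physicians come out with NECT as the preferred phase, while residents and general radiologic senior physicians come out with CTA, in line with the text.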
Discussion
In the clinical routine, neuroradiologic CT imaging in the case of suspected stroke
using unenhanced brain CT, CT angiography of the supra-aortic vessels and CT perfusion
represents a common method to explore the pathology causing the symptoms. CT angiography
offers detailed information about the vascular system and CT perfusion provides valuable
information regarding the brain’s vascular physiology [9] [10] [11]. Consequently, radiologists read hundreds of sections of imaging in the case of
a suspected acute ischemic stroke. Compared to reading a single static radiograph,
requirements differ substantially in terms of understanding the cognitive and perceptual
processes of interpreting volumetric data sets like CT stroke imaging. The visual
search for lesions is a process that has been shown to be mainly acquired rather than
innate and that depends on the working experience.
To evaluate the observations of assessors, eye tracking has so far been applied
in the radiologic fields of mammography and projection radiography of the chest or the
extremities [12] [13] [14] [15]. Only a few studies of multislice images have been
conducted to date. Furthermore, no studies were conducted for more complex scenarios
such as multimodal diagnostic imaging, for example in the event of medical emergencies
such as strokes [16] [17] [18] [19].
Several theories regarding potential sources of error in the detection of pathologies
have been proposed, for instance the holistic model for the searching strategy applied
by radiologists when reading mammograms, the theory of satisfaction-of-search
errors and the assumption of a global perceptual process [20] [21] [22]. Research
conducted with eye tracking technology has revealed a connection between
the visual dynamics and cognitive processes.
Whether radiologists are explicitly aware of the details or not, training has taught
them to focus attention more efficiently on items with high probability of interest [23].
Some investigations even differentiated between the 3 different phases in which errors
of image reporting can occur. This includes the search, the recognition and the cognitive
process to identify a lesion [24].
Overall and itemized diagnostic accuracy
The study focused on the accuracy of search as one of the major pitfalls, revealing
a detection rate for main findings between 76 % and 81 %, with slight differences
correlated to the level of training. Especially with regard to the accuracy of the recognition
of secondary findings in the group of senior physicians, it should be mentioned that
some lesions were fixated but not mentioned. In the context of this study setting,
a clear distinction cannot be made between missed pathologies and incidental findings
not mentioned due to their lack of clinical relevance.
Overall and itemized times of image reading
Kundel and Nodine described a diagnostic accuracy of 70 % when presenting chest radiographs
to radiologists for 200 ms. According to Kundel et al., the visual search of radiographic
images occurs in two steps: an initial rapid “global” response,
taking place during the first 40–200 ms, followed by a second “systematic scan” that allows
accurate object recognition using foveal vision [25]. Systematic viewing is a technique
whereby anatomical areas are inspected in a fixed
order to ensure complete inspection of the images. In order to imitate the clinical
setting of reporting as accurately as possible and to include the assumed “systematic
scan” in our analysis, we allowed a reading period of up to 10 minutes. This allows a detailed
examination.
Furthermore, we investigated the detection error in greater detail by breaking down
the assessor’s image analysis into the following measures: the time until first
detection of the pathology, the duration of the fixation of the pathology and the
proportionate time of observation in different contrast phases (NECT, CTA and CT perfusion).
To the best of the authors’ knowledge, previous studies have not investigated these
aspects of image analysis. Neuroradiologic specialists detected the pathology most
rapidly, which could be explained by their superior experience. However, the relative
time spent fixating the pathology does not reveal significant differences between the
groups. In this case, this could be attributed to the common pathologies, which are
easy to assess, even for less experienced assessors.
The differences between the proportions of analyzing images separated into different
contrast phases could also be explained by the neuroradiologists’ superior experience,
resulting in less time required to analyze the CTA in relation to the total time of
inspection. In contrast, the lack of experience leads residents to spend proportionally
more time on CTA image analysis, compared to the total time.
Visual search is an integral part of the radiologist's interpretation procedure, but
the precise search requirement varies considerably with the individual clinical situation,
such as the purpose of the examination and the radiologist's prior knowledge about
the patient. In many cases, a free search is required, meaning that the radiologist
needs to search the complete data set for any abnormalities, whereas in other cases the
clinical situation may require a focus on specific anatomical areas [26]. Although we
specified clinical information and observed more rapid fixation of regions that are
clinically important with respect to the symptoms in the group of experienced
readers, the influence of clinical information about the patient was not specifically
part of this study and is to be confirmed in further investigations. Our results reveal
that observer sensitivity depends on work experience due to systematic scanning
of important areas combined with knowledge about the course of disease. Missed
pathologies were mostly secondary findings without connection to the
primary disease or resulted from a lack of a methodical approach to the inspection. Thus,
eye tracking can potentially be used to improve the learning curve of residents in radiologic
inspection by teaching them to follow the eye-gaze patterns of expert radiologists.
A limitation is the use of the final radiological reports to define the major and
minor findings. While major findings are most likely to be reported, minor findings might
not be. Additionally, the sample size was restricted both by limited availability
of participants and the time required for the individual evaluations. Furthermore,
general radiologists in our institute are very well trained in cross-sectional imaging
diagnostics of neuroradiological findings, as there is no separation between general
and neuroradiological findings in cross-sectional imaging, which can explain the small
differences in performance between general radiologists and neuroradiologists.
In conclusion, our results suggest that the experience of radiologic physicians improves
the detection of major findings by looking at CT images more systematically with inspection
of anatomical areas in a fixed order, as well as an intentional scan of clinically
important regions (like areas causing identifiable symptoms), taking into account
the individual clinical symptoms.
Visual attention was more focused in the case of the experienced physicians and often
less time was needed.
However, all levels of experience seem to be able to improve their performance by
analyzing CT images systematically.
Conclusion
Our results suggest that the experience of radiologic physicians improves the detection
of major findings by looking at CT images more systematically, meaning inspection
of anatomical areas in a fixed order to ensure full coverage, as well as an intentional
scan of clinically important regions (like areas causing identifiable symptoms), taking
into account the individual clinical symptoms.
Visual attention was more focused in the case of the experienced physicians and often
less time was needed.
Clinical relevance of the study
Understanding how radiologic expertise is developed may allow training regimes, and
consequently the performance of both experienced and inexperienced radiologists,
to be improved.
The speed and thus efficiency of reporting as well as the quality of detection rates
can be improved.
Pitfalls can be avoided with knowledge of the approaches.