Keywords epidemiology - CT-angiography - embolism/thrombosis
Background
Pulmonary embolism (PE) is the third most common cardiovascular disease after acute
myocardial infarction and stroke [1]. Because of the risk of right heart failure due to obstruction of the pulmonary
vessels, pulmonary embolism is associated with a high mortality rate of approx. 30%
if left untreated. Since the PE mortality risk is highest within the first hours of
symptom onset, early diagnosis is essential [2]. CT angiography is the internationally recognized reference standard for diagnosing
PE [3] [4]. CT is not only very sensitive for detecting thrombi in the pulmonary vessels but
also allows simple quantification of possible right ventricular strain as a surrogate
parameter for the risk of right heart failure [4] [5]. Because of this urgency, it is essential for radiology to provide
quick and correct reports for these CT examinations.
The majority of radiology reports, including reports of CT angiography performed for suspected
PE (PE CT), are still written as free text. Free-text reports are highly variable
with respect to content and structure. In addition, free-text reports often lack information
that is relevant for the clinical referring physician [6] [7]. Structured reporting (SR) has increasingly become a topic of interest in recent
years. The advantages of SR compared to free-text reporting have been shown in dozens
of studies. Structured reports are easier to read, easier to compare, more complete,
and more detailed than free-text reports [6] [7] [8]. This has also been shown specifically for PE CT examinations [9]. Both clinical referring physicians and radiologists prefer structured reports
to free-text reports [10].
However, despite these numerous advantages, SR has not yet become established in the
clinical routine. Usage is low in most radiology departments [10]. This is probably due to difficulties that can occur when implementing SR. Introducing
SR solutions into the clinical reporting workflow can be technically challenging
and require significant effort. Moreover, implementation is highly dependent on the
manufacturer of the Picture Archiving and Communication System (PACS) and Radiology Information System (RIS) [11].
The integration of voice recognition into SR is currently insufficient. In contrast
to free-text reports, structured reports must largely be completed manually with mouse
and keyboard, which is inconvenient and time-consuming and can distract from the actual
image data [12].
In addition to the advantages regarding report quality, structured reporting as an
IT-based method makes it possible to automatically add large amounts of highly structured
data to databases and to perform secondary evaluation. Data mining of structured
reports can be used to answer a variety of questions.
Thus, for example, data on the epidemiology of the disease including the setting-specific
prevalence can be collected based on the report data. In 2017, our department conducted
the first feasibility study on this topic. The study included just over 500 patients,
who underwent CT examination for suspicion of PE. Since structured reporting of PE
CT examinations had not yet been implemented in the clinical routine at this time,
all corresponding free-text reports had to be manually retrospectively structured
[13]. There is hardly any epidemiological data in the literature generated primarily
from structured reports created in the clinical routine [14]. Data mining of structured reports is also suitable for answering scientific questions.
For example, a new system for evaluating clot burden in PE patients was developed
on the basis of structured report data [15]. Structured report data is also ideally suited for the training and validation of
artificial intelligence (AI) [16].
Building on the described feasibility study, the goal of this study was to use a data
mining algorithm to analyze all structured PE CT reports created in the clinical routine
in the last five years in our department's database. For internal quality assurance,
epidemiological data and disease characteristics as well as insight regarding possible
in-hospital differences in PE prevalence were acquired.
Methods
In 2016, a web-based SR platform compatible with Integrating the Healthcare Enterprise Management of Radiology Report Templates (IHE MRRT) was developed and implemented for use in the clinical routine in our hospital
[17]. Since then, numerous templates for various types of examinations in the ultrasound,
CT, and MRI modalities have been developed and implemented [10]. The reporting template for CT angiography to rule out pulmonary embolism was implemented
in the clinical routine in the third quarter of 2018. [Fig. 1] shows the current version of the template. Data is typically entered in the template
during reporting with the mouse and keyboard. To quantify possible right ventricular
strain, the maximum diameters of the right and left ventricles are measured in the
axial view. The template automatically calculates the RV/LV ratio after the two values
have been entered. The presence of right ventricular
strain was defined as an RV/LV ratio > 1 in accordance with the guidelines of the
European Society of Cardiology [18].
Fig. 1 Structured reporting template for CT angiography of pulmonary embolism.
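The RV/LV calculation described above can be illustrated with a minimal Python sketch. It is not the template's actual implementation, which is not described in the text; the function names and the use of millimeters are assumptions made for illustration. Only the threshold (RV/LV ratio > 1) is taken from the definition used in this study.

```python
def rv_lv_ratio(rv_diameter_mm: float, lv_diameter_mm: float) -> float:
    """Ratio of the maximum axial RV diameter to the maximum axial LV diameter.

    Hypothetical helper: names and units are assumptions, not the template's API.
    """
    if rv_diameter_mm <= 0 or lv_diameter_mm <= 0:
        raise ValueError("ventricular diameters must be positive")
    return rv_diameter_mm / lv_diameter_mm


def has_rv_strain(rv_diameter_mm: float, lv_diameter_mm: float) -> bool:
    """Right ventricular strain as defined in the text: RV/LV ratio > 1."""
    return rv_lv_ratio(rv_diameter_mm, lv_diameter_mm) > 1.0
```

With this definition, a ratio of exactly 1 does not count as right ventricular strain because the comparison is strict.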
All reports generated and released in the clinical routine with the help of templates
are automatically saved in the reporting platform and are thus accessible for secondary
analyses. Reports can be systematically read out by the reporting platform with the
help of the software RapidMiner Studio (RapidMiner, Cambridge, USA) for a predefined
time period. Individual parts of reports (e.g., the presence of right ventricular
strain) or the entire contents of reports can be read out. The results are output
as a .csv file and can then be evaluated with the help of statistics software. [Fig. 2 ] shows a graphic of the workflow described here.
Fig. 2 Graphic of the workflow for secondary evaluation of structured reports (data mining).
For the questions examined in this study, the following content was read out from
the PE template: Patient age and sex, in-hospital referring physician, type of referral
(outpatient, standard ward, intensive care unit), presence of a pulmonary embolism,
type of possible pulmonary embolism (main pulmonary artery, lobar artery, segmental
artery, subsegmental artery), side of the embolism (right, left, bilateral), presence
of right ventricular strain (RV/LV ratio > 1), and date and time of the examination.
All reports generated since implementation of the template from 8/2018 to 7/2023 were
included.
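As a rough illustration of how such an export could be prepared for analysis, the following Python sketch parses a .csv file containing the fields listed above and keeps only examinations within the study period. The column names, value encodings, timestamp format, and delimiter are hypothetical, since the actual export format of the reporting platform is not described here. The study itself used RapidMiner Studio and R rather than Python.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime

# Study period stated in the text: 8/2018 to 7/2023.
STUDY_START = date(2018, 8, 1)
STUDY_END = date(2023, 7, 31)


@dataclass
class PeReport:
    age: int
    sex: str
    referrer: str
    referral_type: str  # outpatient, standard ward, or ICU
    pe_present: bool
    location: str       # type of embolism, empty if no PE was found
    side: str           # right, left, or bilateral, empty if no PE was found
    rv_strain: bool
    examined_at: datetime


def parse_reports(csv_text: str, delimiter: str = ",") -> list[PeReport]:
    """Parse a (hypothetical) report export and drop rows outside the study period."""
    reports = []
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        examined_at = datetime.fromisoformat(row["examined_at"])
        if not (STUDY_START <= examined_at.date() <= STUDY_END):
            continue
        reports.append(PeReport(
            age=int(row["age"]),
            sex=row["sex"],
            referrer=row["referrer"],
            referral_type=row["referral_type"],
            pe_present=row["pe_present"] == "yes",
            location=row["location"],
            side=row["side"],
            rv_strain=row["rv_strain"] == "yes",
            examined_at=examined_at,
        ))
    return reports


# Invented example rows, for illustration only (not study data).
SAMPLE_EXPORT = """age,sex,referrer,referral_type,pe_present,location,side,rv_strain,examined_at
71,m,internal medicine,outpatient,yes,segmental,bilateral,no,2019-03-14T02:15
64,f,surgery,standard ward,no,,,no,2021-11-02T16:40
58,f,internal medicine,icu,yes,central,right,yes,2018-06-30T09:00
"""

reports = parse_reports(SAMPLE_EXPORT)
```

The third sample row is dated before 8/2018 and is therefore dropped by the filter.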
Statistical analyses were performed with the help of the software R (The R Foundation
for Statistical Computing, Vienna, Austria). Binary and categorical data were expressed
as absolute values and percentages. Continuous data were presented as mean and standard
deviation (SD). Non-normally distributed data were presented as median and interquartile
range (IQR). The Kruskal-Wallis test was used to check for statistical significance.
The level of significance was set to α = 0.05.
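The descriptive statistics and the Kruskal-Wallis test named above can be sketched without external dependencies as follows. This is an illustrative Python reimplementation under stated assumptions, not the R code used in the study: the function names are hypothetical, the quartiles use the "inclusive" interpolation method of the Python standard library, and the p-value relies on the usual chi-square approximation with k - 1 degrees of freedom.

```python
import math
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, first quartile, third quartile)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution for integer df."""
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))
    tail = sum(half**(k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df + 1) // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def kruskal_wallis(*groups):
    """Return (H, p) with tie correction; requires at least two distinct values."""
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += mid_rank
        ties = j - i
        tie_term += ties**3 - ties
        i = j
    h = 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(group) for r, group in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    h /= 1 - tie_term / (n_total**3 - n_total)
    return h, chi2_sf(h, len(groups) - 1)


ALPHA = 0.05  # level of significance used in the study

# Invented toy values, for illustration only (not study data).
h_stat, p_value = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
significant = p_value < ALPHA
```

For very small groups, as in the toy example, the chi-square approximation is coarse; it is used here only to keep the sketch short.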
Results
In total, 2790 structured reports for patients who underwent PE CT examination were
included.
A pulmonary embolism was detected in 24% (n = 678) of the total patient population.
The median patient age was 71 years (IQR: 58–80). The ratio of men to women was 1.2:1
([Fig. 3 ]).
Fig. 3 Age and sex distribution for pulmonary embolism based on a tree diagram.
There were some significant differences in PE prevalence depending on the type of
referral. Among the patients referred by outpatient providers who comprised 69% (n
= 1913) of the total cohort, only 23% (n = 441) had a positive finding. Among the
678 patients referred from a standard ward, 27% (n = 182) had a PE. A higher rate
of positive findings (30%, n = 59) was seen in the relatively small patient population
referred from the ICU (n = 199) ([Fig. 4]A). Differences in PE prevalence were also seen depending on the referring clinical
discipline. In total, 75% (n = 2104) of patients were referred by internal medicine
departments. A PE was detected in 22% (n = 470) of these patients. 34% of the 453
surgically referred patients (n = 155) were diagnosed with a PE. The remaining departments
collectively designated here as “other” referred 233 patients. 24% (n = 57) of these
patients were diagnosed with a PE ([Fig. 4 ]B).
Fig. 4 Frequency of PE depending on the type of referral (outpatient, standard ward, intensive
care unit) (A) and as a function of the referring clinical department (B).
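The percentages reported above follow directly from the counts given in the text, as the following short Python check shows. The counts are copied from this Results section. The dictionary and function names are hypothetical and serve only as an illustration.

```python
# (patients with PE, patients examined), as reported in the Results text.
REFERRAL_COUNTS = {
    "outpatient": (441, 1913),
    "standard ward": (182, 678),
    "ICU": (59, 199),
}
DEPARTMENT_COUNTS = {
    "internal medicine": (470, 2104),
    "surgery": (155, 453),
    "other": (57, 233),
}


def prevalence_percent(positive: int, total: int) -> int:
    """PE prevalence in percent, rounded to the nearest integer."""
    return round(100 * positive / total)


def prevalence_table(counts: dict) -> dict:
    """Map each group to its rounded PE prevalence in percent."""
    return {group: prevalence_percent(pos, n) for group, (pos, n) in counts.items()}


by_referral = prevalence_table(REFERRAL_COUNTS)
by_department = prevalence_table(DEPARTMENT_COUNTS)
```

Both groupings cover the same 2790 examinations, so the group sizes in each dictionary add up to the total cohort.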
Of the 678 patients with PE, 32% (n = 215) were diagnosed with a central embolism,
25% (n = 172) with a lobar embolism, 33% (n = 228) with a segmental embolism, and
10% (n = 69) with a subsegmental embolism. 65% of all PEs were bilateral (n = 444),
10% (n = 71) were left-sided, and the remaining 24% (n = 163) were right-sided. Right ventricular strain was
seen in 43% (n = 292) of PE patients.
Further evaluation showed that central embolisms are associated with a higher RV/LV
ratio than lobar, segmental, or subsegmental embolisms ([Fig. 5 ]A). In addition, right ventricular strain occurred significantly more frequently
(73%) in central embolisms than in lobar (38%), segmental (26%), and subsegmental
embolisms (13%) ([Fig. 5 ]B).
Fig. 5 RV/LV ratio depending on the highest location of an embolism in the pulmonary artery
system (A ) and frequency of right ventricular strain as a function of the highest location
of an embolism in the pulmonary artery system (B ). RV/LV ratio depending on the PE side (C ) and the occurrence of right ventricular strain as a function of the PE side (D).
RHB = right ventricular strain.
Analogously, bilateral embolisms were also associated with a higher RV/LV ratio compared
to unilateral embolisms ([Fig. 5 ]C). 54% of all bilateral embolisms included right ventricular strain, while this
was the case in only 27% of left-sided embolisms and in 19% of right-sided embolisms
([Fig. 5 ]D).
Discussion
The data show that consistent use of SR allows prevalence statistics and epidemiological
data to be acquired without significant effort as shown here using pulmonary embolism
as an example. The data provide a valuable overview of the patient population and
the distribution of disease features and disease presentation. In addition, they provide
important feedback for clinical referring physicians and radiologists.
The age and sex distribution of PE in the examined cohort (median patient age 71 years,
ratio of men to women 1.2:1) is in agreement with multiple other epidemiological
studies on PE [19] [20] [21]. The calculated PE prevalence of 24% in the total cohort coincides with the feasibility
study performed in 2017 in which free-text reports from a smaller cohort (just over 500 patients)
were retrospectively structured [13]. An American study in which free-text reports for over 500 PE CT scans were manually
evaluated describes a significantly lower PE prevalence of 9% but only includes patients
referred by an emergency department [22 ]. It must also be taken into consideration that the indication for CT sometimes has
a lower threshold in the USA due to the different legal regulations.
Differences in the probability of a positive finding depending on the type of referral
could also be identified in this study. The lower rate of 23% in the outpatient setting
compared to 27% in standard wards and 30% in the ICU can be primarily explained by
the fact that the diagnosis of a PE is more evident in the inpatient setting when
typical symptoms occur, e.g., after an operation or a longer period of immobilization,
than in the outpatient setting. In the latter case, PE CT examination is more often
used to rule out PE than to confirm the diagnosis in the case of unclear symptoms
and a positive D-dimer on laboratory tests. This is also supported by the lower rate
of positive findings in patients referred by internal medicine departments (22%) compared
to surgically referred patients (34%). Hariharan et al. showed a significant
increase in right ventricular strain in central embolisms compared to peripheral embolisms
in a smaller cohort of patients for whom image and report data were evaluated manually
[23]. This association was confirmed with the SR-based data mining
approach used here in a cohort that was more than twice the size.
Overall, the results of this study are within the range of expected results. The actual
innovation is the method used to generate the results. This method makes it possible
to acquire data from imaging quickly and without significant personnel requirements.
Arduous review of individual reports would be needed to obtain the same results from
free-text reports. However, it must be taken into consideration that the primary generation
of structured reports in the clinical routine can be more time-consuming than conventional
free-text reporting. There are also many uses for the data. It is essential for both
clinical referring physicians and radiologists to identify probabilities for a positive
finding in radiology examinations. Particularly in the case of examinations involving
radiation, like CT, quality assurance measures in the sense of radiation protection
would be required in the case of low probabilities. Furthermore, imaging-based structured data
is ideally suited for data enrichment of registry databases, such as those already
available for PE patients in various European countries [24]. These databases typically include primarily clinical data and could be further
optimized with image parameters. This could allow the development of AI models, e.g.,
to predict the risk of relapse. Moreover, in the future, the data could also be included
in data integration centers set up at university hospitals as part of medical information
initiatives, which would allow site-independent use for research purposes.
In addition to the structured report data mining approach shown here, free-text reports
can also be retrospectively evaluated for secondary data use by using natural language
processing (NLP) [25] [26] [27]. NLP as an AI-related technology is capable of automatically analyzing free text
and extracting and structuring relevant content [27]. However, on the one hand, the method is limited by the fact that the free-text
reports used for this do not always contain all necessary information [6] [7]. On the other hand, although NLP algorithms have improved significantly in recent
years, completely correct detection of free-text report content is still not guaranteed
[26]. Moreover, large language models like GPT-4 are capable of retrospectively
converting entire free-text reports into a structured report [28]. Nevertheless, the SR-based approach used here is more suitable since complete datasets
can be generated in a highly structured form and directly evaluated without the need
for an additional conversion step.
The present study has limitations. A high clinical usage rate of SR is needed to ensure
the high quality of epidemiological data and prevalence statistics acquired via the
SR-based data mining approach. High usage rates could be shown in our hospital for
most structured reporting templates (e.g., polytrauma CT 97%, prostate MRI 92%, and
urolithiasis CT 91% in the year 2022). In contrast, the usage rate of the PE template
was lower but has increased steadily since implementation of the template, from 18% in
2018 to 58% in 2022 [10]. Finally, SR is not equally suited for all examination types and its use at our
hospital is not mandatory. In the case of PE CT scans, there are alternative diagnoses
in up to 33% of examinations [22 ]. When providing an exact description of alternative diagnoses and in highly complex
cases, radiologists could find the reporting template unsuitable and therefore not
use it. Based on this, it can be assumed that the actual probability of a positive
PE CT finding is lower than in the evaluated patient population. Regardless of the
applicability in individual types of examinations, there is still potential to optimize
SR and to further promote clinical use. This could be achieved with improved integration
of voice recognition in structured reporting templates with the help of NLP, which
is the subject of current research studies [29 ].
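The selection effect described in this paragraph can be made concrete with a weighted average. The following Python sketch is purely illustrative. The template share of 0.58 is the 2022 usage rate quoted above and does not apply to the whole study period, the prevalence of 0.24 is the value found in the structured reports, and the prevalence of 0.15 among examinations reported as free text is a hypothetical value chosen only to show the direction of the effect.

```python
def overall_prevalence(template_share: float,
                       prevalence_template: float,
                       prevalence_free_text: float) -> float:
    """Prevalence across all examinations as a weighted average of both report types."""
    if not 0 <= template_share <= 1:
        raise ValueError("template_share must be between 0 and 1")
    return (template_share * prevalence_template
            + (1 - template_share) * prevalence_free_text)


# 0.15 is a hypothetical free-text prevalence, not a measured value.
estimate = overall_prevalence(0.58, 0.24, 0.15)
```

Whenever the prevalence among free-text reports is lower than among structured reports, the overall value lies below the 24% measured in the structured reports, which is the assumption made in the text.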
Moreover, this imaging-based study analyzes the disease only at a defined point
in time. Therefore, it is not possible to draw any conclusions about the course of
the disease. Finally, patients who underwent an examination other than CT, e.g., MRI
or scintigraphy, for diagnosis or exclusion were not included. However, in the specific
case of PE, this percentage of patients is to be considered relatively low [30 ].
Conclusion
SR makes it possible to collect epidemiological data and prevalence statistics without
the need for additional time or resources by performing data mining of reports. For
example, differences in the in-hospital PE prevalence could be shown as a function
of the clinical referring physician and the type of referral. The generated data can
be used in various areas, e.g., for internal quality assurance, scientific analyses,
and for data enrichment of registry databases. To benefit from these advantages, implementation
and consistent use of SR are essential and recommended.
Clinical relevance
Consistent clinical use of SR makes it possible to obtain epidemiological data and
in-hospital prevalence statistics without the need for significant resources by performing
data mining.
Using pulmonary embolism as an example, differences in in-hospital prevalence can
be calculated as a function of the referring physician and the type of referral and
probabilities for the presence of right ventricular strain can be determined as a
function of the type of embolism.
Knowledge of the probability of a positive finding in radiology examinations provides
important feedback for radiologists as well as for clinical referring physicians.
The generated data can also be used in various areas, e.g., for internal quality assurance,
radiation protection, scientific analyses, and for data enrichment of registry databases.
Abbreviations
CT: Computed tomography
PE: Pulmonary embolism
SR: Structured reporting
IHE MRRT: Integrating the Healthcare Enterprise Management of Radiology Report Templates
AI: Artificial intelligence
NLP: Natural language processing
RV/LV ratio: Right ventricular to left ventricular ratio