Key words
ultrasonography - structured reporting - quality improvement - interrater reliability
Background
Head and neck sonography (HNS) can be considered the diagnostic method of first
choice for the diagnosis of a wide array of soft-tissue diseases, both in
otolaryngology and maxillofacial surgery [1] [2] [3]. This is first and
foremost based on its broad availability, patient safety, as well as the potential
for intraoperative use [4]. Additionally, unlike
computed tomography (CT) and magnetic resonance imaging (MRI), ultrasound is
associated with lower costs, involves no ionizing radiation (unlike CT), and is
well suited for patients suffering from claustrophobia [5]. Ultrasound imaging has
undergone sustained technological improvement in the past decades, with vastly
improved image quality and dynamic range.
Despite these advantages, ultrasound examinations have traditionally been associated
with a significantly higher examiner dependency than CT and MRI examinations [5].
Although data on the interrater reliability (IRR) and interrater agreement (IRA) of
ultrasound, CT, and MRI studies of the neck are lacking, existing data show great
variability of these parameters in each modality, depending on the specific disease
and body region [6]. This may pose a major problem, especially in the
preoperative workup of soft-tissue pathologies in head and neck surgery, since
insufficient preoperative ultrasound reports may lead to major intraoperative
complications, extension of surgery, or reoperations [7]. Education standards concerning ultrasound vary greatly among
countries and also among medical schools within a country [8]. Consequently, the European Federation of Societies for Ultrasound in
Medicine and Biology has proposed standards for HNS, both in terms of training and
clinical practice [9] [10]. Until recently, there was no uniform standard regarding the
structure and content of HNS reports in German-speaking countries [11]. Therefore, it is not surprising that the overall
quality of HNS reports within a sample of German university medical centers has
great potential for optimization [12].
Even with harmonized training and clinical practice, the mode of reporting
has been pointed out as a major contributor to information loss and dissatisfaction
among referring physicians [13]. Several studies were
able to demonstrate that structured reporting (SR) improves the report quality and
time efficiency of HNS by standardizing its content for various educational levels
[8] [13] [14] [15]. Furthermore, SR
can be considered a valuable tool to improve preoperative evaluation of CT scans in
the context of functional endoscopic sinus surgery, both by radiologists and
otolaryngologists [16] [17]. Due to the standardized structure, SR is associated with
a decreased likelihood of missing or misinterpreting key structures, errors that
could otherwise result in misdiagnosis. Additionally, previous studies were able to
show a preference for SR
by surgeons [18]. Considering these findings, SR might
also improve IRR and IRA which may further promote the value of HNS in
otolaryngology and maxillofacial surgery.
Therefore, the present study was designed to assess the impact of SR on the IRR of
HNS in a cohort of experienced HNS examiners.
Methods
Study design
Video sequences of four complete HNS examinations as well as detailed images of
the respective pathologies were recorded by two certified HNS instructors. An
additional HNS examination of a healthy volunteer without pathological findings
was used for training purposes (see Video 1). The four cases included a
follow-up for carcinoma of the parotid gland (case 1, see Video 2), an
evaluation of a parotid gland mass (case 2, see Video 3), an evaluation
of a suspected cervical lymph node metastasis (case 3, see Video 4), as
well as another evaluation of a parotid gland mass (case 4, see Video 5).
For these cases, the instructors created SRs using an online-based reporting
template for HNS (Smart Reporting GmbH, Munich, Germany,
http://www.smart-reporting.com), which were used as reference
reports [8] [13] [14] [15]. Anonymized
video and image files as well as detailed instructions on how to use the SR
template were sent out to nine departments with significant HNS expertise.
Video 1 Video and image files of test case.
Video 2 Video and image files of case 1: Follow-up for carcinoma
of the parotid gland.
Video 3 Video and image files of case 2: Evaluation of parotid
gland mass.
Video 4 Video and image files of case 3: Evaluation of suspected
cervical lymph node metastasis.
Video 5 Video and image files of case 4: Evaluation of parotid
gland mass.
As previously described, the additional training case was included to allow the
examiners to become familiar with the SR template and was to be completed first.
Subsequently, participating senior physicians were asked to create SRs of the
four cases based on the provided video and image files, resulting in
n=36 reports. No control group using conventional reporting (CR) was included in
the study design for three reasons: previous publications by our group showed that
SR is consistently superior to CR in terms of report quality, there is sufficient
data on the IRR of CR, and CR has limited relevance to the central aim of this
study [8] [13] [14] [15].
After completing the SRs, the examiners rated the user friendliness of the SR
template using an existing questionnaire with a visual analog scale (VAS)
[8] [13] [14] [15].
Report evaluation
Two certified head and neck ultrasound instructors analyzed all 36 anonymized
reports for overall report completeness as well as report content and assessed
the IRR and IRA. In this scenario, IRR refers to the extent to which different
examiners provide consistent evaluations, whereas IRA refers to the degree to
which different examiners agree upon the same categorical decision or
classification when rating the same content.
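To illustrate the notion of IRA as raw categorical agreement, the following minimal sketch computes the mean pairwise agreement across report items. It is an assumption about one common way to quantify agreement, not the evaluation procedure used in this study; the function names and the example categories are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(item_ratings):
    """Fraction of rater pairs that chose the same category for one item."""
    pairs = list(combinations(item_ratings, 2))
    if not pairs:
        raise ValueError("need at least two ratings per item")
    return sum(a == b for a, b in pairs) / len(pairs)


def mean_agreement(all_ratings):
    """Average pairwise agreement across all items, as a percentage."""
    scores = [pairwise_agreement(ratings) for ratings in all_ratings]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical data: three report items, each rated by four examiners.
example = [
    ["benign", "benign", "benign", "benign"],     # full agreement
    ["benign", "benign", "malignant", "benign"],  # one dissenting examiner
    ["cystic", "solid", "cystic", "solid"],       # split decision
]

print(f"{mean_agreement(example):.1f}%")
```

Raw agreement of this kind does not correct for agreement expected by chance, which is why a chance-corrected coefficient such as κ is reported alongside it.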
Participating senior physicians were questioned about user satisfaction utilizing
an existing questionnaire [8] [13] [14] [15]. This questionnaire surveyed whether an SR
template is useful and applicable in everyday clinical practice (questions 1 and
2) as well as whether SR may improve overall reporting (question 3). It also
asked participating physicians about the time required for SR (question 4) and
its economic value (question 5). Using a 10-point visual analog scale (VAS), the
questionnaire furthermore asked whether SR might assist inexperienced physicians
in learning ultrasound examinations and reporting (questions 6 and 7) and
whether the SR template is easy to use and is neatly arranged (questions 8 and
9).
Sample size calculation and statistical analysis
As described by Sim and Wright, the number of reports needed in this study was
determined based on previous studies concerning SR of HNS [13]. The power was set to 80% with a
significance level of α=0.05. Taking into account a proportion
of positive ratings of 50%, a baseline κ of 0.4 and a previously
published κ using SR of 0.9, 27 ratings are needed to determine
significant differences in IRR [13]. The κ
values were interpreted as proposed by the Landis and Koch classification [19].
Consequently, IRR was considered as almost perfect (κ 0.81–1.0), substantial
(κ 0.61–0.8), moderate (κ 0.41–0.6), fair (κ 0.21–0.4), or
slight (κ 0–0.2).
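The interpretation bands above can be written as a small lookup. This is an illustrative sketch only; the function name is hypothetical, the 0.21–0.40 band carries the label "fair" used in Landis and Koch's original wording, and the "poor" label for negative values also follows their classification rather than the text above.

```python
def landis_koch(kappa: float) -> str:
    """Map a kappa value to its Landis and Koch agreement category."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Example with a kappa of 0.73.
print(landis_koch(0.73))
```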
Data are reported as the mean±SD. The Shapiro-Wilk test was used to
determine normal distributions. A t-test was used to compare overall
completeness and IRA. A p-value of less than 0.05 was considered statistically
significant. Fleiss’ κ was used to evaluate IRR. All statistical
analyses were performed using GraphPad Prism 9.0.1 (GraphPad Software LLC, San
Diego, CA, USA) and Microsoft Excel 2019 (Microsoft Corporation, Redmond, WA,
USA).
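For readers unfamiliar with Fleiss' κ, the following minimal sketch shows how the coefficient is computed from a table of category counts. It is not the authors' analysis, which was performed in the software named above; the function name and the toy data are assumptions for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters who assigned
    item i to category j. Every item must have the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: share of agreeing rater pairs per item, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall proportion of ratings per category.
    n_categories = len(counts[0])
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined if all ratings share one category")

    return (p_bar - p_e) / (1.0 - p_e)


# Toy data: four items, three raters, two categories, complete agreement.
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
print(fleiss_kappa(perfect))
```

Because κ subtracts the agreement expected by chance, it can be far lower than raw percentage agreement when one category dominates the ratings.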
Results
Five of the nine participating departments (55.6%) reported use of a digital
reporting system in clinical practice. Of these, three departments
(33.3% of all participating departments) used some kind of structured reporting
approach (see [Fig. 1]).
Fig. 1 Distribution of reporting system use by participating senior
physicians in the clinical routine. Within this cohort, 55.6% of
participants used digital reporting systems in clinical practice (a)
while 33.3% employed structured reporting elements (b).
Report Completeness
In-depth analysis of reports created by study participants using SR revealed very
high completeness ratings of all reports (91.8%±11.72%),
which was consistent across all four cases (see [Fig. 2]). In detail, overall
report completeness was
96.1%±6.5% for case 1 (follow-up after a parotid gland
carcinoma), 90.5%±12.1% for case 2 (parotid gland mass),
87.9%±12.9% for case 3 (cervical mass) and
92.8%±12.6% for case 4 (parotid gland mass). Differences
between the overall completeness ratings of the four cases were not
significant.
Fig. 2 Results of overall report completeness analysis. Structured
reporting (SR) achieves very high ratings in terms of report quality
consistently throughout all four cases. No significant differences were
observed among cases.
Interrater reliability and interrater agreement
The IRR was calculated using Fleiss’ κ for each case as well as
for all acquired data. Overall, the IRR was substantial with a Fleiss’
κ of 0.73, and the IRA was high at
87.2%±15.1%. In detail, Fleiss’ κ was
0.78 and the IRA was 87.2%±15.1% for case 1, 0.92 and
96.9%±5.4% for case 2, 0.66 and
80.4%±13.1% for case 3, and 0.74 and
85.7%±18.2% for case 4. There was a significant
difference in IRA between cases 2 and 3 (p=0.0177, see [Fig. 3]), while the other cases did not show
significant differences.
Fig. 3 Results of interrater reliability and interrater agreement
analysis. The use of structured reporting yields substantial to almost
perfect interrater reliability in all analyzed cases (a).
Additionally, interrater agreement was also very high in all cases
(b). Except for cases 2 and 3, there were no significant
differences in interrater agreement between analyzed cases.
*p<0.05
User satisfaction
Assessment of VAS-based questionnaires revealed very high user satisfaction using
the SR template (8.6±1.8). In detail, the SR-based approach was rated to
be useful (9.8±0.6) and suitable for routine clinical use
(9.9±0.3). SR was thought to improve reporting (9.6±0.8) and to
be time-efficient (7.2±2) and participants felt that any additional time
needed was well-spent (9±0.9). Participating senior physicians stated
that SR may be beneficial for inexperienced physicians to acquire ultrasound
examination (8.2±2.6) and reporting skills (9±1.3). The template
was perceived as easy to use (7.7±0.9) and neatly arranged
(7.3±2.3, see [Fig. 4]).
Fig. 4 Analysis of questionnaire findings using visual analog
scale (VAS, 0: complete disagreement, 10: complete agreement).
Participating senior physicians were surveyed concerning the usefulness
(Q1) and applicability (Q2) of structured reporting (SR) in everyday
clinical practice, improvement in overall reporting (Q3), time
efficiency (Q4), whether additionally needed time was well spent using
SR (Q5), whether SR is beneficial for inexperienced physicians to learn
ultrasound examinations (Q6) and reporting (Q7), whether the SR template
is easy to use (Q8), and whether the SR template is neatly arranged
(Q9). In summary, the questionnaire revealed substantial user
satisfaction in all categories (overall).
Discussion
HNS is considered the diagnostic modality of choice for a wide variety of soft-tissue
pathologies of the head and neck. The preferred reporting modality has not yet
been well defined [1] [2] [3], but SR appears to further increase the value of HNS
examinations, as shown in several studies in recent years [8] [13] [14] [15].
While report quality and the reliability of extracted information by referring
physicians have been shown to be very high, there is, to our knowledge, no data
concerning the impact of SR on the IRR of HNS. Consequently, the present study was
designed to assess the IRR of HNS for various pathologies.
IRR is traditionally believed to be rather low for all kinds of ultrasound
examinations compared to CT or MRI [5]. Especially for
diagnostic modalities which are used for clinical follow-ups and may involve various
examiners, a high IRR is of utmost importance as it reduces both false-positive and
false-negative findings. While false-positive findings may trigger additional and
unneeded, possibly cost-intensive or invasive diagnostic procedures and treatments,
false-negative findings may lead to delayed diagnosis with progression of the
underlying disease, potential decrease in prognosis, and possible legal
consequences.
Therefore, the present study’s results are of great interest as they
underline the value of HNS as a quick, cost-efficient, noninvasive, and precise
diagnostic modality. Also encouraging are the findings of Goncalves et al., who
reported a very good IRR of HNS for the assessment of sialolithiasis [5], an
indication for which HNS may be superior to CT or MRI.
Our data show that within this cohort report completeness using SR was very high for
all four different cases. As shown in previous studies, implementation of SR
improves report completeness especially through the standardized query of structures
and regions, even if they are not involved in the pathology of interest and are not
in the center of attention [8] [13] [14] [15].
This is safeguarded by the appropriate use of mandatory items within the report
template. Reporting may only proceed once these mandatory items are completed. This
basic principle of SR has been proven to reduce the frequency of missed pathologic
findings and to improve diagnostic precision [20].
Secondly, SRs utilize standardized terminology that has been previously approved in
expert consensus and in accordance with published recommendations [11]. This ensures objective description of pathological
findings. Moreover, the standardized terminology, structure, and digitalization of
SRs enable appropriate comparability of reports and scientific use in individual and
big data analyses as well as the application of artificial intelligence and deep
learning technologies [21].
Unlike CT and MRI, ultrasound is a dynamic examination technique, including the
movement of structures and images, various angles, compressibility, and functional
parameters such as the Doppler effect. Consequently, this entails a greater
dependency on the individual examiner. Since participating physicians did not
perform the ultrasound examinations on their own, standardized video sequences as
well as detailed images depicting all necessary aspects of the pathology were
provided in order to assess the IRR and IRA of HNS in a realistic manner. In
addition to the very high completeness ratings, the IRR was substantial with a
Fleiss’ κ of 0.73. There were no significant differences in
completeness or IRR for the different cases. The IRA was consistently very high
(87.2%±15.1%), with a significant difference only between cases 2 and 3. Likewise, the
standardized structure and terminology are major factors contributing to the
substantial IRR.
Our data clearly demonstrate that the use of SR resulted in a consistent
interpretation of the provided examination data. This is essential for a reliable
diagnosis and efficient therapy. Due to the superior comparability, SR has the
potential to improve communication with other involved healthcare providers, thus
facilitating patient management and reducing inquiries and, more importantly,
misunderstandings [22]. Furthermore, SR may be a
valuable quality control tool to assist in the accreditation process that forms the
basis for patient referrals, treatment, and billing in many countries [23].
Potential linguistic imperfections in written findings by non-native speakers are
exacerbated by the continued increase in the demand for telemedicine solutions [24]. Telemedicine, which is particularly useful for
reporting diagnostic modalities, has become a necessity for rural regions with a
shortage of specialists [25]. The increasing
availability of broadband internet connections has made it possible to transfer very
large amounts of medical data that can be processed in other regions, whether
nationally or internationally. In the case of international telemedicine reporting,
there is a risk that the reporting specialist may not have adequate linguistic
competence to report findings or to respond to follow-up questions from referrers
in the language of the source country. This may further hinder IRR in the context of
HNS. Consequently, SR could be an essential element to overcome inadequate reporting
quality due to poor language skills, as modern SR systems can automatically render
a report created in the examiner’s native language in foreign languages [26].
Examiner satisfaction was very high in all ten assessed categories, with an overall
VAS value of 8.6±1.8. Our findings are in line with the literature and
confirm the importance of SR not only for the quality of ultrasound examinations
[5] [8] [13] [14] [15] but also in the context of big data analysis and
therapy monitoring [18] [27]. Both the redundancy of the SR process and the
standardized workflow
are major contributors to the examiners’ preference for SR compared to CR
[8] [13] [14] [15]. Concerns on the
part of physicians using analog CR that SR templates have overly rigid reporting
conditions for the great variety of pathological findings in clinical practice have
been rebutted by multiple studies [8] [13] [14] [28]. In fact, the opposite seems to
be the case, since SR’s rather rigid approach has been shown to be convenient for
inexperienced examiners [8] [14].
Conclusion
Our data demonstrate that the implementation of SR ensures a substantial IRR of HNS
examinations, thereby reducing one of its most criticized disadvantages. The use of
SR in clinical practice can improve diagnostic accuracy and safety of treatment, as
well as simplify data analysis and transfer, communication, and quality assurance.
Further studies will have to determine the potential impact on patient outcomes.