Appl Clin Inform 2024; 15(04): 660-667
DOI: 10.1055/a-2337-4739
Research Article

Effect of Ambient Voice Technology, Natural Language Processing, and Artificial Intelligence on the Patient–Physician Relationship

Authors

  • Lance M. Owens

    1   Department of Family Medicine, University of Michigan Health-West, Wyoming, Michigan, United States
  • J Joshua Wilda

    2   Health Information Technology, University of Michigan Health-West, Wyoming, Michigan, United States
  • Ronald Grifka

    3   Department of Research, University of Michigan Health West, Wyoming, Michigan, United States
  • Joan Westendorp

    3   Department of Research, University of Michigan Health West, Wyoming, Michigan, United States
  • Jeffrey J. Fletcher

    3   Department of Research, University of Michigan Health West, Wyoming, Michigan, United States
 

Abstract

Background The method of documentation during a clinical encounter may affect the patient–physician relationship.

Objectives To evaluate how the use of ambient voice recognition, coupled with natural language processing and artificial intelligence (DAX), affects the patient–physician relationship.

Methods This was a prospective observational study with a primary aim of evaluating any difference in patient satisfaction on the Patient–Doctor Relationship Questionnaire-9 (PDRQ-9) scale between primary care encounters in which DAX was utilized for documentation as compared to another method. A single-arm open-label phase was also performed to query direct feedback from patients.

Results A total of 288 patients were included in the open-label arm and 304 patients were included in the masked phase of the study comparing encounters with and without DAX use. In the open-label phase, patients strongly agreed that the provider was more focused on them, spent less time typing, and made the encounter feel more personable. In the masked phase of the study, no difference was seen in the total PDRQ-9 score between patients whose encounters used DAX (median: 45, interquartile range [IQR]: 8) and those who did not (median: 45 [IQR: 3.5]; p = 0.31). The adjusted odds ratio for DAX use was 0.8 (95% confidence interval: 0.48–1.34) for the patient reporting complete satisfaction on how well their clinician listened to them during their encounter.

Conclusion Patients strongly agreed with the use of ambient voice recognition, coupled with natural language processing and artificial intelligence (DAX) for documentation in primary care. However, no difference was detected in the patient–physician relationship on the PDRQ-9 scale.


Background and Significance

Electronic health records (EHRs) improve multiple facets of health care and have become the standard for medical record documentation.[1] [2] However, the administrative and cognitive burden of provider documentation in the EHR has been cited as a major factor in decreased productivity and provider burnout.[3] [4] [5] [6] Despite advancements in voice input technology for documentation, current-state EHRs still demonstrate usability challenges leading to workflow inefficiencies and provider dissatisfaction.[7] The effect of the EHR on patient satisfaction and the patient–physician relationship is complex, with both positive and negative influences being reported.[8] Hence, there is reason to believe that the balance between how the physician interacts with the EHR and how much time they spend focusing directly on the patient may affect patient satisfaction.

Voice recognition dictation software and in-person or virtual scribes have been shown to reduce documentation burden, increase efficiency, and improve provider satisfaction.[9] [10] [11] [12] Despite this, provider burnout remains at an all-time high, and little evidence supports the notion that these interventions improve patient satisfaction.[13] Some evidence supports that dictation in front of the patient or writing notes collaboratively during the encounter improves patient satisfaction.[14] [15] Conversely, excessive documenting and/or focusing on technology has been associated with decreased patient-centered measures, adversely affecting the patient–physician relationship.[8]

One novel technology that aims to reduce documentation burden is ambient voice recognition, coupled with natural language processing and artificial intelligence (DAX; Nuance Communications, Inc.). We have recently demonstrated an association between DAX use and significantly reduced documentation burden and primary care provider disengagement scores on the Oldenburg Burnout Inventory.[9] The effects on patient satisfaction have not been investigated, though it is plausible that utilizing DAX will lead to a greater perception of patient centeredness during the encounter, hence improving the patient–physician relationship. Alternatively, the obvious lack of "note taking" may suggest to the patient that the physician is not truly paying attention or engaged in shared decision making.


Objectives

Given the intimacy and importance of the patient–physician relationship in health care, we investigated the effects of DAX on patient satisfaction in a prospective observational cohort. We hypothesized that use of the technology would improve the quality of the patient–physician relationship.


Methods

Setting and Study Design

The research was conducted at the University of Michigan Health West, a community teaching health system with 110 primary care providers. All primary care providers underwent training on DAX in January 2022, when it was deployed and integrated with EPIC, the health system's EHR. The mean time in practice for the providers was 14 years. Phase I of the study was an open-label, single-arm convenience sample enrolling patients from February through December 2023. Phase II was a prospective observational cohort study recruiting consecutive patients during April 2023 from five primary care sites. Patients were asked to complete the survey electronically at the conclusion of their appointment and were informed that the results of the Patient–Doctor Relationship Questionnaire-9 (PDRQ-9) survey would be confidential and not shared with any providers or staff.

Implementation of DAX was not a study intervention, and no protected health information captured by DAX was accessed during this study. As DAX use was not assigned as a research intervention, informed consent for the use of DAX was not required. However, the ethical issues surrounding the use of artificial intelligence are of increasing concern, and a recent review determined that "institutions might want to ensure patients are made aware of ambient intelligence via notices of privacy practices in their patient consent forms. For example, a hospital consent form that notifies patients about the use of their medical data might not be sufficient to constitute [consent] for research purposes for this type of project, so an additional consent process would be needed. Even when there are no applicable legal requirements for informed consent, it is important to provide transparency regarding the use of ambient intelligence systems in particular settings to maintain public trust and provide people with the opportunity to make decisions regarding their personal information."[16] In our institution, all patients treated by the health system signed a consent to treat document that informed them that audio, still images, video, and telehealth may be utilized. Additionally, when DAX was utilized to generate an encounter note, the after-visit summary provided an attestation that "This visit note has been created using Dragon Ambient eXperience and was completed by Dr. [Attending Physician's Name]."

The study was approved by the University of Michigan Health West Institutional Review Board, and survey consent was obtained electronically at the time of survey completion. Findings are reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology guidelines.[17] Survey invitations and study data were managed using Research Electronic Data Capture (REDCap) tools and Qualtrics. REDCap is a secure, web-based application designed to support data capture for research studies ([Supplementary Material]).


Aims and Outcomes

In the open-label, single-arm phase of the trial, patients who had not had a previous encounter in which DAX was used were asked to answer three 5-point Likert questions (5 = strongly agree, 4 = somewhat agree, 3 = neither agree nor disagree, 2 = somewhat disagree, 1 = strongly disagree) comparing their visit to their previous visit with the same provider, or to their most recent visit with another health care provider if this was their first visit with the index provider. Patients were asked how strongly they agreed that: (1) the provider seemed to be more focused on me during the visit, (2) the provider spent less time typing on their computer, and (3) my visit felt more like a personable conversation. Overall satisfaction with the visit was also queried (5 = extremely satisfied, 4 = somewhat satisfied, 3 = neither satisfied nor dissatisfied, 2 = somewhat dissatisfied, 1 = not satisfied at all).

The primary aim of phase II was to evaluate whether utilization of DAX documentation, as compared to usual documentation, during primary care encounters was associated with any difference in patient satisfaction. Patients were recruited on the day of their encounter and invited to complete an electronic patient satisfaction survey on-site after the conclusion of their visit. Patients were not informed of the study hypothesis. DAX use was masked during usual care, as the application ran in the background during encounters. The DAX application was installed on providers' mobile phones, and providers had the option of using the application on an encounter-to-encounter basis. Although patients had signed the consent to treat authorization as previously described, they were not additionally asked to consent to DAX use at the inception of each individual encounter. When a provider wished to utilize DAX, they activated the application and placed their mobile phone in their coat pocket or on the examination room counter. Participants completed the satisfaction survey in the examination room, prior to receiving their after-visit summary (which denotes whether DAX was used). When providers did not utilize DAX for documentation, they may have used Dragon dictation or typed into the medical record. A previous study of our providers demonstrated that, when not using DAX, they usually chose to type notes into the medical record during the encounter.[9]

We chose to measure patient satisfaction with the PDRQ-9, which has been developed and validated in numerous studies to evaluate the patient–physician relationship ([Table 2]).[18] [19] [20] [21] Two secondary outcomes were defined. The first was any difference between encounters that utilized DAX and those that did not on the individual questions of the PDRQ-9. Given the possibility of a ceiling effect, the second was a graphical grading question (scale 0 = not at all to 100 = very well) asking patients how well they felt their provider listened to them during their appointment that day.[22] [23]

Table 1

Cohort demographics and bivariate comparisons dichotomized by DAX

                                                 No DAX use (n = 154)   DAX use (n = 150)    p-Value
Age (mean years ± SD)                            52.4 (17.8)            48.4 (17.1)          0.04[a]
Race/ethnicity                                                                               0.92
  Caucasian                                      136 (88.3%)            131 (87.3%)
  Black/African American                         11 (7.4%)              13 (8.7%)
  Other                                          7 (4.3%)               6 (4%)
Gender (male)                                    46 (30%)               51 (34%)             0.36
Education level                                                                              0.047[a]
  No high school completion                      8 (5.2%)               6 (4%)
  General Education Development or equivalent    7 (4.6%)               3 (2%)
  High school diploma                            28 (18.2%)             27 (18%)
  Some college, no degree                        40 (26%)               53 (35%)
  Associate degree                               21 (13.6%)             15 (10%)
  Bachelor's degree                              28 (18.2%)             38 (25.3%)
  Graduate or professional degree                22 (14.3%)             8 (5.3%)
Income (thousand)                                                                            0.09
  <15                                            7 (4.6%)               8 (5.3%)
  15–24.9                                        10 (6.5%)              12 (8%)
  25–34.9                                        7 (4.6%)               11 (7.3%)
  35–49.9                                        11 (7.1%)              20 (13.3%)
  50–74.9                                        27 (17.5%)             16 (10.7%)
  75–99.9                                        22 (14.3%)             29 (19.3%)
  100–149.9                                      23 (14.9%)             20 (13.3%)
  150–199.9                                      11 (7.1%)              10 (6.7%)
  >200                                           12 (7.8%)              4 (2.7%)
  Prefer not to answer                           24 (15.6%)             20 (13.3%)
Waiting room time                                                                            0.001[a]
  Short/appropriate (vs. too long)               152/153 (99.4%)        141/150 (94%)
Office/front desk staff                                                                      0.42
  Very satisfied/satisfied (vs. dissatisfied)    150/153 (98%)          146/149 (98%)
Nursing staff                                                                                0.92
  Very satisfied/satisfied (vs. dissatisfied)    152/153 (99.3%)        145/148 (98%)
Total PDRQ-9 score (median [IQR]; mean (SD))     45 [3.5]; 41 (9.5)     45 [8]; 40.3 (9.5)   0.31

Abbreviations: IQR, interquartile range; SD, standard deviation.


a Significant at p < 0.05.


Table 2

Median and interquartile range for individual PDRQ-9 questions

Question (median [interquartile range])                        No DAX   DAX     p-Value
My primary care provider helps me                              5 [1]    5 [1]   0.95
My PCP has enough time for me                                  5 [1]    5 [1]   0.61
I trust my PCP                                                 5 [0]    5 [1]   0.81
My PCP understands me                                          5 [1]    5 [1]   0.45
My PCP is dedicated to help me                                 5 [1]    5 [1]   0.42
My PCP and I agree about the nature of my medical symptoms     5 [1]    5 [1]   0.66
I can talk to my PCP                                           5 [0]    5 [1]   0.51
I feel content with my PCP's treatment                         5 [0]    5 [1]   0.43
I find my PCP easily accessible                                5 [1]    5 [1]   0.41

Abbreviation: PCP, primary care provider.



Confounding Variables

Age was modeled as a continuous variable. Race/ethnicity (Caucasian, Black/African American, other) was modeled as a nominal variable. Education level and income were modeled as ordinal variables ([Table 1]). Gender was modeled as a binomial variable. Satisfaction with nursing and front desk staff, as well as waiting room time, were collected as ordinal variables and modeled as dichotomous variables ([Table 1]).


Power Analysis

Previous data showed about 50% of the primary care providers were utilizing DAX at the time of the survey. Utilizing data from Porcerelli et al, we estimated we would have 80% power at an alpha of 0.05 to detect a clinically meaningful difference of 2.5 points on the total global PDRQ-9 score with an enrollment of 250 patients.[19] Hence, we planned for a target enrollment of 300 patients to account for approximately 15% of patients not completing the survey.
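As an illustrative check only (not the authors' SAS code), the sketch below reproduces a total enrollment of roughly 250 from the stated inputs (80% power, alpha of 0.05, 2.5-point difference) and an assumed PDRQ-9 standard deviation of about 7 points; the standard deviation actually used was drawn from Porcerelli et al and is not reported here, so that value is a labeled assumption.

    # Hedged sketch of the phase II sample-size calculation; assumed_sd is an
    # assumption for illustration, not a number reported in the paper.
    from statsmodels.stats.power import TTestIndPower

    assumed_sd = 7.0        # assumption: SD of the total PDRQ-9 score (prior literature)
    meaningful_diff = 2.5   # clinically meaningful difference stated above
    effect_size = meaningful_diff / assumed_sd   # Cohen's d

    n_per_group = TTestIndPower().solve_power(
        effect_size=effect_size, alpha=0.05, power=0.80,
        ratio=1.0, alternative="two-sided"
    )
    total_n = 2 * n_per_group            # ~250 patients under these assumptions
    target_n = total_n / (1 - 0.15)      # inflate for ~15% non-completion (~300 patients)
    print(f"per group ≈ {n_per_group:.0f}; total ≈ {total_n:.0f}; target ≈ {target_n:.0f}")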


Statistical Analysis

Continuous variables were screened for normality using normality plots, histograms, and the Shapiro–Wilk test. Parametric data are expressed as mean ± standard deviation and nonparametric data as median with interquartile range (IQR). Age was compared by DAX use with a two-sample t-test. Total PDRQ-9 score, the individual PDRQ-9 questions, income level, and education level were compared by DAX use with the Wilcoxon rank-sum test. Waiting room time (too long vs. not), satisfaction with nursing (dissatisfied vs. not), and satisfaction with the front desk (dissatisfied vs. not) were dichotomized and compared by DAX use with Fisher's exact test. The scale score describing how well the clinician listened to the patient during their appointment was highly skewed and was therefore dichotomized at 100 (very well) versus not and evaluated with a chi-square test. Multivariable logistic regression was utilized to evaluate the adjusted odds of DAX use predicting that the clinician listened very well (100 on the scale = "yes," otherwise "no") to the patient during the encounter. Covariates in the final model were identified using backward selection and retained at the 0.1 significance level. In the open-label phase of the trial, the Likert scale was dichotomized at "strongly agree" versus not, and the 95% confidence interval (CI) for the proportion was estimated and compared with 0.5 using the asymptotic (normal approximation) method. All analyses were performed using SAS 9.4. A p-value <0.05 was considered statistically significant.
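For readers who wish to see the shape of these analyses, the sketch below is a minimal Python illustration; the study itself used SAS 9.4, and the file name and column names (dax, age, pdrq9_total, waited_too_long, listened_100, college_degree) are hypothetical.

    # Minimal Python illustration of the phase II analyses described above.
    import pandas as pd
    from scipy.stats import ttest_ind, mannwhitneyu, fisher_exact, chi2_contingency
    import statsmodels.api as sm

    df = pd.read_csv("pdrq9_survey.csv")                  # hypothetical survey export
    dax, no_dax = df[df.dax == 1], df[df.dax == 0]

    # Age: two-sample t-test
    t_stat, p_age = ttest_ind(dax.age, no_dax.age)

    # Total PDRQ-9 score: Wilcoxon rank-sum (Mann-Whitney U) test
    u_stat, p_pdrq = mannwhitneyu(dax.pdrq9_total, no_dax.pdrq9_total,
                                  alternative="two-sided")

    # Dichotomized waiting room time: Fisher's exact test on the 2x2 table
    odds, p_wait = fisher_exact(pd.crosstab(df.dax, df.waited_too_long))

    # "Listened very well" (scale score of 100 vs. not): chi-square test
    chi2, p_listen, _, _ = chi2_contingency(pd.crosstab(df.dax, df.listened_100))

    # Adjusted odds of a perfect "listened" score; covariates here are illustrative
    # (the final model retained wait time and college education via backward selection)
    X = sm.add_constant(df[["dax", "waited_too_long", "college_degree"]])
    print(sm.Logit(df.listened_100, X).fit().summary())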



Results

In the open-label phase of the trial, 288 patients answered the brief survey questions. Patients “strongly agreed” (as compared to not) that the provider was more focused on them (75.4% [95% CI: 70.4–80.3%; p < 0.001]), spent less time typing (78.8% [74.1–83.5%; p < 0.001]), and made the encounter feel like a more personable conversation than during previous encounters (80.9% [76.4–85.4%; p < 0.001]).
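As a worked check (assuming the normal-approximation interval and one-sample comparison against 0.5 described in the methods), the reported interval for the "more focused on me" item can be reproduced to within rounding:

    # Normal-approximation (Wald) 95% CI and one-sample z-test against 0.5;
    # an illustrative check, not the authors' SAS output.
    from math import sqrt

    n, p_hat = 288, 0.754                          # "more focused on me": 75.4% of 288
    se = sqrt(p_hat * (1 - p_hat) / n)
    ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)    # ≈ (70.4%, 80.4%)
    z = (p_hat - 0.5) / sqrt(0.25 / n)             # ≈ 8.6, hence p < 0.001
    print(f"95% CI: {ci[0]:.1%} to {ci[1]:.1%}; z = {z:.1f}")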

In total, 304 patients participated in the survey during the masked phase II of the trial; demographics are shown in [Table 1]. The cohort consisted predominantly of middle-aged Caucasians, with more females than males completing the survey. The population was highly educated, with approximately 43% holding a college degree, and middle income, with a median annual household income between $75,000 and $99,900. Patients were highly satisfied with wait times, office staff, and nursing. Significant differences were seen in age, education level, and waiting room times, though none seemed clearly clinically relevant ([Table 1]). Utilizing the PDRQ-9, no difference was seen in the rank order of the total score between patients whose encounters used DAX (median: 45 [IQR: 8]) and those who did not (median: 45 [IQR: 3.5]; p = 0.31) ([Table 1]). No specific question on the PDRQ-9 showed a significant difference between encounters in which DAX was utilized and those in which it was not ([Table 2] and [Fig. 1]). Overall, 68.7% (103/150) of patients whose encounters utilized DAX reported a perfect score of 100 when asked how well their clinician listened to them during their appointment, compared to 74% (114/154) of patients whose encounters did not utilize DAX (p = 0.3). In multivariable logistic regression, only a long wait time (odds ratio [OR]: 0.3, 95% CI: 0.07–0.99) and not having a college education (OR: 0.64, 95% CI: 0.39–1.06) were retained in the model at the 0.1 significance level when predicting whether a patient was completely satisfied (scale score of 100) with how well their clinician listened to them during their visit. The odds of the patient reporting complete satisfaction (scale score of 100) with how well their clinician listened to them during their encounter were not different when DAX was utilized compared to when it was not (OR: 0.8, 95% CI: 0.48–1.34).

Fig. 1 Individual PDRQ-9 questions dichotomized by DAX use. PDRQ-9, Patient–Doctor Relationship Questionnaire-9.

Discussion

Our findings demonstrated that, when aware and informed of DAX use during the patient encounter, most patients "strongly agreed" that it improved components of the patient–physician relationship. It is known that the "patient's voice" in the medical record is too small, and it is possible that patients perceived DAX use as one way to remedy this concern.[24] However, we found no evidence that routine (masked) utilization of DAX resulted in any significant difference in the quality of the patient–physician relationship as measured by the PDRQ-9 scale or in how well the patient perceived the physician listened to them during the encounter. This is not surprising, as patient satisfaction scores are commonly high at baseline and measurement can be confounded by a ceiling effect.[12] [22] [23] Additionally, our provider cohort was predominantly mid-career and, like many others, likely quite skilled at balancing documentation strategies while maintaining a strong patient–physician relationship. Importantly, some evidence suggests that dictating notes in front of and collaboratively with the patient improves satisfaction.[14] [15] In the absence of visible documentation, ambient recording could lead patients to perceive that the clinician is not listening or not taking them seriously. Hence, it is important that utilization of DAX did not detract from patient satisfaction.

The intimacy of the patient–physician relationship in health care cannot be overstated. It forms the bedrock upon which trust, understanding, and therapeutic alliance are built. The introduction of any technological tool into this dynamic should never create a wedge and will ideally enhance this bond. Our study echoes this sentiment, as it emphasizes that DAX, despite being a sophisticated tool harnessing ambient recording and artificial intelligence, did not diminish the quality of the patient–physician relationship. Rather, when patients were informed about the technology's role, many perceived it as an augmentation to their encounter. This underscores the necessity of integrating technology in a manner that remains congruent with the fabric of medical care: fostering genuine, empathetic, and effective patient–physician relationships.

Our findings are concordant with those of other documentation strategies, such as using scribes, that decrease documentation burden and improve physician satisfaction and efficiency. Although two systematic reviews found only low-confidence evidence supporting the effects of scribes on documentation burden and physician satisfaction, the effect on patient satisfaction was less clear.[10] [11] A subsequent randomized controlled trial utilizing scribes in an outpatient family practice setting found that scribes improved physician satisfaction and efficiency without detracting (no difference) from patient satisfaction scores, which were high regardless of scribe use.[12] Another recent randomized controlled trial of virtual scribes in a specialty clinic had similar findings.[25] It is important to consider that various documentation strategies may have a positive effect on the patient–physician relationship that is simply difficult to detect with the available outcome measurement instruments and within the complexities of the patient–physician relationship.

As mentioned above, various documentation strategies have shown increased physician efficiency and satisfaction without affecting patient satisfaction. Since the evidence does not suggest these strategies detracted from patient satisfaction, future studies should focus on more direct measures of provider burnout, documentation quality, effects on malpractice litigation, and, most importantly, patient outcomes. There may also be financial incentives, as one recent randomized controlled trial demonstrated an increase in professional fee billing levels when providers used automated speech recognition for documentation as compared with traditionally typing notes into the EHR.[26] Technologies such as DAX, which utilize ambient recording, natural language processing, and artificial intelligence, may plausibly hold advantages in some of these areas.

Our study has limitations. First, a cluster randomized controlled trial might improve internal validity but might add cost and not specifically address the complexities of measuring the patient–physician relationship. Second, despite signing the consent to treat document and the annotation in the after-visit summary on the use of DAX, it remains possible that patients may not have fully understood the ambient recording process. This may have confounded the patient–physician relationship, though it is reassuring that in the open-label phase patients strongly agreed with the use of DAX. Third, our study is most generalizable to mid-career primary care clinicians treating an upper-middle-class, highly educated Caucasian patient population; how technology such as DAX would affect patient satisfaction in other racial/ethnic and socioeconomic groups is less clear. Fourth, the long-standing continuity relationships between patients and providers in our cohorts need to be considered. Fifth, since we did not ascertain the method of documentation used during encounters when DAX was not utilized, we were unable to evaluate whether these methods confounded the relationship between documentation method and the PDRQ-9 results. This may have biased the results toward the null hypothesis. Additionally, how the technology would affect outcome measures in specialty clinics or with early- or late-career clinicians is not known. Lastly, there may have been differences in providers or in patient encounter complexity that confounded the research objectives, though these would be difficult to objectively capture and adjust for.


Conclusion

We detected no difference in the patient–physician relationship with the use of ambient voice recognition, coupled with natural language processing and artificial intelligence (DAX), when evaluated on the PDRQ-9 scale. However, patients strongly agreed with the use of DAX and reported that it subjectively improved some measures of the patient–physician relationship. Hence, it is possible a true effect was missed owing to systematic measurement instrument error, given the complexity of the patient–physician relationship. Given previous literature demonstrating improved documentation efficiency and the suggestion of reduced provider burnout, as well as the other potential benefits cited above, further research involving this technology is warranted.


Clinical Relevance Statement

The administrative and cognitive burden of provider documentation in the EHR has been cited as a major factor in decreased productivity and provider burnout. One novel technology that aims to reduce documentation burden is ambient voice recognition, coupled with natural language processing and artificial intelligence (DAX). As novel technology such as this is introduced into clinical care, it is important to understand its effects on the patient–physician relationship. Our study demonstrates that patients support such technological advancements and that DAX use does not appear to detract from the patient–physician relationship.


Multiple-Choice Questions

  1. Which method for documenting an outpatient encounter has been associated with reduced patient satisfaction?

    • In-person scribe

    • Excessive focus on the computer

    • Virtual scribe

    • Ambient voice recognition

Correct Answer: The correct answer is option b. Focusing on the computer to an excessive extent has been shown to reduce patient satisfaction. This contrasts with the literature supporting that documenting during the encounter "with" the patient, rather than focusing excessively on technology, improves patient satisfaction.

  2. Measuring the patient–physician relationship is confounded by all of the following except

    • Ceiling effects in measurement instruments

    • Long-standing patient–physician relationships in primary care

    • Physician experience in navigating patient encounters

    • Health care systems' failure to recognize the importance of patient satisfaction

Correct Answer: The correct answer is option d. Health care systems do recognize the importance of the patient–physician relationship by measuring it and implementing processes to maintain strong relationships.



Conflict of Interest

L.M.O is a member of the Nuance TRAC program. He speaks for Nuance regarding the experiences of University of Michigan Health West use of Nuance technologies. There is no personal financial gain for being a member of the TRAC program. He has also attended the Executive Connect Council for Nuance.

J.J.W. is a member of the Nuance TRAC program. He speaks for Nuance regarding the experiences of University of Michigan Health West use of Nuance technologies. There is no personal financial gain for being a member of the TRAC program.

R.G. has no conflicts of interest to disclose.

J.W. has no conflicts of interest to disclose.

J.J.F. has no conflicts of interest to disclose.

Acknowledgements

The authors wish to thank the patients who gave time to answer the survey questions.

Protection of Human and Animal Subjects

The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects and was approved by the University of Michigan Health West IRB.


Data Availability

The data underlying this article will be shared on reasonable request to the corresponding author.



Address for correspondence

Jeffrey J. Fletcher, MD, MSc, FAAN
Department of Research, University of Michigan Health West
5900 Byron Center Ave, Wyoming, MI 49519
United States   

Publication History

Received: 15 March 2024

Accepted: 31 May 2024

Accepted Manuscript online:
04 June 2024

Article published online:
07 August 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

