Key Words
acceptable noise level - audiovisual - hearing aids - speech perception
INTRODUCTION
The National Institutes of Health have estimated that 28.8 million adults in the United
States could benefit from the use of hearing aids, and among adults aged 70 years
and older, only one in three has ever used them
([NIDCD, 2016]). Even among those who seek out hearing health care, hearing aid rejection is common
and difficult to predict because it is not simply related to how well the hearing
aid improves speech understanding. More often, the reason for rejection is how the
hearing aid performs in noisy environments. In noisy listening
conditions, hearing aids not only amplify the desired target speech but also increase
the levels of the unwanted background noise. Listeners often report having to expend
additional mental effort to keep up with the demands of communication in noisy environments,
with even greater difficulties experienced with more severe degrees of hearing loss
(see [Pichora-Fuller et al, 2016]). Therefore, even when a listener can maintain a high level of speech understanding
in a challenging listening condition, they often experience fatigue and may be unable
to allocate the necessary additional cognitive resources to compensate for their hearing
loss. Compounded over time, these difficulties lead to hearing-impaired listeners
experiencing stress, social isolation, and hearing aid dissatisfaction/rejection ([Pichora-Fuller et al, 2016]). Therefore, clinicians and scientists alike are actively looking beyond clinical
measures of speech understanding to better understand and identify individuals who
benefit from hearing aids yet may be more likely to reject them.
The concept of background noise acceptance was defined by [Nabelek et al (1991)] and led to the development of the acceptable noise level (ANL) test. The ANL test
differs from traditional speech-in-noise tests in that the ANL measures a listener’s
willingness to listen to speech in background noise instead of a listener’s speech
understanding in noise. To measure an ANL, listeners first adjust running speech to
their most comfortable listening level (MCL). Background noise is then introduced
and adjusted to the maximum acceptable background noise level (BNL) while the listener
follows the words of a story. The ANL is then calculated as the difference
between the MCL and the BNL (ANL = MCL − BNL). Smaller ANL values indicate more willingness
to accept background noise, whereas larger ANL values indicate less willingness to
accept background noise.
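For illustration, using hypothetical values: a listener whose MCL is 65 dB HL and who accepts background noise up to 58 dB HL has an ANL of 65 − 58 = 7 dB, whereas a listener with the same MCL who tolerates noise only up to 50 dB HL has a larger ANL of 15 dB.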
Studies have suggested that hearing aid acceptance is related to how well that individual
tolerates background noise regardless of improved speech understanding in background
noise ([Nabelek et al, 1991]). For example, [Nabelek et al (2006)] reported the ANL to be 85% effective at predicting hearing aid use patterns. Listeners
with small ANLs (<7 dB) were likely to become successful hearing aid users, whereas
listeners with large ANLs (>13 dB) were likely to become unsuccessful hearing aid
users. In addition, ANLs are stable and have high test–retest reliability ([Nabelek et al, 2006]; [Freyaldenhoven et al, 2006b]; [Gordon-Hickey et al, 2012]; [Brännström et al, 2014]), but they are highly variable across listeners in both normal-hearing and hearing-impaired
populations ([Nabelek et al, 1991]; [2004]; [2006]; [Freyaldenhoven et al, 2006a]; [Brännström et al, 2012]; [2013]; [2014]; [Wu et al, 2016]).
Although the ANL task has proven to be a valuable clinical tool, much less is
known about the mechanisms underlying noise acceptance. Noise acceptance
is measured using a task that involves listening to speech in the presence of background
noise, yet it does not relate to the factors that are commonly associated with understanding
speech in noise. For example, ANLs are not related to speech recognition in background
noise ([Nabelek et al, 2004]), age ([Freyaldenhoven et al, 2006b]; [Nabelek et al, 2006]), gender ([Rogers et al, 2003]; [Plyler et al, 2011]), degree of hearing loss ([Nabelek et al, 1991]; [2006]), or the use of amplification ([Nabelek et al, 2004]). Efforts to explain the underlying variability in ANL values have suggested that noise
acceptance may be centrally mediated and is, thus, inherent to an individual. Findings
from these studies suggest that personality traits/types ([Nabelek 2006]; [Alworth et al, 2007]; [Franklin et al, 2013]), working memory capacity ([Brännström et al, 2012]; [2014]), executive control ([Freyaldenhoven et al, 2005]; [Nichols and Gordon-Hickey, 2012]), cognitive control networks, and cortical (sensory) gating affect noise acceptance
([Miller et al, 2018]). Moreover, the afferent and efferent processes extending beyond the brain stem
may also impact noise acceptance ([Harkrider and Smith, 2005]; [Harkrider and Tampas, 2006]). Specifically, central efferent mechanisms may be weaker, and/or central afferent
mechanisms more active, in listeners with large versus small ANLs ([Harkrider and Smith, 2005]; [Harkrider and Tampas, 2006]; [Shetty et al, 2014]). If noise acceptance is centrally mediated and inherent to an individual, one
hypothesis holds that it may not be limited to the auditory modality
but may instead be a global sensory phenomenon observable across other modalities,
such as vision. If true, noise acceptance should act in a complementary rather than
compensatory fashion across auditory and visual modalities.
Most everyday spoken communication involves the real-time integration of information
from both auditory and visual modalities. The ability to see a talker can improve
the understanding of speech in both quiet and noisy environments (e.g., [Sumby and Pollack, 1954]; [Remez 2005]). Adding congruent visual (facial) cues to a degraded auditory signal can be superadditive,
meaning that the performance level is greater than the sum of the audio and visual
signals alone (e.g., [Campbell, 2008]). However, when the visual and auditory cues are incongruent, multimodal interference
can lead to subadditive performance that increases the chance of error and adds uncertainty,
which places more cognitive demands on listening (e.g., [Campbell, 2008]). Furthermore, the quality of information from the visual or auditory sources may
alter how a listener makes use of the information from each modality, demonstrating
more general aspects of perception and cognition that are not specific to the visual
or auditory system (e.g., [Witten and Knudsen, 2005]). In degraded or noisy listening conditions, people tend to rely on visual cues;
however, older adults with hearing loss may be less able to integrate auditory and
visual cues regardless of their degree of hearing loss (e.g., [Tye-Murray et al, 2007]; [Musacchia et al, 2009]). Adding visible speech cues may help listeners compensate for poor peripheral perception
or declining cognitive abilities by improving auditory perception of sentences ([Grant and Seitz, 2000]; [Smith and Fogerty, 2015]). However, the ability to integrate, or gain benefit from multimodal information,
may deteriorate with age (e.g., [Krull et al, 2013]; [Smith and Fogerty, 2016]), which may have significant implications for aging, hearing-impaired listeners.
Although the ANL test is traditionally administered using an auditory-only paradigm
(signal and noise), several studies have explored the effect of visual cues on the
auditory-ANL. Normal-hearing listeners showed improved ANL values with the addition
of visual cues (∼3 dB: [Plyler and Alworth, 2008]; ∼0.9 dB: [Wu et al, 2014]; ∼1.5 dB: [Plyler et al, 2015]). Hearing-impaired listeners, however, showed no benefit in the unaided condition
but showed benefit similar to that of the normal-hearing group when visual cues were
added in the aided condition (∼3.3 dB: [Plyler et al, 2015]). These studies confirm that the addition of visual cues can improve the ANL, particularly
for normal-hearing listeners and for hearing-impaired listeners in an aided condition.
Converging evidence indicates that auditory noise acceptance is largely unrelated to
peripheral hearing factors, is centrally mediated, and is related to individual factors.
Our central hypothesis is that noise acceptance is generated by high-level, top-down,
cognitive processing and is a global, complementary sensory phenomenon inherent to
an individual. Rather than focus on audiovisual integration, where visual cues influence
auditory perception and noise acceptance, we investigated the basis of perception
and noise acceptance in the auditory and visual systems separately.
The first purpose of the present study was to determine if noise acceptance is a domain-general
phenomenon by measuring noise acceptance across auditory and visual modalities. For
this study, we created a visual-ANL task that would parallel the auditory-ANL task,
but remain independent of the auditory system. We modeled the visual-ANL task after
the auditory-ANL task, using the same materials and procedures. By evaluating the
noise acceptance levels across these two modalities, we can characterize the extent
to which noise acceptance may be observable across domains. Our ability to provide
evidence that noise acceptance is a domain-general phenomenon could introduce new
opportunities to explore the underlying mechanisms of noise acceptance and help explain
the observed high individual variability in ANL values. We predicted that both the
auditory- and visual-ANLs would be highly variable across participants and that listeners
who accept higher auditory noise would also accept higher visual noise.
The second purpose of the present study was to determine if visual noise acceptance
was related to visual speech understanding. To address this question, we measured
visual sentence recognition in noise using the text reception threshold (TRT) test,
a previously established parallel test in the visual modality developed by [Zekveld et al (2007)]. The TRT is the visual analog of the speech recognition threshold in noise (SRT,
in dB) task and was designed to better understand the variability in speech-in-noise performance
that cannot be predicted solely by peripheral hearing factors. These two speech
understanding tasks use shared top-down cognitive and linguistic skills that are important
for processing verbal information to extract meaning from context-rich sentences that
are partially obscured either visually or auditorily (e.g., [Humes et al, 2007]; [Zekveld et al, 2007]). The TRT has been studied across multiple populations and shows consistent and
robust associations with the SRT and is considered to be a valuable complementary
measure of the overlapping cognitive skills and abilities used for understanding speech
in background noise (e.g., [George et al, 2007]; [Zekveld et al, 2007]; [2008]; [2018]; [Kramer et al, 2009]; [Besser et al, 2012]; [Humes et al, 2013]). The TRT reflects a participant’s context-bound verbal inference-making ability,
in other words how well someone can “reconstruct the whole from the sum of the parts”
([Humes et al, 2007]; [Lyxell and Rönnberg, 1987]; [Rönnberg et al, 2016]). TRTs for typically developing and hearing-impaired participants are ∼55%, ranging
49–60% (lower scores indicating better performance) (e.g., [Zekveld et al, 2007]; [2018]; [Humes et al, 2013]), and factors such as age and hearing acuity were associated with slightly poorer
performance (approximately ∼10% reduction in TRT) (e.g., [Zekveld et al, 2018]). Similar to the auditory SRT, there is a relatively small variability across listeners
and better performance is associated with greater working memory capacity and language
abilities. The relationship between TRT and SRT may be dependent on the level of semantic
context across materials, with greater associations when both sets of materials contain
similarly high levels of semantic context. [Zekveld et al (2018)] examined the underlying cognitive abilities behind the TRT in a large group of hearing-impaired
listeners (n = 200) and reported that those with better TRTs had better auditory speech
perception and that TRTs were associated with the ability to fill in missing words
in incomplete sentences, linguistic processing speed, and working memory capacity.
Parallel tasks, such as the TRT and SRT, provide information about the auditory and
cognitive deficits that affect performance within and across modalities, helping to
separate peripheral and central factors.
Because auditory-ANL values are consistently unrelated to auditory speech perception
skills, we predicted that visual noise acceptance as measured by our visual-ANL task
would not be related to visual speech understanding as measured by the TRT task. For
completeness, we also included the reception threshold for sentences (RTS) using
the Hearing in Noise Test (HINT) as a measure of speech understanding in noise in the auditory modality.
We predicted that the auditory and visual speech perception measures would correlate
with one another (RTS and TRT), but that neither perceptual measure would correlate
with the corresponding ANL measure.
METHODS
Participants
Thirty-seven adults between the ages of 21 and 30 years participated in this study
(27 females and 10 males). All participants reported no history of hearing or speech
disorders at the time of testing and each passed a pure-tone hearing screening test
at 20-dB HL from 250 to 8000 Hz bilaterally. All participants were informed about
the test protocol and procedures, and written informed consent was obtained before participation.
The University of Tennessee Knoxville Institutional Review Board approved the informed
consent and procedures used in this study, and all testing took place in the Hearing
Instrument Laboratory in Knoxville, Tennessee.
General Procedures
The laboratory setup used in this experiment was identical to that used in [Plyler et al (2015)]. All auditory stimuli were routed through a two-channel diagnostic audiometer (GSI-61)
calibrated to ANSI standards ([ANSI, 2010]) to a loudspeaker positioned at ear level in a sound-treated booth (Industrial Acoustics)
lit with fluorescent lighting judged to be as bright as that of the outer laboratory
space. All stimuli were presented at 0° azimuth. Visual stimuli (e.g., video of the
newscaster) were routed to a 19-inch LCD flat screen computer monitor (Dell) placed
on a desktop slightly below eye level. The monitor brightness and dynamic contrast
were set to "auto-adjust," which optimized the display settings based on the video output;
no settings were changed between participants. Participants were seated approximately
one meter from the loudspeaker and computer monitor. During the visual test conditions,
participants were free to move closer to or farther from the monitor as necessary
to best view the test materials; none of the participants requested adjustments to the
monitor brightness or booth lighting. During the visual-ANL task, the video of the
newscaster was full screen, similar to a television news broadcast (see [Figure 1]). During the visual perceptual task (TRT), the participant also had access to a
computer mouse on the desk. The monitor was turned off during all auditory-only conditions.
All participants self-reported normal or corrected-to-normal vision, although a vision
screening was not administered. The hearing screening was administered first, and
the order of testing was randomized for each participant.
Figure 1 Visual-ANL task stimuli. Example of the video stimuli for the visual-ANL task with
increasing levels of noise density, from left to right 0.1–0.4 (0.9–0.6 proportion
clear). (This figure appears in color in the online version of this article.)
ANL Tasks
Auditory ANL Task
Auditory-ANL was measured using a modified version of the methodology described by
[Nabelek et al (1991)]: rather than finding an individual's MCL as a starting level, the speech level for
all participants was fixed at 65-dB SPL. Participants were asked to listen to a prerecorded
monologue of the Arizona Travelogue script (Frye Electronics, Inc., San Jose, CA)
spoken by a single male talker. Testing began with multitalker babble noise at 45-dB
SPL (+20 dB SNR), and participants self-adjusted the SNR by increasing or decreasing
the level of the background noise in 4-dB steps (2-dB steps for final adjustments).
Participants were given these instructions before the task: “You will listen to a
story with background noise of several people talking at the same time. After you
have listened to this for a few moments, select the level of the background noise
that you would be willing to accept or ‘put up with’ without becoming tense and tired
while following the story. First turn the noise up until it is too loud by pressing
the ‘up button’ on the keypad, and then down using the ‘down button’ on the keypad,
until the story becomes very clear. Finally adjust the noise (up and down) to the
level that you would put up with for a long time while following the story.” Two auditory-ANLs
were measured using this procedure and the SNRs calculated were averaged and recorded
as the participant’s auditory-ANL (in dB SNR). Lower scores on this test indicate
more background noise acceptance.
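As a minimal illustration of how the final value is derived (the noise levels below are hypothetical), the auditory-ANL is simply the fixed speech level minus the accepted noise level, averaged over the two runs:

```matlab
% Minimal sketch: auditory-ANL from two self-adjusted runs (hypothetical values)
speechLevel   = 65;          % fixed speech level, dB SPL
acceptedNoise = [59 57];     % final accepted background noise levels, dB SPL
auditoryANL   = mean(speechLevel - acceptedNoise)  % dB SNR; lower = more acceptance
```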
Visual ANL Task
A visual version of the ANL test was developed for use in this experiment. The video
recordings used were originally developed by [Plyler et al (2015)] and consisted of a local professional female newscaster reading the Arizona Travelogue
script (Frye Electronics, Inc.) using a teleprompter at a moderate pace. The speaker
used for this recording was Lori Tucker (WATE, Knoxville, TN), a professional newscaster
in the East Tennessee region, with an accent that is referred to as standard American
English. During the visual-ANL task, the video was played in the visual condition only;
therefore, the participants in this experiment did not hear her voice. A custom MATLAB
script was developed to add visual noise to each frame of the video using the "imnoise"
function, specifying "salt and pepper" noise applied to the image I, where d is the noise density (from 0 to 1). Salt and pepper noise, or impulse noise, consists
of white pixels, black pixels, or both scattered over an image, and was chosen because
it most closely approximates "static" on a television. Twenty noisy versions were
preprocessed with the density of the noise ranging from 0 to 1 in 0.05 steps, where
0 is clear and 1 is fully obscured. See [Figure 1] for an example of one frame of each of the video files representing four increasing
levels of noise (from left to right, 0.1–0.4, clear to noisy). The processed video
files were presented to the participants in a PowerPoint presentation where each slide
contained a new video file with increasing levels of noise. Participants manually
adjusted the noise density by advancing the slideshow, using the arrow keys to move
from one slide to another in either direction. Each time the PowerPoint slide was
changed, the video was set to autoplay at that desired noise level and continued until
the participant moved to another slide. Participants were instructed how to complete
the task verbally and provided with written instructions. Participants were given
these instructions before the task: “You will see a video with overlaid visual noise
that looks like TV static. The video is muted, so you will not be able to hear the
speaker’s voice. After you have watched for a few moments, select the level of the
static that you would be willing to accept or ‘put up with’ without becoming tense
and tired while watching the video. First increase the static by using the ‘up button’
on the keypad until it is intolerable and then decrease by using the ‘down button’
on the keypad until the picture becomes clear to you. Finally adjust the static (up
and down) to the level that you would ‘put up with’ for a long time while following
the video.” Two visual-ANLs were measured using this procedure. The final noise density
that the participant selected on each trial was averaged across the two trials and recorded as the
participant's visual-BNL (noise density). To simplify the comparison between ANL values,
visual-BNLs were converted to visual-ANLs by subtracting each participant’s BNL from
1; therefore, lower values indicate more background noise acceptance on both tasks.
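A minimal sketch of the frame-level preprocessing described above is shown below; the imnoise call and the 0.05 density steps come from the text, whereas the file names, container format, and loop structure are illustrative assumptions:

```matlab
% Sketch of the visual-ANL video preprocessing (MATLAB, Image Processing Toolbox).
% File names and codec are hypothetical; only the imnoise usage is from the text.
vin = VideoReader('newscaster.mp4');
for d = 0:0.05:1                         % noise density: 0 = clear, 1 = fully obscured
    vout = VideoWriter(sprintf('newscaster_d%03.0f.avi', 100 * d));
    vout.FrameRate = vin.FrameRate;      % preserve the original frame rate
    open(vout);
    while hasFrame(vin)
        frame = readFrame(vin);
        writeVideo(vout, imnoise(frame, 'salt & pepper', d));  % overlay "static"
    end
    close(vout);
    vin.CurrentTime = 0;                 % rewind the source for the next density
end
```

Each output file would then serve as one slide-level stimulus in the PowerPoint presentation described above.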
Perception in Noise Measures
Auditory Reception Threshold for Speech in Noise (RTS)
Auditory reception thresholds for sentences (RTS) in background noise were assessed
using the HINT ([Nilsson et al, 1994]). The HINT procedure used in this study was modified from the original HINT protocol,
with speech presentation levels fixed and noise levels varied adaptively, to better
match the adaptive protocol used for the measurement of ANL (e.g., [Plyler et al, 2008]). Sentences and speech-shaped noise were presented from the same loudspeaker at
0°. Throughout the test, speech was held constant (65-dB SPL) and the level of the
noise was varied to determine a final threshold using a one-up/one-down adaptive procedure
that tracks the SNR required to achieve approximately 50% correct on the test (SNR-50).
Participants were asked to repeat each sentence they heard; only sentences in which
all words were correctly repeated were scored as correct. The initial SNR for each
list was −5 dB (noise at 70-dB SPL). The SNR was varied with a step size of 4 dB for
the first four sentences (no sentences were repeated) and a step size of 2 dB for the
remaining six sentences. The SNR threshold for each list was calculated by averaging
seven SNRs: those of the final six sentences and the SNR dictated by the response to the final
sentence (i.e., the SNR for the 11th trial). Two lists were administered using this procedure, and the SNRs calculated
were averaged and recorded as the participant’s reception threshold for sentences
(RTS: HINT threshold in dB SNR). Lower SNRs on this test indicate better performance.
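A minimal sketch of this one-up/one-down track and its scoring (in MATLAB, with a hypothetical response pattern) is given below; the response vector and the exact point at which the step size changes are illustrative assumptions:

```matlab
% Sketch of the modified HINT track (hypothetical responses; speech fixed at 65 dB SPL)
correct = logical([0 1 1 0 1 0 1 1 0 1]);  % example: sentence scored correct only if
                                           % all words were repeated correctly
snr   = -5;                                % initial SNR, dB (noise at 70 dB SPL)
track = zeros(1, 11);                      % SNRs presented, plus the inferred 11th value
for i = 1:10
    track(i) = snr;
    step = 4; if i > 4, step = 2; end      % 4-dB steps for the first four sentences
    if correct(i)
        snr = snr - step;                  % correct -> harder (raise the noise)
    else
        snr = snr + step;                  % incorrect -> easier (lower the noise)
    end
end
track(11) = snr;                           % SNR dictated by the final response
rts = mean(track(5:11))                    % list threshold: mean of seven SNRs (SNR-50)
```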
Visual Text Reception Threshold (TRT)
Visual speech perception in visual background noise was assessed using an adaptive
text reception threshold (TRT) task, which assessed a participant’s ability to recognize
visually degraded sentences (vertical black bars obscuring portions of the text).
This test is generally considered the visual analog of the auditory speech recognition
in noise test ([Kramer et al, 2009]). The version of the TRT used in this study was adapted from the Dutch version developed
by [Zekveld et al (2007)] using the revised speech perception in noise test (R-SPIN; [Bilger et al, 1984]) sentences ([Humes et al, 2013]). The TRT task used in this experiment was an automated software program provided
by Larry Humes and Gary Kidd, Indiana University, Bloomington, IN 47405. During this
task, meaningful sentences were presented within a large text box centered on a computer
monitor; the text is presented in red with a white background with a text size of
45. The words appeared sequentially (250 msec/word) and the complete sentence remained
on the screen for 3.5 sec. Participants were asked to read aloud each sentence as
best as they could; only sentences in which all words were correctly repeated were
scored as correct. During the task, sentences were partially obscured by equally spaced
vertical black bars and the difficulty of the task varied adaptively based on the
participant’s performance by increasing or decreasing the width of the bars (i.e.,
proportion of unobscured text). See [Figure 2] for an example of a sentence with varying proportions of obscured text. Each run
began with a bar width of 0.84, and the same sentence was displayed with decreasing
levels of masking, in steps of 0.12, until the participant was able to correctly repeat
the sentence. Subsequent sentences were varied in steps of 0.06 using a one-up/one-down
adaptive procedure that tracked the proportion of masking required to read approximately
50% of the sentences correctly (TRT). The test consisted of four adaptive runs of
13 trials using four sets of R-SPIN high-predictability sentences. The threshold for
each run was calculated as the mean proportion of masking for trials 5–14. Similar to
the RTS described previously, the 14th trial was not presented; its value was based on the response to the 13th trial. Threshold estimates for the four runs were averaged and recorded. To simplify
the comparison between TRT and RTS, TRTs were converted to the proportion of unobscured
text by subtracting each participant's final threshold from 1; therefore, lower values
indicate better performance on both tasks.
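For illustration, scoring a single run under the rules above might look like the following sketch (the masking values are a hypothetical track):

```matlab
% Sketch of scoring one TRT run (hypothetical track of bar-width proportions)
masking = [0.84 0.72 0.78 0.72 0.66 0.72 0.66 0.60 0.66 0.60 0.54 0.60 0.54];
next    = 0.60;                          % trial-14 level implied by the trial-13 response
trt     = mean([masking(5:13) next]);    % mean proportion masking, trials 5-14
unobscured = 1 - trt                     % converted: proportion of unobscured text
```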
Figure 2 Visual text reception threshold stimuli. Example of the TRT sentence “hold the baby
on your lap” with increasing levels of masking, from top to bottom 0.3–0.6 (0.7–0.4
proportion of unobscured text). For reference, the mean TRT threshold for these participants
was 0.6, and scores ranged from 0.52 to 0.76 proportion of unobscured text. (This
figure appears in color in the online version of this article.)
Statistical Analyses
All statistical analyses were performed using the SPSS statistical package version 23
(SPSS Inc., Chicago, IL). To evaluate the relations between and among the ANL tasks
and perceptual tasks, we examined Pearson product–moment correlation coefficients
to assess the following:

- The correlation between the ANL tasks: auditory-ANL and visual-ANL
- The correlation between the perceptual measures: RTS and TRT
- The interrelations among the ANL and perceptual measures
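For reference, these correlations could be reproduced in a few lines of MATLAB (shown here for illustration only; the study used SPSS), assuming a hypothetical 37-by-4 data matrix with one column per measure:

```matlab
% Hypothetical sketch: pairwise Pearson correlations among the four measures.
X = randn(37, 4);              % placeholder for [auditoryANL, visualANL, RTS, TRT]
X(1:4, 4) = NaN;               % TRT missing for four participants
[R, P] = corrcoef(X, 'Rows', 'complete');  % listwise deletion, leaving n = 33
```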
RESULTS
Mean thresholds, standard deviation, and range values for auditory-ANL, visual-ANL,
and RTS for 37 participants and TRT for 33 participants are displayed in [Table 1]. Because of a software malfunction, TRT data were not saved for four participants.
Therefore, all analyses were computed excluding these participants listwise, including
only the results for the 33 participants with values for all measures. It should be
noted, however, that all of the values on the other tests (auditory-ANL, visual-ANL,
and RTS) for the excluded participants fell within the range of levels and thresholds
measured for the remaining participants, and including/excluding these data did not
change the strength of the statistical relationships reported. Pearson product–moment
correlation coefficients (r values) were calculated using SPSS to evaluate the relations between and among the
ANL and perceptual tasks.
Table 1
Sample Size, Mean, Standard Deviation, and Range Values for ANL and Perception in Noise Tasks

| Task | n | M | SD | Range |
| --- | --- | --- | --- | --- |
| ANL | | | | |
| Auditory-ANL (dB SNR) | 37 | 3.70 | 7.27 | −9 to 27 |
| Visual-ANL (proportion clear) | 37 | 0.69 | 0.15 | 0.2 to 0.9 |
| Perception in noise | | | | |
| Auditory RTS (dB SNR) | 37 | −1.80 | 1.45 | −4.3 to 1.2 |
| Visual TRT (proportion unobscured text) | 33 | 0.61 | 0.05 | 0.52 to 0.76 |

Note: Because of a software malfunction, TRT scores were not recorded for four participants.
Within-session Reliability
Intraclass correlation coefficients (ICC) were calculated to assess the within-session
reliability of the two ANL measures. ICCs were calculated between the first and second
measurement of the auditory- and visual-ANL tasks. ICC estimates and their 95% confidence
intervals were calculated based on a mean-rating (k = 2), absolute-agreement, two-way mixed-effects model and interpreted as poor (ICC
< 0.50), moderate (ICC = 0.50–0.75), good (ICC = 0.75–0.90), and excellent (ICC >
0.90) ([Koo and Li, 2016]). Because two runs of the ANL tasks were completed and averaged to calculate the
final level, the average-measures ICC was interpreted. For the auditory-ANL, the ICC
= 0.972, with 95% confidence intervals 0.946–0.985; for the visual-ANL, the ICC =
0.970 with 95% confidence intervals 0.91–0.99. These results indicate excellent within-session
test–retest reliability for both the auditory- and visual-ANL tasks.
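As a point of reference, the average-measures, absolute-agreement ICC can be computed directly from the two-way ANOVA mean squares; the sketch below uses illustrative data rather than the study's measurements:

```matlab
% Sketch: ICC(A,k) (absolute agreement, average measures) from mean squares.
% X is n-by-k (participants x repeated ANL runs); values here are illustrative.
X = [2 4; 10 8; -2 0; 6 6; 12 14];                 % hypothetical ANL runs, dB SNR
[n, k] = size(X);
m   = mean(X(:));                                   % grand mean
MSR = k * sum((mean(X, 2) - m).^2) / (n - 1);       % between-participants mean square
MSC = n * sum((mean(X, 1) - m).^2) / (k - 1);       % between-runs mean square
SSE = sum((X(:) - m).^2) - MSR*(n - 1) - MSC*(k - 1);
MSE = SSE / ((n - 1) * (k - 1));                    % residual mean square
ICC_Ak = (MSR - MSE) / (MSR + (MSC - MSE) / n)      % average-measures, absolute agreement
```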
Relationship between Auditory-ANL and Visual-ANL
Results of the Pearson correlation indicated a significant positive correlation between
auditory-ANL and visual-ANL measures (r = 0.481, p = 0.005, n = 33). Overall, there was a strong, positive correlation between the two
measures of noise acceptance, whereby listeners who were willing to accept more auditory
noise were also willing to accept more visual noise. The “goodness of fit” of the
correlation model was evaluated by calculating the coefficient of determination (r
2 = 0.231) which suggest that although 23% of the variance can be accounted for by
each of the ANL measures, a large amount of variance remained unexplained. In other
words, although a proportion of the variance in auditory-ANL and visual-ANL can be
explained by one or more shared factors, such as a central amodal noise acceptance
factor, the fact that these two measures do not directly predict each other perfectly
means that other unrelated (auditory or nonauditory) factors are also contributing
to the variance in ANLs. The scatterplot in [Figure 3] displays the relationship between auditory-ANL and visual-ANL with lower values
indicating more background noise acceptance on both tasks.
Figure 3 Relationship between ANL tasks. Auditory-ANL (dB SNR) is displayed along the ordinate,
and visual-ANL (proportion clear) is displayed along the abscissa. The solid line
represents the best fit linear regression line; r and p values are indicated in the upper left corner of the figure. Lower thresholds indicate
better performance in both ANL tasks.
Relationship between RTS and TRT
Results of the Pearson correlation indicated a significant positive correlation between
reception threshold for sentences (RTS) and text reception thresholds (TRTs) (r = 0.509, p = 0.002, n = 33). As expected, the ability to recognize sentences spoken in background
noise was strongly related to the ability to read sentences that were partially obscured.
As calculated for the auditory- and visual-ANL previously, the “goodness of fit” of
the correlation model was evaluated by calculating the coefficient of determination
(r² = 0.259), which indicates that although 26% of the variance in one perceptual measure
can be accounted for by the other, a large amount of variance remained unexplained.
Similar to the ANL measures, although a proportion of the variance in RTS and TRT
is most likely shared by a common "speech understanding" factor, other unrelated
factors are also contributing to the variance observed. The scatterplot in [Figure 4] displays the relationship between RTS and TRT; lower values indicate better performance
on both tasks.
Figure 4 Relationship between perception in noise tests. Auditory reception threshold for
sentences (dB SNR) is displayed on the ordinate and visual text reception threshold
(proportion unobscured text) on the abscissa. The solid line represents the best fit
linear regression line; r and p values are indicated in the upper left corner of the figure. Lower thresholds indicate
better performance in the RTS and TRT tests.
Relations among ANL and Perception in Noise Measures
No significant correlations were found between auditory-ANL and RTS or visual-ANL
and TRT (A-ANL and RTS: r = 0.100, p = 0.580; V-ANL and TRT: r = −0.040, p = 0.823). These findings are consistent with previous studies that have demonstrated
that noise acceptance is generally unrelated to speech understanding in noise (e.g.,
[Nabelek et al, 2004]). These data further support the notion that a person’s willingness to accept noise
is not dependent on how well they understand speech in background noise and that each
of these measures likely reflects a different perceptual dimension when listening to
speech or watching a speaker in the presence of auditory or visual noise.
DISCUSSION
Relationship between Auditory-ANL and Visual-ANL
The primary purpose of this study was to determine if noise acceptance was related
across the auditory and visual modalities to examine our hypothesis that noise acceptance
is a domain-general phenomenon. We administered the traditional auditory-ANL task
and the visual analog of the auditory-ANL task that we developed for this study to
young normal-hearing participants. Using video recordings of the same materials used
in the auditory-ANL task, mixed with varying levels of “static” in an adaptive task,
we were able to measure each participant’s willingness to accept visual noise. Similar
to auditory-ANLs, there was a wide range of individual variability in visual-ANLs.
Importantly, we observed a strong relationship between the auditory- and visual-ANLs,
whereby listeners who accepted more auditory noise were also those who accepted
more visual noise.
Noise acceptance measures were positively correlated across sensory modalities, thereby
suggesting that these two tasks draw on a shared general perceptual or cognitive mechanism
that is not specific to the auditory or visual modality. For example, previous research
has suggested that central efferent mechanisms may be weaker and/or central afferent
mechanisms are more active in listeners with large versus small auditory-ANLs ([Harkrider and Smith, 2005]; [Harkrider and Tampas, 2006]; [Shetty et al, 2014]) and that cortical sensory gating (inhibition) may contribute to the observed variance
in ANL ([Miller et al, 2018]). Given the positive relationship observed between the auditory and visual noise
acceptance measures, it is possible that similar higher order efferent/afferent relationships
exist in the visual system; however, this warrants further investigation. Furthermore,
a large proportion of the variance remained unexplained, meaning that many other factors
are contributing to the variance observed in each of the ANLs.
Relationship between RTS and TRT
A second goal of this study was to evaluate the perceptual abilities across modalities
by comparing auditory and visual sentence recognition in noise. We found a significant
positive correlation between the reception threshold for sentences (RTS in dB SNR)
using HINT sentences in background noise and text reception thresholds (TRTs) using
SPIN sentences partially obscured by a bar pattern. Our findings agree with
prior studies that have consistently demonstrated that the ability to read masked
written text is associated with the recognition of speech in background noise ([George et al, 2007]; [Humes et al, 2007]; [2013]; [Zekveld et al, 2007]; [2008]; [2018]; [Kramer et al, 2009]). Our results are also in line with prior studies showing significant associations between
TRT and SRT, with TRT predicting approximately 10–30% of the variance in
speech understanding in noise ([George et al, 2007]; [Besser et al, 2012]; [Humes et al, 2013]; [Zekveld et al, 2018]), the shared variance reflecting a common "wholes from parts" factor that captures
general peripheral or cognitive contributions to speech understanding.
Relations among ANL and Perception in Noise Measures
Finally, we evaluated the interrelations among the ANL measures and perception of
speech in noise. As we expected, auditory-ANLs were not related to speech understanding
in noise (RTS), and these findings are consistent with numerous studies that have
demonstrated that an individual’s noise acceptance levels are consistently unrelated
to speech perception abilities (e.g., [Nabelek et al, 2004]; [2006]; [Mueller et al, 2006]; [Plyler et al, 2008]). Furthermore, as predicted, visual-ANLs were similarly unrelated to visual speech
understanding in visual noise (TRT). These results provide converging evidence that
noise acceptance and speech recognition tasks reflect different aspects of auditory
and visual perception and that the level of noise an individual reports they will
accept is not dependent on their perceptual abilities in either sensory
modality. Individuals who can achieve high levels of speech understanding in more
challenging SNRs do not consistently report higher levels of noise acceptance. Also,
consistent with the ANL literature, we observed large variability in noise acceptance
levels for both the auditory- and the visual-ANL tasks (see [Table 1] for reference), whereas the range of performance for the auditory and visual perceptual
tasks were small. For comparison, [Nabelek et al (2006)] reported a range of 30 dB in ANL (2–27 dB) and we show a slightly higher range of
38 dB (−9 to 27 dB) The difference in the variability across tests is simply due to
the constraints of the measurement used; although this study demonstrated that young
normal-hearing adults are consistent in their self-reporting of the noise level they
are willing to accept in both visual and auditory modalities, it is still not clear
why some listeners are more willing to accept high levels of background noise.
Limitations
Listening for a target in background noise depends on a listener’s ability to sustain
attention and suppress or “tune out” irrelevant information, which likely reflects
cognitive inhibition and is a skill that is not limited to the auditory modality.
All listeners must integrate visual and auditory sensory input during everyday communication,
and it seems natural to consider a multisensory evaluation of hearing-impaired listeners.
There is a body of literature that has suggested that visual tests should be included
when assessing hearing-impaired listeners to help differentiate the effects of peripheral
hearing loss and global cognitive processing deficits that can be observed across
modalities ([Humes et al, 1992]; [2007]; [2013]; [McFarland and Cacace, 1995]; [Watson et al, 1996]; [Grant et al, 1998]; [Humes 2005]; [Zekveld et al, 2007]; [2018]; [Kramer et al, 2009]).
The results of this study provide support for the hypothesis that noise acceptance
may be amodal; however, our study only included young normal-hearing listeners and,
therefore, motivates future research with heterogeneous populations of hearing-impaired
listeners. We did not evaluate vision or lipreading ability in our normal-hearing
listeners, which may be more relevant for older, hearing-impaired users who may have
a more limited capacity for integrating auditory and visual cues (e.g., [Tye-Murray et al, 2007]; [Musacchia et al, 2009]). We also did not evaluate how audiovisual stimuli may interact in the visual-ANL
task. Previous work demonstrated that adding visual cues to the auditory-ANL improved
the level of noise a listener would accept, and the combined auditory- and visual-ANL
may increase the ability to predict who would benefit from hearing aids (e.g., [Plyler et al, 2015]). Because the video stimuli in this task were presented without sound, we also do
not know whether presenting congruent audiovisual stimuli would change how much visual
noise a listener is willing to accept.
Clinical Applications and Future Directions
The long-term goal of this research is to better understand the mechanisms that underlie
noise acceptance, which may ultimately lead to the development of better tools that
clinicians can use with their patients to improve hearing aid use. Hearing-impaired
listeners who accept higher levels of noise are much more likely to use their hearing
aids and reap the benefits that hearing aids provide. Therefore, identifying the basis
for noise acceptance could lead to improved outcomes with amplification for people
with hearing loss who could benefit from hearing aids but do not pursue or use them.
Although the observed relationship between auditory- and visual-ANL is promising,
future studies will include behavioral and electrophysiologic measures of inhibition
and executive control to identify and characterize which other perceptual or cognitive
factors may be contributing to the shared and unique variance of auditory and visual
noise acceptance. Although the results of this study demonstrate high within-session
reliability, important next steps include evaluating between-session test–retest reliability
and comparing auditory- and visual-ANLs in a more diverse sample of normal-hearing
and hearing-impaired users. So far, visual-ANL has only been studied in young normal-hearing
listeners; therefore, more work is needed to determine if a multimodal approach to
noise acceptance could provide new avenues to identifying and isolating the underlying
mechanism that could ultimately lead to the development of novel methods to measure
and improve noise acceptance. By learning that noise acceptance acts in a complementary
and not a compensatory way across modalities, we could use this knowledge to direct
intervention strategies to improve noise acceptance in either modality.
Abbreviations
ANL: acceptable noise level
BNL: background noise level
HINT: hearing in noise test
ICC: intraclass correlation coefficient
MCL: most comfortable listening level
RTS: reception threshold for sentences
SNR: signal-to-noise ratio
SRT: speech recognition threshold
TRT: text reception threshold