Semin Hear 2022; 43(03): 137-148
DOI: 10.1055/s-0042-1756160
Review Article

Auditory Evoked Potentials in Communication Disorders: An Overview of Past, Present, and Future

Akshay R. Maggu
1   Department of Speech-Language-Hearing Sciences, Hofstra University, Hempstead, New York
 

Abstract

This article provides a brief overview of auditory evoked potentials (AEPs) and their application in research and clinical practice within the field of communication disorders. The article begins by providing a historical perspective on the key scientific developments that led to the emergence of the numerous types of AEPs. It then discusses the different AEP techniques in light of their feasibility in clinics. As AEPs, because of their versatility, find use across disciplines, this article also discusses some of the research questions that are currently being addressed using AEP techniques in the field of communication disorders and beyond. Finally, this article summarizes the shortcomings of the existing AEP techniques and provides a general perspective on future directions. The article is aimed at a broad readership including (but not limited to) students, clinicians, and researchers. Overall, it may act as a brief primer for new AEP users and, for those who already use AEPs on a routine basis, as an overview of the progress in the field of AEPs along with its future directions.


Auditory evoked potentials (AEPs) in humans, as the name suggests, refer to "potentials," that is, voltages that are "evoked" via a set of electrodes (predominantly on the scalp) in response to "auditory" stimulation. Following the breakthrough discovery of α rhythms in electroencephalography (EEG),[1] and with the advent of technology over the past half century, research in the area of AEPs has surged and found use in fields including (but not limited to) linguistics, psychology, neuroscience, and communication disorders. The current review briefly summarizes the historical viewpoints; the current clinical applications of AEPs in the field of communication disorders; AEP research investigating questions related to speech, language, and hearing; and contemporary AEP research that exhibits potential for clinical use. This article may serve as a starting point for those who are beginning their journey in the field of AEPs as students, clinicians, and/or researchers, and as a snapshot of the progress of the AEP field in general for those who already possess some knowledge about AEPs.

HISTORICAL PERSPECTIVES: CLASSIFICATIONS OF AEPs

Since the discovery of AEPs, researchers have proposed several ways to classify them. One of the first classifications was based on the latency of the replicable waves following stimulus onset. AEPs were then primarily classified as early, middle, and late.[2] AEP responses elicited within 10 ms of stimulus onset were classified as early responses (e.g., auditory brainstem response [ABR]; [Fig. 1A]), those within 10 to 50 ms as the middle-latency response (MLR; [Fig. 1B]), and those within 60 to 500 ms as the slow- and long-latency response (LLR; [Fig. 1C]).[3] This classification is still popular in the scientific and clinical fields. Another way to classify AEPs was based on their recording sites. While most AEPs were recorded using vertex and mastoid or neck electrodes, certain AEPs were recorded from alternate sites such as the ear canal, as in the case of electrocochleography (ECochG). AEPs have also been classified as exogenous versus endogenous potentials. Exogenous potentials are mainly those that are more sensory in nature, do not depend on the subject's level of consciousness, and are not influenced by higher-order linguistic and cognitive processes (e.g., the click-evoked ABR). On the other hand, endogenous potentials are those that are affected by the subject's level of consciousness and are influenced by higher-order linguistic and cognitive processes (e.g., P300).
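To make the latency-based taxonomy concrete, the short Python sketch below (a hypothetical illustration, not part of any clinical standard) classifies a response from the latency of its first replicable peak, using the window boundaries cited above:

```python
def classify_aep(peak_latency_ms: float) -> str:
    """Classify an AEP by the latency (ms) of its earliest replicable peak.

    Windows follow the early/middle/late scheme described above:
    early (<10 ms, e.g., ABR), middle (10-50 ms, MLR),
    and slow/long latency (60-500 ms, LLR).
    """
    if peak_latency_ms < 10:
        return "early (e.g., ABR)"
    elif peak_latency_ms <= 50:
        return "middle (MLR)"
    elif 60 <= peak_latency_ms <= 500:
        return "late (LLR)"
    return "outside the conventional AEP windows"

print(classify_aep(5.6))    # early (e.g., ABR)
print(classify_aep(28.0))   # middle (MLR)
print(classify_aep(180.0))  # late (LLR)
```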

Figure 1 Human scalp–recorded auditory evoked potentials representing different levels of the auditory nervous system: (A) auditory brainstem responses to 100-µs click stimuli collected at intensities ranging from 90 to 10 dB in 10-dB steps at a 30.1/s repetition rate. Waves I, II, III, and V are clearly visible in this individual's data. A decrease in amplitude and an increase in latency of the peaks can be observed as the presentation intensity decreases; (B) middle latency response to 500-Hz tone burst stimuli collected at a 7.1/s repetition rate at 70 dB. Na, Pa, and Nb peaks in the latency range of 15 to 40 ms are clearly visible; and (C) long latency response to 500-Hz tone burst stimuli collected at a 1.1/s repetition rate at 70 dB. P1, N1, P2, and N2 peaks in the latency range of 60 to 200 ms are clearly visible.

While continuous EEG recording was already available by the late 1920s, one of the key issues scientists faced was background noise that obscured the desired EEG responses.[3] It was only after the discovery of the averaging technique to enhance the signal-to-noise ratio (SNR) of the desired EEG responses[4] that there was a surge of research studies in the field of AEPs. Following the improvement in SNR, changes in the high-pass filter settings from 50 to 150 Hz led to visualization of clear MLR waveforms, with the nomenclature of the three negative and two positive peaks as No–Po–Na–Pa–Nb (see [Fig. 1B]) and generators in the thalamocortical pathways.[5] [6] [7] Around this time, there was also interest in the late AEPs, which led to the discovery of the P1–N1–P2–N2 waves of the LLR (see [Fig. 1C]), with generators predominantly in the cortex.[8] [9] These waves were found to be affected by attention and sleep.
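The power of the averaging technique comes from the fact that the evoked response is time-locked to the stimulus while the background EEG is not, so averaging N stimulus-locked epochs attenuates the noise by roughly √N. A minimal NumPy simulation of this principle (all waveform and noise parameters are arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 20_000                      # sampling rate in Hz (illustrative)
t = np.arange(0, 0.010, 1 / fs)  # one 10-ms epoch (early-response window)

# Toy "evoked response": a damped sinusoid time-locked to stimulus onset.
evoked = 0.5 * np.sin(2 * np.pi * 500 * t) * np.exp(-t / 0.004)

# Each single trial is the evoked response buried in much larger background EEG.
n_trials = 2000
trials = evoked + rng.normal(0.0, 5.0, size=(n_trials, t.size))

average = trials.mean(axis=0)

def rms(x):
    return np.sqrt(np.mean(x ** 2))

# Residual noise in the average shrinks roughly as 1/sqrt(n_trials).
print(f"single-trial noise RMS: {rms(trials[0] - evoked):.3f}")
print(f"averaged noise RMS:     {rms(average - evoked):.3f} "
      f"(theory: ~{5.0 / np.sqrt(n_trials):.3f})")
```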

While much of the work in AEPs at this time was being done using scalp electrodes, researchers from different parts of the world, independently of one another, were trying to obtain recordings from then nonconventional sites, that is, the external auditory canal,[10] the cochlear promontory,[11] and the earlobe.[12] They were all able to record waves of the cochlear nerve potential that decreased in amplitude and increased in latency with reduction in stimulus intensity. Around this time, it was reported that there were some "late" waves in the recordings of the cochlear nerve potential that were too "early" to be considered part of the MLR or LLR.[12] These waves were described by Jewett and colleagues[13] [14] and later became what we now know as the "ABR." Soon after, studies confirmed the excellent reliability of these "Jewett bumps," which led to widespread use of the ABR across the disciplines of audiology,[15] neurology,[16] and psychology,[17] for understanding sound processing in the subcortical auditory nervous system. Following the work on the ABR, it was found that responses from the auditory brainstem could also replicate the acoustic waveform of low-frequency tone-pip stimuli, which came to be known as the "frequency following response" (FFR).[18] Furthermore, studies investigating the effects of attention on the late AEPs led to the discovery of mismatch negativity (MMN) and P300 in an oddball paradigm.[19] While numerous events throughout scientific history have placed the AEP subfield where it presently stands, and discussing all of them is beyond the scope of this article, this section has tried to capture some of the key milestones in AEP history.


AUDITORY EVOKED POTENTIALS IN CLINICS

Use of Auditory Evoked Potentials in Diagnostics

As clinics are usually very busy catering to patients with a variety of speech, language, and hearing disorders, time is of the essence when it comes to testing. A measure that is fast, reliable, and replicable is usually the one to succeed in the clinics. In audiology, the click-evoked ABR has met these expectations and is thus frequently used in clinics for threshold estimation and anatomical site-of-lesion testing. The click-evoked ABR, when recorded at a sufficiently high intensity level (e.g., 90 dB), elicits five major peaks (I, II, III, IV, and V) that are approximately 1 ms apart from each other. Broadly, wave I originates from the auditory nerve, wave II from the cochlear nucleus, wave III from the superior olivary complex, wave IV from the lateral lemniscus, and wave V from the inferior colliculus.[20] [21] Waves I, III, and V have been found to be the most reliable and replicable of the five major waves of the ABR. The ABR waves, with a decrease in the intensity of click stimulation, increase in latency and decrease in amplitude ([Fig. 1A]).[22] These properties of the ABR led to the development of normative latency-intensity and amplitude-intensity functions that quickly became a part of ABR analysis. These functions, especially those pertaining to wave V—the most dominant wave of the ABR—are sensitive to hearing difficulties.[23] While the wave V amplitude and latency are widely used in screening and differential diagnosis of hearing loss, derived measures such as the interpeak latency differences between waves I and III, III and V, and I and V; the amplitude ratio between waves V and I; and the interaural wave V amplitude ratio are routinely used for investigating the site of lesion in the subcortical auditory system.[24] For example, an increased latency difference between waves I and III with a near-normative latency difference between waves III and V may be indicative of a lesion in the lower brainstem and should be followed up by further neurological evaluation.
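As a sketch of how these derived measures are computed once the peaks have been picked, the hypothetical snippet below takes wave latencies (in ms) and amplitudes (in µV) and returns the interpeak intervals and the V/I amplitude ratio; the numeric values are illustrative, not normative:

```python
def abr_derived_measures(latency_ms: dict, amplitude_uv: dict) -> dict:
    """Compute common diagnostic ABR measures from picked peaks.

    latency_ms / amplitude_uv map wave labels ("I", "III", "V") to
    latency (ms) and amplitude (µV). Clinics compare the results
    against their own normative data.
    """
    return {
        "I-III interval (ms)": latency_ms["III"] - latency_ms["I"],
        "III-V interval (ms)": latency_ms["V"] - latency_ms["III"],
        "I-V interval (ms)":   latency_ms["V"] - latency_ms["I"],
        "V/I amplitude ratio": amplitude_uv["V"] / amplitude_uv["I"],
    }

# Illustrative peak picks at a high-intensity click level.
measures = abr_derived_measures(
    latency_ms={"I": 1.6, "III": 3.7, "V": 5.6},
    amplitude_uv={"I": 0.15, "V": 0.45},
)
print(measures)
```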

Over the years, different ABR protocols have been developed to identify lesions in the subcortical auditory nervous system. One shortcoming of using a click stimulus to record the ABR is that clicks predominantly stimulate the high-frequency regions of the basilar membrane. As a result, a click-evoked ABR represents the neural processing of high frequencies while excluding the lower frequencies. To obtain an ABR that is representative of a broad range of frequencies, the "stacked ABR" was developed. Stacked ABR testing involves deriving ABRs for different frequency bands by presenting clicks in conjunction with high-pass masking noise at successive cutoff frequencies (e.g., 500-Hz high-pass, 8,000-Hz high-pass). By doing this, separate ABRs are obtained for frequency bands corresponding to 500, 1,000, 2,000, 4,000, and 8,000 Hz. These frequency-specific ABRs are then "stacked" and added together to obtain a sum of the synchronous activity of the neurons responsible for encoding this broad range of frequencies. The stacked ABR, owing to its fine-grained testing, has been found to be sensitive in detecting small auditory nerve tumors.[25]
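The final step of the stacked ABR is computationally simple: the derived frequency-band ABRs are temporally aligned on their wave V peaks and then summed, and the amplitude of the resulting stacked wave V is measured. A schematic NumPy sketch, assuming the derived-band responses are already available as equal-length arrays:

```python
import numpy as np

def stack_abrs(band_abrs, wave_v_idx):
    """Align derived-band ABRs on wave V and sum them (schematic).

    band_abrs  : list of equal-length 1-D arrays, one derived-band ABR
                 per frequency band (e.g., 500-8,000 Hz).
    wave_v_idx : sample index of the picked wave V in each band response.
    Note: np.roll wraps samples around the array edges, which is
    acceptable here only as a schematic shortcut.
    """
    ref = wave_v_idx[0]
    aligned = [np.roll(abr, ref - idx)
               for abr, idx in zip(band_abrs, wave_v_idx)]
    return np.sum(aligned, axis=0)

# Hypothetical usage with five derived-band ABRs:
# stacked = stack_abrs([abr_500, abr_1k, abr_2k, abr_4k, abr_8k],
#                      wave_v_idx=[152, 140, 128, 120, 116])
# stacked_wave_v_amplitude = stacked.max() - stacked.min()
```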

Furthermore, a comparison between click-ABRs collected at a slower repetition rate (e.g., 11.1/s) and a faster repetition rate (e.g., 90.1/s) has been found to be sensitive in detecting auditory neuropathy spectrum disorder (ANSD).[26] The rationale behind this approach is that the synchrony of firing of auditory nerve fibers is challenged more at faster stimulation rates than at slower ones. As a result, a patient with ANSD who exhibits asynchronous firing will have vastly different ABRs at low versus high repetition rates compared with a person with typical auditory nerve synchrony.
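One simple way to quantify this rate effect, sketched below under the assumption that averaged slow- and fast-rate ABRs are already in hand, is to cross-correlate the two waveforms: an unusually low waveform correlation or a large latency shift between the rates would flag poor neural synchrony (any diagnostic cutoff applied to these numbers would be clinic-specific and is not given here):

```python
import numpy as np

def rate_effect_measures(abr_slow, abr_fast, fs):
    """Compare averaged ABRs at slow vs. fast click rates (schematic).

    abr_slow, abr_fast : equal-length averaged waveforms, e.g.,
                         recorded at 11.1/s and 90.1/s.
    fs                 : sampling rate in Hz.
    Returns the waveform correlation after best alignment and the
    best-fit latency shift (ms) of the fast-rate response.
    """
    n = len(abr_slow)
    xc = np.correlate(abr_fast - abr_fast.mean(),
                      abr_slow - abr_slow.mean(), mode="full")
    lag = int(xc.argmax()) - (n - 1)   # samples the fast ABR lags behind
    r = np.corrcoef(abr_slow, np.roll(abr_fast, -lag))[0, 1]
    return r, 1000 * lag / fs

# r, shift_ms = rate_effect_measures(abr_11, abr_90, fs=20_000)
```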

Although the traditional click-ABR is an excellent clinical tool for understanding the basic sensory functioning of the auditory nervous system, it is not capable of evaluating speech sound processing in the auditory system. To fill this clinical gap, the FFR elicited with a 40-ms /da/ stimulus at a relatively high repetition rate (e.g., 10.9/s), also known as BioMARK (Biological Marker for Auditory Processing), was developed.[27] BioMARK, and FFRs in general, have been found to be sensitive to a variety of disorders ranging from dyslexia[28] [29] and autism[30] [31] to auditory processing disorder (APD)[32] [33] [34] [35] and concussion,[36] and exhibit potential for use in the clinics.

Another early-latency AEP technique frequently used in the clinics is ECochG. The ECochG technique can be used to measure the cochlear microphonic (CM), the summating potential (SP), and the compound action potential (AP), either noninvasively using electrodes on the scalp and in the ear canal or semi-invasively by placing an electrode on the tympanic membrane. While the CM is predominantly generated by the outer hair cells[37] and the SP results from contributions of both outer and inner hair cells,[38] the AP has its generators in the auditory nerve.[39] By using the SP/AP amplitude ratio, ECochG is routinely used in the diagnosis of Meniere's disease.[40] Furthermore, the presence of a long-ringing CM (i.e., one with an extended latency range) has been reported to be indicative of ANSD.[37]
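A minimal sketch of the SP/AP computation, assuming the SP and AP amplitudes have already been measured from the ECochG trace relative to baseline; the 0.4 cutoff used below is one commonly reported criterion, but norms vary across clinics and electrode placements:

```python
def sp_ap_ratio(sp_amplitude_uv: float, ap_amplitude_uv: float,
                cutoff: float = 0.4):
    """Return the SP/AP amplitude ratio and whether it exceeds a cutoff.

    Amplitudes are measured from baseline on the ECochG trace. The
    default 0.4 cutoff is illustrative only; criteria vary across
    clinics and electrode placements.
    """
    ratio = sp_amplitude_uv / ap_amplitude_uv
    return ratio, ratio > cutoff

ratio, elevated = sp_ap_ratio(0.35, 0.70)
print(f"SP/AP = {ratio:.2f}, elevated: {elevated}")  # SP/AP = 0.50, elevated: True
```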


Use of Auditory Evoked Potentials in Intervention

AEPs are useful not only in the diagnosis of communication disorders but also in their intervention. The electrical compound action potential (ECAP) and the electrical ABR (EABR) are regularly used in cochlear implant clinics. ECAP measures the compound AP, an indicator of sufficient auditory nerve functioning, by stimulating the auditory nerve with electrical impulses via the cochlear implant.[41] EABR measures auditory brainstem functioning in response to the electrical impulses.[42] Both ECAP and EABR can be measured intra- and postoperatively to track functioning and changes following cochlear implant use. Similarly, the auditory steady-state response has been found to be useful for evaluating the success of a hearing aid fitting over time.[43]

Along with the early-latency AEPs, evidence suggests that the cortical AEPs, including the MLR,[44] [45] [46] [47] LLR,[48] [49] [50] [51] MMN,[52] [53] [54] [55] and P300,[56] can be promising in the diagnosis and intervention of communication disorders. However, factors such as individual variability, time demands, and the need for specialized equipment and training make these techniques less appealing, at least for now, for use in the clinics.



AUDITORY EVOKED POTENTIALS IN RESEARCH

AEPs, due to their versatile characteristics, have been widely used across the disciplines of speech, language, and hearing to address a variety of research questions.

Use of Auditory Evoked Potentials in Audiology Research

In the past decade, there has been growing interest in investigating the ABR for establishing biomarkers of cochlear synaptopathy, which is hypothesized to be one of the leading causes of "hidden hearing loss."[57] Animal studies suggest that the wave I amplitude of the ABR is diminished in cases of cochlear synaptopathy,[58] mainly due to damage to low-spontaneous-rate, high-threshold auditory nerve fibers.[59] [60] However, in humans, the reliability of wave I from a scalp-recorded ABR is questionable, mainly due to high variability and individual differences in its amplitude, as a result of which the ABR wave I has not been established as a definitive neural marker for "hidden hearing loss" in humans.[61] [62] [63] [64] [65] To circumvent the variability issue, attempts have been made to use the ratio of the SP to the wave I amplitude as a way to identify hidden hearing loss.[66] The rationale behind this approach was that the SP, being a cochlear potential, would remain unaffected in hidden hearing loss while the wave I amplitude would be reduced; the SP could thus act as a normalization factor for the wave I amplitude. However, the SP amplitude faces similar problems of high interindividual variability, potentially due to its low magnitude[67] resulting in low SNR. Nevertheless, attempts at finding assays for identifying hidden hearing loss in humans are ongoing and hold promise for the future.

In regard to intervention-based research, MLRs and LLRs have been found to be useful in evaluating cortical plasticity resulting from auditory training paradigms.[47] [68] In cochlear implant (CI) research, auditory neuroplasticity resulting from CI use has been investigated using the late AEPs (e.g., LLR, MMN). Overall, research findings reveal changes in the P1, N1, and P2 components of the LLR and improved detection of frequency contrasts on the MMN following CI use. However, one of the most challenging tasks in CI-based AEP research is eliminating the CI-induced electrical artifact in AEP recordings. While there have been recent attempts at developing techniques that could aid in removing CI-induced artifacts,[69] [70] [71] more research is needed to bring CI-based EEG into mainstream research and clinics. While the CI-induced electrical artifact is a problem for scalp-recorded potentials evoked by acoustic stimuli, potentials evoked by electrical stimuli do not usually present such a problem. For example, the EABR has been used as an index of neuroplasticity in the auditory nervous system following CI use.[72] [73] [74]


Use of AEPs in Linguistics and Cognitive Neuroscience Research

Alongside their use in hearing research, AEPs have been immensely useful in research related to speech and language perception. Click-ABRs have been found to be predictive of speech and language development in children.[75] Speech-evoked FFRs have been used to investigate the experience-dependent effects of auditory experiences, including musical training, bilingualism, socioeconomic status, language–music relationships, and absolute pitch, on the brain.[76] [77] [78] [79] In general, evidence suggests that auditory experiences enhance the neural encoding of sounds, as depicted in the FFR.[76] [77] [79] [80] As the FFR is known to faithfully recapitulate the acoustics of the stimulus (e.g., the fundamental frequency [F0]) and is influenced by language experience, it has been used to study the processing of tone languages (e.g., Mandarin, Cantonese) in the auditory nervous system.[76] [77] [78] [79] For example, [Fig. 2] depicts a comparison of the Cantonese Tone 2 stimulus ([Fig. 2A, C]) and the corresponding FFR ([Fig. 2B, D]), and a pitch-tracking comparison of the FFR and the stimulus pitch ([Fig. 2E]), where the participant was a native speaker of Cantonese. It is worth noting how closely the pitch of the FFR tracks the pitch of the stimulus, making the FFR an excellent candidate for studying the neural processing of lexical tones. In studies pertaining to tone language processing, the FFR has been utilized to investigate linguistic sound change,[78] interactive effects of tone language and musical experience,[76] and additive effects of absolute pitch and tone language experience.[79] Furthermore, the FFR has been found to be predictive of the acquisition of tone languages.[81] The long-latency counterpart of the FFR, known as the "cortical pitch response," which entails peaks and troughs in the latency range of 600 to 900 ms, has also been found to be sensitive to tone language experience.[82] [83] [84]

Figure 2 Frequency following response collected with a rising lexical tone stimulus (/ji/ T2) from Cantonese. (A) Waveform of the 175-ms stimulus; (B) waveform of the frequency following response (FFR), consisting of a 50-ms pre-stimulus baseline, the FFR proper, and a post-stimulus baseline; (C) power spectral density of the stimulus; (D) power spectral density of the FFR; and (E) comparison of the pitch contours of the FFR and the stimulus. In this case, the FFR pitch contour bears a near-perfect resemblance to the stimulus pitch contour.
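A pitch-tracking comparison like the one in [Fig. 2E] can be sketched with a short-time autocorrelation F0 estimator applied to both the stimulus and the FFR; the window length, hop size, and F0 search range below are illustrative choices for a lexical tone in the 80 to 250 Hz region, not fixed parameters of the method:

```python
import numpy as np

def f0_track(x, fs, win_ms=40, hop_ms=10, fmin=80, fmax=250):
    """Short-time autocorrelation F0 estimator (illustrative).

    Slides a window along the signal and, in each window, picks the
    autocorrelation peak within the plausible F0 lag range.
    """
    win, hop = int(fs * win_ms / 1000), int(fs * hop_ms / 1000)
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    f0 = []
    for start in range(0, len(x) - win, hop):
        seg = x[start:start + win]
        seg = seg - seg.mean()
        ac = np.correlate(seg, seg, mode="full")[win - 1:]  # lags 0..win-1
        best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        f0.append(fs / best_lag)
    return np.array(f0)

# Pitch-tracking accuracy: correlation of the stimulus and FFR F0
# contours, as compared visually in Fig. 2E.
# stim_f0 = f0_track(stimulus, fs)
# ffr_f0 = f0_track(ffr_response, fs)
# accuracy = np.corrcoef(stim_f0, ffr_f0)[0, 1]
```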

Other cortical AEPs that are recorded via oddball presentation of stimuli (e.g., MMN and P300) have been very popular in examining speech sound processing. For example, MMN and P300 have been widely used to study native versus nonnative speech discrimination[85] and categorical perception.[86] [Fig. 3] depicts a P300 ([Fig. 3A]) and an MMN ([Fig. 3B]) collected via an 80:20 (standard:deviant) oddball presentation. In this example ([Fig. 3B]), the MMN is represented by the shaded region of the waveform in the latency range of 100 to 300 ms. While the FFR, MMN, and P300 entail presentation of very short stimuli such as monosyllables, very late AEPs such as the N400 and P600 make use of sentence-level stimuli. The N400 is used for examining the semantics of a sentence.[87] A semantically incongruent sentence leads to a slow negative wave predominantly ranging from 300 to 700 ms. For example, "Peter eats bread and butter" will not elicit an N400, but "Peter eats bread and shoe" will, because the former is semantically congruent while the latter is not. In comparison, the P600, a very late positive wave in the latency range of 500 to 1,000 ms, is elicited by syntactic violations in sentences.[88] For example, violations of subject–verb agreement (e.g., "The boy *throw the ball") may result in a P600 component.

Figure 3 Auditory evoked responses elicited in 80:20 oddball paradigms: (A) P300 (labeled) in the latency range of 300 to 400 ms and (B) mismatch negativity (MMN) (shaded) in the latency range of 100 to 300 ms.
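In an 80:20 oddball design, both the MMN and the P300 are typically read from the deviant-minus-standard difference wave. A minimal sketch, assuming epochs have already been extracted and baseline-corrected; the analysis windows mirror those shown in [Fig. 3]:

```python
import numpy as np

def difference_wave(std_epochs, dev_epochs):
    """Deviant-minus-standard difference wave from oddball epochs.

    std_epochs, dev_epochs: arrays of shape (n_trials, n_samples),
    already baseline-corrected. In an 80:20 paradigm the deviant
    average rests on far fewer trials, hence its noisier estimate.
    """
    return dev_epochs.mean(axis=0) - std_epochs.mean(axis=0)

def peak_in_window(wave, times_ms, lo_ms, hi_ms, polarity=-1):
    """Peak amplitude and latency inside a window.

    polarity = -1 for a negative-going peak (MMN), +1 for a
    positive-going peak (P300).
    """
    mask = (times_ms >= lo_ms) & (times_ms <= hi_ms)
    seg = polarity * wave[mask]
    idx = int(np.argmax(seg))
    return polarity * seg[idx], times_ms[mask][idx]

# diff = difference_wave(std, dev)
# mmn_amp, mmn_lat = peak_in_window(diff, times_ms, 100, 300, polarity=-1)
# p3_amp, p3_lat = peak_in_window(diff, times_ms, 300, 400, polarity=+1)
```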


CONTEMPORARY RESEARCH AND FUTURE DIRECTIONS IN AEPs

Quite recently, there have been attempts to use AEPs to resolve some of the longstanding issues in the subfield of APD. APD is arguably one of the most intriguing and controversial topics in the field of audiology. The main issues that make APD controversial are its lack of specificity to the auditory domain and its comorbidity with nonauditory disorders (e.g., developmental language delay).[89] [90] [91] [92] Recently, it has been argued that these issues might result from the use of existing behavioral test batteries for APD that require clients' attention, memory, and/or linguistic skills and are thus confounded by the domains of language and cognition.[93] In other words, a client with a reduced attention span may fail the current APD test batteries. To circumvent these shortcomings of the existing behavioral test batteries, there has been a proposal for an objective test battery that contains AEPs targeting the subcortical auditory nervous system. AEP testing in this objective test battery does not require active participation from the client and thus limits the influence of the other domains (language and cognition) on auditory processing testing.[93] However, more research is needed to understand the relationship between the proposed test battery and auditory behavior.

Traditional AEP testing, though popular in both clinics and research, mostly due to its excellent reliability, entails repeated presentation of stimuli to elicit neural responses that can be averaged together to obtain meaningful, interpretable waveforms. While this approach is appealing and has stood the test of time, it also has a few demerits. First, this methodology may limit the variety of auditory stimuli that can be used for conducting research. As repeated presentation of stimuli is a prerequisite for this technique, it imposes a limitation on the nature (type and length) of the stimuli that can be presented. For example, if the subcortical representation of sentences (several seconds in duration) were to be examined using this technique, on the order of 2,000 presentations of each sentence might be needed. An obvious problem with that is the time taken by the whole process, which might further degrade the data quality due to subject-related factors (e.g., fatigue from the long test duration). Second, the current methodology requires the stimuli to be controlled (if not fully synthesized) across a set of parameters before they can be utilized in AEP experiments. An obvious problem with using an artificial or synthesized stimulus, however, is reduced ecological validity, that is, how well the synthesized stimulus represents a natural stimulus.

To get around these problems, there has recently been a surge of studies using machine learning approaches with EEG data collected with natural auditory stimuli.[94] [95] [96] [97] One useful approach involves estimating a temporal response function, which is derived by extracting speech features (e.g., envelope, phonetics, and semantics) from the natural stimuli and is used to predict the EEG response to the stimulus. In a further step, Pearson's correlation (r) is calculated between the predicted and the actually recorded EEG. A higher correlation value is indicative of enhanced neural encoding of the natural stimuli.[97] [Fig. 4] depicts an example of this process.

Figure 4 An encoding model in which phonetic speech features are extracted and analyzed alongside training electroencephalography (EEG) data to obtain a temporal response function, which is further employed to predict the EEG. In the final step, Pearson's r is calculated between the predicted EEG and the held-out recorded (testing) EEG.
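A minimal sketch of this encoding-model pipeline, using ridge regression over time-lagged copies of a single stimulus feature (here, the envelope rather than the phonetic features of [Fig. 4]) to estimate the temporal response function and then scoring the prediction with Pearson's r on held-out EEG; the lag range and regularization value are illustrative:

```python
import numpy as np

def lagged_design(feature, n_lags):
    """Design matrix of time-lagged copies of a stimulus feature."""
    X = np.zeros((len(feature), n_lags))
    for k in range(n_lags):
        X[k:, k] = feature[:len(feature) - k]
    return X

def fit_trf(feature, eeg, n_lags, lam=1e2):
    """Estimate a temporal response function by ridge regression."""
    X = lagged_design(feature, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg)

def predict_and_score(trf, feature_test, eeg_test, n_lags):
    """Predict held-out EEG from the TRF and return Pearson's r."""
    pred = lagged_design(feature_test, n_lags) @ trf
    return np.corrcoef(pred, eeg_test)[0, 1]

# Hypothetical usage on one EEG channel:
# trf = fit_trf(env_train, eeg_train, n_lags=32)   # ~0-250 ms at 128 Hz
# r = predict_and_score(trf, env_test, eeg_test, n_lags=32)
```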

Furthermore, one of the main goals in the field of communication disorders is to develop efficient assessment protocols that can accurately detect the presence of communication difficulties in a time- and cost-efficient manner. A recent study,[98] using a machine learning approach (support vector machine classification) with cross-validation, developed an objective method to predict communication difficulties based on the functioning of the auditory nervous system. Similar machine learning approaches have been validated for the identification of lexical tone contours in tone languages.[99] These machine learning approaches exhibit potential for future use in the field of communication disorders for quick and accurate identification of communication disorders. However, more research is needed to bring these techniques into mainstream clinics.
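A hypothetical scikit-learn sketch of that kind of pipeline: AEP-derived features feed a support vector machine, and k-fold cross-validation estimates how well the model separates the groups. The features and labels below are random placeholders, not the cited study's actual variables:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: rows are participants, columns are AEP-derived
# features (e.g., FFR F0 amplitude, pitch-tracking accuracy, wave V
# latency). Real studies would substitute measured values here.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = rng.integers(0, 2, size=60)  # 0 = typical, 1 = communication difficulty

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```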


CONCLUSION

This article is a brief summary of the evolution of AEP technology, its current state, and the future of AEPs in the field of communication disorders. Since the advent of continuous EEG almost a century ago, several landmark discoveries and inventions have shaped the field of AEPs to its current state. This article summarizes the key milestones in the history of AEPs, followed by a discussion of the clinical and research applications of current AEP technology. While the existing AEP methodology is immensely popular and contributes to resolving some of the longstanding research questions in the area of communication disorders, some limiting aspects of the current methodology are also discussed. Furthermore, this article touches on some machine learning approaches and their potential use with AEP data in developing neural markers for detecting communication disorders. Overall, this article can be useful for both beginners and regular users in the area of AEPs, as it provides an overview of AEPs: their history, current state, and future directions.



CONFLICT OF INTEREST

None declared.


Address for correspondence

Akshay R. Maggu, Ph.D.
Department of Speech-Language-Hearing Sciences, Hofstra University
110 Hofstra University, 110 Davison Hall, Hempstead, NY 11549

Publication History

Article published online:
26 October 2022

© 2022. Thieme. All rights reserved.

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA

