Semin Hear 2004; 25(3): 241-255
DOI: 10.1055/s-2004-832858
Published in 2004 by Thieme Medical Publishers, Inc., 333 Seventh Avenue, New York, NY 10001, USA.

Effects of Spectro-Temporal Asynchrony in Auditory and Auditory-Visual Speech Processing

Ken W. Grant¹, Steven Greenberg², David Poeppel³, Virginie van Wassenhove³
  • ¹ Auditory-Visual Speech Recognition Laboratory, Walter Reed Army Medical Center, Army Audiology and Speech Center, Washington, District of Columbia
  • ² International Computer Speech Institute, Berkeley, California
  • ³ Cognitive Neuroscience of Language Laboratory, Neuroscience and Cognitive Science Program (NACS), Department of Biology and Department of Linguistics, University of Maryland, College Park, Maryland

Publication Date: 02 September 2004 (online)

Throughout his career, Ira Hirsh studied and published articles and books pertaining to many aspects of the auditory system. These included sound conduction in the ear, cochlear mechanics, masking, auditory localization, psychoacoustic behavior in animals, speech perception, medical and audiological applications, coupling between psychophysics and physiology, and ecological acoustics. However, it is Hirsh's work on the auditory timing of simple and complex rhythmic patterns, the backbone of speech and music, that lies at the heart of his more recent research. In this article, we report on several aspects of the temporal processing of speech signals, both within and across sensory systems. Data are presented on the perceived simultaneity and intelligibility of auditory and auditory-visual speech stimuli in which stimulus components are presented either synchronously or asynchronously. Differences in the symmetry and shape of the temporal windows derived from these datasets are highlighted. Results show two distinct ranges of temporal integration for speech processing: one relatively short window, ∼40 milliseconds, and another much longer, around 250 milliseconds. In the case of auditory-visual speech processing, the temporal window is highly asymmetric, strongly favoring conditions where the visual stimulus precedes the acoustic stimulus.

Ken W Grant

Auditory-Visual Speech Recognition Laboratory, Walter Reed Army Medical Center

Army Audiology and Speech Center, Washington, DC 20307-5001

Email: grant@tidalwave.net
