The Lombard reflex and its role on human listeners and automatic speech recognizers.

doi:10.1121/1.405631

Journal Article•DOI•

Jean-Claude Junqua¹•Institutions (1)

Panasonic¹

01 Jan 1993-Journal of the Acoustical Society of America (Acoustical Society of America)-Vol. 93, Iss: 1, pp 510-524

TL;DR: Both acoustic and perceptual analyses suggest that the Lombard effect influences male and female speakers differently, and they bring to light that, even if some tendencies can be observed consistently across speakers, the Lombard reflex is highly variable from speaker to speaker.

Abstract: Automatic speech recognition experiments show that, depending on the task performed and how speech variability is modeled, automatic speech recognizers are more or less sensitive to the Lombard reflex. To gain an understanding about the Lombard effect with the prospect of improving performance of automatic speech recognizers, (1) an analysis was made of the acoustic‐phonetic changes occurring in Lombard speech, and (2) the influence of the Lombard effect on speech perception was studied. Both acoustic and perceptual analyses suggest that the influence of the Lombard effect on male and female speakers is different. The analyses also bring to light that, even if some tendencies across speakers can be observed consistently, the Lombard reflex is highly variable from speaker to speaker. Based on the results of the acoustic and perceptual studies, some ways of dealing with Lombard speech variability in automatic speech recognition are also discussed.

Citations

Book•

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

Dan Jurafsky, James Martin

01 Jan 2000

TL;DR: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation.

Abstract: From the Publisher: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora. Methodology boxes are included in each chapter. Each chapter is built around one or more worked examples to demonstrate the main idea of the chapter. Covers the fundamental algorithms of various fields, whether originally proposed for spoken or written language, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation. Emphasis on web and other practical applications. Emphasis on scientific evaluation. Useful as a reference for professionals in any of the areas of speech and language processing.

3,794 citations

Posted Content•

Deep Speech: Scaling up end-to-end speech recognition

Awni Hannun¹, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng•Institutions (1)

Baidu¹

17 Dec 2014-arXiv: Computation and Language

TL;DR: Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.

Abstract: We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.

1,761 citations

Cites background from "The Lombard reflex and its role on ..."

...Using a combination of collected and synthesized data, our system learns robustness to realistic noise and speaker variation (including Lombard Effect [20])....

Book•

3-D sound for virtual reality and multimedia

Durand R. Begault¹•Institutions (1)

Ames Research Center¹

01 Jan 1994

TL;DR: In this article, technology and applications for the rendering of virtual acoustic spaces are reviewed, including applications to computer workstations, communication systems, aeronautics and space, and sonic arts.

Abstract: Technology and applications for the rendering of virtual acoustic spaces are reviewed. Chapter 1 deals with acoustics and psychoacoustics. Chapters 2 and 3 cover cues to spatial hearing and review psychoacoustic literature. Chapter 4 covers signal processing and systems overviews of 3-D sound systems. Chapter 5 covers applications to computer workstations, communication systems, aeronautics and space, and sonic arts. Chapter 6 lists resources. This TM is a reprint of the 1994 book from Academic Press.

960 citations

Book Chapter•DOI•

Acoustic Communication in Noise

Henrik Brumm¹, Hans Slabbekoorn²•Institutions (2)

Free University of Berlin¹, Leiden University²

01 Jan 2005-Advances in The Study of Behavior

TL;DR: This chapter reviews recent advancements in studies of vocal adaptations to interference by background noise and relates these to fundamental issues in sound perception in animals and humans.

Abstract: Publisher Summary: Environmental noise can affect acoustic communication by limiting the broadcast area, or active space, of a signal through decreased signal-to-noise ratios at the position of the receiver. At the same time, noise is ubiquitous in all habitats and is, therefore, likely to disturb animals, as well as humans, under many circumstances. However, both animals and humans have evolved diverse solutions to the background noise problem, and this chapter reviews recent advancements in studies of vocal adaptations to interference by background noise and relates these to fundamental issues in sound perception. The chapter starts with a discussion of the sender's side, considering potential evolutionary shaping of species-specific signal characteristics and individual short-term adjustments of signal features. Subsequently, it focuses on the receivers of signals, reviews their sensory capacities for signal detection, recognition, and discrimination, and relates these issues to auditory scene analysis and the ecological concept of signal space. Data from studies on insects, anurans, birds, and mammals, including humans, and, to a lesser extent, available work on fish and reptiles are also discussed in the chapter.

845 citations

Journal Article•DOI•

Speech recognition in noisy environments: a survey

Yifan Gong¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

01 Apr 1995-Speech Communication

TL;DR: The survey indicates that the essential points in noisy speech recognition consist of incorporating time and frequency correlations, giving more importance to high SNR portions of speech in decision making, exploiting task-specific a priori knowledge both of speech and of noise, using class-dependent processing, and including auditory models in speech processing.

712 citations