Robust speech recognition by normalization of the acoustic space

doi:10.1109/ICASSP.1991.150483

Home
/
Papers
/
Robust speech recognition by normalization of the acoustic space

Proceedings Article•DOI•

Robust speech recognition by normalization of the acoustic space

Alejandro Acero¹, Richard M. Stern¹•Institutions (1)

Carnegie Mellon University¹

14 Apr 1991-pp 893-896

TL;DR: Several algorithms are presented that increase the robustness of SPHINX, the CMU (Carnegie Mellon University) continuous-speech speaker-independent recognition systems, by normalizing the acoustic space via minimization of the overall VQ distortion.

read less

Abstract: Several algorithms are presented that increase the robustness of SPHINX, the CMU (Carnegie Mellon University) continuous-speech speaker-independent recognition systems, by normalizing the acoustic space via minimization of the overall VQ distortion. The authors propose an affine transformation of the cepstrum in which a matrix multiplication perform frequency normalization and a vector addition attempts environment normalization. The algorithms for environment normalization are efficient and improve the recognition accuracy when the system is tested on a microphone other than the one on which it was trained. The frequency normalization algorithm applies a different warping on the frequency axis to different speakers and it achieves a 10% decrease in error rate. >

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Patent•

Intelligent Automated Assistant

[...]

Thomas R. Gruber¹, Adam Cheyer¹, Dag Kittlaus¹, Didier Rene Guzzoni¹, Christopher Dean Brigham¹, Richard Donald Giuli¹, Marcello Bastea-Forte¹, Harry J. Saddler¹ - Show less +4 more•Institutions (1)

Apple Inc.¹

11 Jan 2011

TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.

...read moreread less

Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.

...read moreread less

1,462 citations

Patent•

Automated Response to and Sensing of User Activity in Portable Devices

[...]

Brian Q. Huppi¹, Anthony M. Fadell¹, Derek Boyd Barrentine¹, Daniel Freeman¹•Institutions (1)

Apple Inc.¹

19 Oct 2007

TL;DR: In this paper, various methods and devices described herein relate to devices which, in at least certain embodiments, may include one or more sensors for providing data relating to user activity and at least one processor for causing the device to respond based on the user activity which was determined, at least in part, through the sensors.

...read moreread less

Abstract: The various methods and devices described herein relate to devices which, in at least certain embodiments, may include one or more sensors for providing data relating to user activity and at least one processor for causing the device to respond based on the user activity which was determined, at least in part, through the sensors. The response by the device may include a change of state of the device, and the response may be automatically performed after the user activity is determined.

...read moreread less

844 citations

Patent•

Using context information to facilitate processing of commands in a virtual assistant

[...]

Thomas R. Gruber¹, Christopher Dean Brigham¹, Daniel S. Keen¹, Gregory Novick¹, Phipps Benjamin S¹ - Show less +1 more•Institutions (1)

Apple Inc.¹

28 Sep 2012

TL;DR: In this article, a virtual assistant uses context information to supplement natural language or gestural input from a user, which helps to clarify the user's intent and reduce the number of candidate interpretations of user's input, and reduces the need for the user to provide excessive clarification input.

...read moreread less

Abstract: A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.

...read moreread less

593 citations

Journal Article•DOI•

Robust continuous speech recognition using parallel model combination

[...]

Mark J. F. Gales¹, Steve Young¹•Institutions (1)

University of Cambridge¹

16 Sep 1996-IEEE Transactions on Speech and Audio Processing

TL;DR: After training on clean speech data, the performance of the recognizer was found to be severely degraded when noise was added to the speech signal at between 10 and 18 dB, but using PMC the performance was restored to a level comparable with that obtained when training directly in the noise corrupted environment.

...read moreread less

Abstract: This paper addresses the problem of automatic speech recognition in the presence of interfering noise. It focuses on the parallel model combination (PMC) scheme, which has been shown to be a powerful technique for achieving noise robustness. Most experiments reported on PMC to date have been on small, 10-50 word vocabulary systems. Experiments on the Resource Management (RM) database, a 1000 word continuous speech recognition task, reveal compensation requirements not highlighted by the smaller vocabulary tasks. In particular, that it is necessary to compensate the dynamic parameters as well as the static parameters to achieve good recognition performance. The database used for these experiments was the RM speaker independent task with either Lynx Helicopter noise or Operation Room noise from the NOISEX-92 database added. The experiments reported here used the HTK RM recognizer developed at CUED modified to include PMC based compensation for the static, delta and delta-delta parameters. After training on clean speech data, the performance of the recognizer was found to be severely degraded when noise was added to the speech signal at between 10 and 18 dB. However, using PMC the performance was restored to a level comparable with that obtained when training directly in the noise corrupted environment.

...read moreread less

509 citations

Cites background from "Robust speech recognition by normal..."

...Additionally, techniques have attempted to estimate the clean speech under additive and convolutional noise conditions [1]....
[...]

Patent•

Method and apparatus for building an intelligent automated assistant

[...]

Adam Cheyer¹, Didier Rene Guzzoni¹•Institutions (1)

Apple Inc.¹

08 Sep 2006

TL;DR: In this paper, a method for building an automated assistant includes interfacing a service-oriented architecture that includes a plurality of remote services to an active ontology, where the active ontologies includes at least one active processing element that models a domain.

...read moreread less

Abstract: A method and apparatus are provided for building an intelligent automated assistant. Embodiments of the present invention rely on the concept of “active ontologies” (e.g., execution environments constructed in an ontology-like manner) to build and run applications for use by intelligent automated assistants. In one specific embodiment, a method for building an automated assistant includes interfacing a service-oriented architecture that includes a plurality of remote services to an active ontology, where the active ontology includes at least one active processing element that models a domain. At least one of the remote services is then registered for use in the domain.

...read moreread less

389 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Suppression of acoustic noise in speech using spectral subtraction

[...]

S. Boll¹•Institutions (1)

University of Utah¹

01 Apr 1979-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.

...read moreread less

Abstract: A stand-alone noise suppression algorithm is presented for reducing the spectral effects of acoustically added noise in speech. Effective performance of digital speech processors operating in practical environments may require suppression of noise from the digital wave-form. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis. The method, requiring about the same computation as high-speed convolution, suppresses stationary noise from speech by subtracting the spectral noise bias calculated during nonspeech activity. Secondary procedures are then applied to attenuate the residual noise left after subtraction. Since the algorithm resynthesizes a speech waveform, it can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.

...read moreread less

4,862 citations

Proceedings Article•DOI•

Environmental robustness in automatic speech recognition

[...]

Alejandro Acero¹, Richard M. Stern¹•Institutions (1)

Carnegie Mellon University¹

03 Apr 1990

TL;DR: Initial efforts to make Sphinx, a continuous-speech speaker-independent recognition system, robust to changes in the environment are reported, and two novel methods based on additive corrections in the cepstral domain are proposed.

...read moreread less

Abstract: Initial efforts to make Sphinx, a continuous-speech speaker-independent recognition system, robust to changes in the environment are reported. To deal with differences in noise level and spectral tilt between close-talking and desk-top microphones, two novel methods based on additive corrections in the cepstral domain are proposed. In the first algorithm, the additive correction depends on the instantaneous SNR of the signal. In the second technique, expectation-maximization techniques are used to best match the cepstral vectors of the input utterances to the ensemble of codebook entries representing a standard acoustical ambience. Use of the algorithms dramatically improves recognition accuracy when the system is tested on a microphone other than the one on which it was trained. >

...read moreread less

461 citations

Journal Article•DOI•

Blind deconvolution through digital signal processing

[...]

Thomas G. Stockham¹, T.M. Cannon², R.B. Ingebretsen¹•Institutions (2)

University of Utah¹, Los Alamos National Laboratory²

01 Apr 1975

TL;DR: In this paper, the blind deconvolution problem of two signals when both are unknown is addressed and two related solutions which can be applied through digital signal processing in certain practical cases are discussed.

...read moreread less

Abstract: This paper addresses the problem of deconvolving two signals when both are unknown. The authors call this problem blind deconvolution. The discussion develops two related solutions which can be applied through digital signal processing in certain practical cases. The case of reverberated and resonated sound forms the center of the development. The specific problem of restoring old acoustic recordings provides an experimental test. The important effects of noise and non-stationary signals lead to the detailed part of the presentation. In addition, the paper presents results for the case of images degraded by some common forms of blur.

...read moreread less

370 citations

Journal Article•DOI•

A joint synchrony/mean-rate model of auditory speech processing

[...]

Stephanie Seneff¹•Institutions (1)

Massachusetts Institute of Technology¹

01 May 1990-Journal of Phonetics

TL;DR: A bank of critical-band filters defines the initial spectral analysis, and filter outputs are processed by a model of the nonlinear transduction stage in the cochlea, which accounts for such features as saturation, adaptation and forward masking.

...read moreread less

264 citations

Journal Article•DOI•

Discrete representation of signals

[...]

Alan V. Oppenheim¹, Don H. Johnson¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jun 1972

TL;DR: The requirements for digital sequences by other digital sequences and the use of such representations to implement a nonlinear warping of the digital frequency axis are discussed within the framework of simulating linear time-invariant systems.

...read moreread less

Abstract: In processing continuous-time signals by digitalmeans, it is necessary to represent the signal by a digital sequence. There are many ways other than periodic sampling for obtaining such a sequence. The requirements for such representations and some examples are discussed within the framework of simulating linear time-invariant systems. The representation of digital sequences by other digital sequences is also discussed, with particular emphasis on the use of such representations to implement a nonlinear warping of the digital frequency axis. Some applications and hardware implementation of this digital-frequency warping are described.

...read moreread less

219 citations