Enhancement of speech corrupted by acoustic noise

doi:10.1109/ICASSP.1979.1170788

Proceedings Article

M. Berouti¹, Richard Schwartz¹, John Makhoul¹

BBN Technologies¹

02 Apr 1979, Vol. 4, pp. 208-211

TL;DR: This paper describes a method for enhancing speech corrupted by broadband noise based on the spectral noise subtraction method, which can automatically adapt to a wide range of signal-to-noise ratios, as long as a reasonable estimate of the noise spectrum can be obtained.

Abstract: This paper describes a method for enhancing speech corrupted by broadband noise. The method is based on the spectral noise subtraction method. The original method entails subtracting an estimate of the noise power spectrum from the speech power spectrum, setting negative differences to zero, recombining the new power spectrum with the original phase, and then reconstructing the time waveform. While this method reduces the broadband noise, it also usually introduces an annoying "musical noise". We have devised a method that eliminates this "musical noise" while further reducing the background noise. The method consists of subtracting an overestimate of the noise power spectrum and preventing the resultant spectral components from going below a preset minimum level (spectral floor). The method can automatically adapt to a wide range of signal-to-noise ratios, as long as a reasonable estimate of the noise spectrum can be obtained. Extensive listening tests were performed to determine the quality and intelligibility of speech enhanced by our method. Listeners unanimously preferred the quality of the processed speech. Also, for an input signal-to-noise ratio of 5 dB, there was no loss of intelligibility associated with the enhancement technique.
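
The over-subtraction rule described above can be illustrated on a single frame's power spectrum. This is a minimal sketch, not the authors' code: the function name `oversubtract_frame`, the linear SNR-dependent schedule for the over-subtraction factor (`alpha0 = 4`, a slope of 0.15 per dB, SNR clipped to -5..20 dB), the frame SNR estimate, and the floor factor `beta = 0.01` are all assumptions chosen for illustration. The abstract itself only states that an overestimate of the noise power spectrum is subtracted and that the result is held above a spectral floor.

```python
import numpy as np


def oversubtract_frame(noisy_power, noise_power, alpha0=4.0, beta=0.01):
    """Power spectral subtraction with over-subtraction and a spectral floor.

    noisy_power: |Y(k)|^2 for one frame; noise_power: estimated |N(k)|^2.
    Returns (enhanced power spectrum, over-subtraction factor used).
    alpha0, the 0.15-per-dB slope and beta are illustrative assumptions.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # Frame SNR in dB, estimated here (assumption) as noisy power over noise power.
    snr_db = 10.0 * np.log10(noisy_power.sum() / noise_power.sum())
    # Subtract more noise at low SNR and less at high SNR (assumed linear schedule).
    alpha = alpha0 - 0.15 * np.clip(snr_db, -5.0, 20.0)
    diff = noisy_power - alpha * noise_power
    # Spectral floor: no component may fall below beta times the noise power.
    floor = beta * noise_power
    return np.where(diff > floor, diff, floor), alpha
```

In a complete enhancer this rule would be applied to every short-time frame, and `np.sqrt` of the result would be recombined with the noisy phase before resynthesis. Larger `alpha` values suppress more of the residual spectral peaks, while `beta` sets how much broadband noise is deliberately left in their place.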

Citations

Journal Article

Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator

Yariv Ephraim¹, David Malah²

Stanford University¹, Technion – Israel Institute of Technology²

01 Dec 1984, IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: In this article, a system which utilizes a minimum mean square error (MMSE) estimator is proposed and then compared with other widely used systems which are based on Wiener filtering and the "spectral subtraction" algorithm.

Abstract: This paper focuses on the class of speech enhancement systems which capitalize on the major importance of the short-time spectral amplitude (STSA) of the speech signal in its perception. A system which utilizes a minimum mean-square error (MMSE) STSA estimator is proposed and then compared with other widely used systems which are based on Wiener filtering and the "spectral subtraction" algorithm. In this paper we derive the MMSE STSA estimator, based on modeling speech and noise spectral components as statistically independent Gaussian random variables. We analyze the performance of the proposed STSA estimator and compare it with a STSA estimator derived from the Wiener estimator. We also examine the MMSE STSA estimator under uncertainty of signal presence in the noisy observations. In constructing the enhanced signal, the MMSE STSA estimator is combined with the complex exponential of the noisy phase. It is shown here that the latter is the MMSE estimator of the complex exponential of the original phase, which does not affect the STSA estimation. The proposed approach results in a significant reduction of the noise, and provides enhanced speech with colorless residual noise. The complexity of the proposed algorithm is approximately that of other systems in the discussed class.
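
The MMSE STSA estimator is usually written as a gain G applied to the noisy amplitude, with the estimate G * |Y| combined with the noisy phase as described above. The sketch below evaluates the closed form G = Gamma(1.5) * sqrt(v)/gamma * exp(-v/2) * [(1 + v) I0(v/2) + v I1(v/2)], where v = xi*gamma/(1 + xi), xi is the a priori SNR and gamma the a posteriori SNR. The function names are hypothetical, the modified Bessel functions I0 and I1 are summed from their power series only to stay within the Python standard library, and estimating `xi` itself is outside the scope of this sketch.

```python
import math


def bessel_i(order, x, rel_tol=1e-16):
    """Modified Bessel function of the first kind I_order(x) for x >= 0.

    Summed from its power series; adequate for moderate arguments
    (the partial sums overflow for x above roughly 700).
    """
    half_x = x / 2.0
    term = half_x ** order / math.factorial(order)
    total = term
    k = 0
    while True:
        k += 1
        # Ratio of consecutive series terms: (x/2)^2 / (k * (k + order))
        term *= half_x * half_x / (k * (k + order))
        total += term
        # Stop once past the largest terms and the new term is negligible.
        if k > x and term <= rel_tol * total:
            return total


def mmse_stsa_gain(xi, gamma):
    """Gain G such that the amplitude estimate is G * |Y|.

    xi: a priori SNR, gamma: a posteriori SNR (linear scale, both > 0).
    """
    v = xi * gamma / (1.0 + xi)
    prefactor = math.gamma(1.5) * math.sqrt(v) / gamma
    return prefactor * math.exp(-v / 2.0) * (
        (1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0)
    )
```

At high SNR the gain approaches the Wiener gain xi/(1 + xi); for example, `mmse_stsa_gain(100.0, 101.0)` is close to 100/101.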

3,905 citations

Journal Article

Noise power spectral density estimation based on optimal smoothing and minimum statistics

Rainer Martin¹

RWTH Aachen University¹

01 Jul 2001, IEEE Transactions on Speech and Audio Processing

TL;DR: An unbiased noise estimator is developed which derives the optimal smoothing parameter for recursive smoothing of the power spectral density of the noisy speech signal by minimizing a conditional mean square estimation error criterion in each time step.

Abstract: We describe a method to estimate the power spectral density of nonstationary noise when a noisy speech signal is given. The method can be combined with any speech enhancement algorithm which requires a noise power spectral density estimate. In contrast to other methods, our approach does not use a voice activity detector. Instead it tracks spectral minima in each frequency band without any distinction between speech activity and speech pause. By minimizing a conditional mean square estimation error criterion in each time step we derive the optimal smoothing parameter for recursive smoothing of the power spectral density of the noisy speech signal. Based on the optimally smoothed power spectral density estimate and the analysis of the statistics of spectral minima an unbiased noise estimator is developed. The estimator is well suited for real time implementations. Furthermore, to improve the performance in nonstationary noise we introduce a method to speed up the tracking of the spectral minima. Finally, we evaluate the proposed method in the context of speech enhancement and low bit rate speech coding with various noise types.
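
A much-simplified version of the minimum-tracking idea can be sketched as follows: the noisy periodogram is smoothed recursively and, in each frequency bin, the minimum over a sliding window of past frames is taken as the noise estimate, without any voice activity detector. Unlike the method in the abstract, this sketch uses a fixed smoothing constant `alpha` instead of the optimal time-varying one, omits the bias compensation (so it tends to underestimate the noise), and has no mechanism for faster tracking. The function name and default values are hypothetical.

```python
import numpy as np


def track_noise_psd(noisy_power, alpha=0.85, window=50):
    """Simplified minimum-tracking noise PSD estimate.

    noisy_power: array (num_frames, num_bins) holding |Y|^2 per frame.
    alpha: fixed smoothing constant (the original derives an optimal, time-varying one).
    window: number of most recent frames searched for the minimum.
    Returns an array of the same shape with the noise PSD estimate per frame.
    """
    frames = np.asarray(noisy_power, dtype=float)
    smoothed = np.empty_like(frames)
    noise = np.empty_like(frames)
    p = frames[0].copy()
    for t in range(len(frames)):
        # First-order recursive smoothing of the noisy periodogram.
        p = alpha * p + (1.0 - alpha) * frames[t]
        smoothed[t] = p
        # Per-bin minimum over the recent smoothed frames; speech bursts shorter
        # than the window rarely hold the minimum, so no VAD is needed.
        start = max(0, t - window + 1)
        noise[t] = smoothed[start:t + 1].min(axis=0)
    # No bias compensation is applied here.
    return noise
```

A short burst of high power (standing in for speech) inside otherwise stationary noise leaves the estimate at the noise level as long as the burst is shorter than `window`.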

1,731 citations

Journal Article

Enhancement and bandwidth compression of noisy speech

Jae Lim¹, Alan V. Oppenheim

Massachusetts Institute of Technology¹

26 Jun 1979

TL;DR: The paper gives an overview of the variety of techniques that have been proposed for enhancement and bandwidth compression of speech degraded by additive background noise, and suggests a unifying framework that makes the relationships between these systems more visible and may point to fruitful directions for further research.

Abstract: Over the past several years there has been considerable attention focused on the problem of enhancement and bandwidth compression of speech degraded by additive background noise. This interest is motivated by several factors including a broad set of important applications, the apparent lack of robustness in current speech-compression systems and the development of several potentially promising and practical solutions. One objective of this paper is to provide an overview of the variety of techniques that have been proposed for enhancement and bandwidth compression of speech degraded by additive background noise. A second objective is to suggest a unifying framework in terms of which the relationships between these systems are more visible and which hopefully provides a structure that will suggest fruitful directions for further research.

1,236 citations

Proceedings Article

SEGAN: Speech Enhancement Generative Adversarial Network

Santiago Pascual¹, Antonio Bonafonte¹, Joan Serrà²

Polytechnic University of Catalonia¹, Telefónica²

28 Mar 2017

TL;DR: This work proposes the use of generative adversarial networks for speech enhancement; the model operates at the waveform level, is trained end-to-end, and incorporates 28 speakers and 40 different noise conditions into the same model, so that model parameters are shared across them.

Abstract: Current speech enhancement techniques operate in the spectral domain and/or exploit some higher-level feature. The majority of them tackle a limited number of noise conditions and rely on first-order statistics. To circumvent these issues, deep networks are being increasingly used, thanks to their ability to learn complex functions from large example sets. In this work, we propose the use of generative adversarial networks for speech enhancement. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them. We evaluate the proposed model using an independent, unseen test set with two speakers and 20 alternative noise conditions. The enhanced samples confirm the viability of the proposed model, and both objective and subjective evaluations confirm its effectiveness. With that, we open the exploration of generative architectures for speech enhancement, which may progressively incorporate further speech-centric design choices to improve their performance.

1,001 citations

Cites methods from "Enhancement of speech corrupted by ..."

...Classic speech enhancement methods are spectral subtraction [6], Wiener filtering [7], statistical model-based methods [8], and subspace algorithms [9, 10]....

Journal Article

A signal subspace approach for speech enhancement

Yariv Ephraim¹, H.L. Van Trees¹

George Mason University¹

01 Jul 1995, IEEE Transactions on Speech and Audio Processing

TL;DR: The popular spectral subtraction speech enhancement approach is shown to be a signal subspace approach which is optimal in an asymptotic (large sample) linear minimum mean square error sense, assuming the signal and noise are stationary.

Abstract: A comprehensive approach for nonparametric speech enhancement is developed. The underlying principle is to decompose the vector space of the noisy signal into a signal-plus-noise subspace and a noise subspace. Enhancement is performed by removing the noise subspace and estimating the clean signal from the remaining signal subspace. The decomposition can theoretically be performed by applying the Karhunen-Loeve transform (KLT) to the noisy signal. Linear estimation of the clean signal is performed using two perceptually meaningful estimation criteria. First, signal distortion is minimized while the residual noise energy is maintained below some given threshold. This criterion results in a Wiener filter with adjustable input noise level. Second, signal distortion is minimized for a fixed spectrum of the residual noise. This criterion enables masking of the residual noise by the speech signal. It results in a filter whose structure is similar to that obtained in the first case, except that now the gain function which modifies the KLT coefficients is solely dependent on the desired spectrum of the residual noise. The popular spectral subtraction speech enhancement approach is shown to be a particular case of the proposed approach. It is proven to be a signal subspace approach which is optimal in an asymptotic (large sample) linear minimum mean square error sense, assuming the signal and noise are stationary. Our listening tests indicate that 14 out of 16 listeners strongly preferred the proposed approach over the spectral subtraction approach.
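
The first estimation criterion above (minimum signal distortion with the residual noise energy kept below a threshold) can be sketched for white noise of known variance sigma^2. In this sketch the function name and parameters are hypothetical, the noisy frames are assumed zero-mean, the KLT comes from an eigendecomposition of their sample covariance, eigenvalues that do not exceed sigma^2 are assigned to the noise subspace and discarded, and the remaining KLT coefficients are scaled by Wiener-like gains lambda/(lambda + mu*sigma^2), where the multiplier `mu` acts as the adjustable input noise level. Colored noise, the second (residual-spectrum) criterion, and the choice of `mu` are not covered.

```python
import numpy as np


def subspace_filter(noisy_frames, noise_var, mu=1.0):
    """Signal-subspace (KLT) estimator for white noise of known variance.

    noisy_frames: array (num_frames, K), each row a zero-mean K-sample frame.
    Returns the K x K filter H; the estimate of a noisy frame y is H @ y.
    """
    y = np.asarray(noisy_frames, dtype=float)
    # Sample covariance of the noisy frames (zero mean assumed).
    r_y = y.T @ y / y.shape[0]
    eigvals, eigvecs = np.linalg.eigh(r_y)
    # For white noise, clean-signal eigenvalues are the noisy ones minus sigma^2.
    clean = eigvals - noise_var
    keep = clean > 0  # directions spanning the signal(-plus-noise) subspace
    u1 = eigvecs[:, keep]
    lam = clean[keep]
    # Wiener-like gains in the KLT domain; larger mu removes more noise.
    gains = lam / (lam + mu * noise_var)
    return u1 @ np.diag(gains) @ u1.T
```

With `mu = 1` the gains reduce to an ordinary Wiener filter in the KLT domain; larger values suppress more residual noise at the cost of more signal distortion.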

968 citations

References

Journal Article

Suppression of acoustic noise in speech using spectral subtraction

S. Boll¹

University of Utah¹

01 Apr 1979, IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.

Abstract: A stand-alone noise suppression algorithm is presented for reducing the spectral effects of acoustically added noise in speech. Effective performance of digital speech processors operating in practical environments may require suppression of noise from the digital waveform. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis. The method, requiring about the same computation as high-speed convolution, suppresses stationary noise from speech by subtracting the spectral noise bias calculated during nonspeech activity. Secondary procedures are then applied to attenuate the residual noise left after subtraction. Since the algorithm resynthesizes a speech waveform, it can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
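
The bias subtraction and one possible secondary step can be sketched on a matrix of short-time magnitude spectra. Here the noise bias is the mean magnitude over frames flagged as nonspeech, negative differences are set to zero (half-wave rectification), and, as an assumed form of the residual-noise attenuation mentioned above, a bin whose value stays below the largest residual measured during nonspeech is replaced by the minimum of that bin over the neighboring frames. The function name and the `speech_active` flags are hypothetical; in practice the flags would come from a speech activity detector.

```python
import numpy as np


def subtract_bias(noisy_mag, speech_active):
    """Magnitude spectral subtraction with an assumed residual-noise step.

    noisy_mag: array (num_frames, num_bins) of |Y| per frame.
    speech_active: boolean array (num_frames,); False marks nonspeech frames.
    """
    mag = np.asarray(noisy_mag, dtype=float)
    silent = ~np.asarray(speech_active, dtype=bool)
    # Noise bias: average magnitude over frames without speech activity.
    bias = mag[silent].mean(axis=0)
    # Subtract the bias and half-wave rectify (negative values become 0).
    out = np.maximum(mag - bias, 0.0)
    # Largest residual left in nonspeech frames, per bin.
    max_residual = out[silent].max(axis=0)
    reduced = out.copy()
    for t in range(1, len(out) - 1):
        low = out[t] < max_residual
        # Within the residual-noise range, keep the smallest of the
        # previous, current and next frame's values for that bin.
        reduced[t, low] = out[t - 1:t + 2, low].min(axis=0)
    return reduced
```

Taking the minimum over adjacent frames removes isolated residual peaks that do not persist in time, while components that exceed the residual range are left untouched.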

4,862 citations

Proceedings Article

An investigation of several frequency-domain processing methods for enhancing the intelligibility of speech in wideband random noise

R. Curtis, R. Niederjohn

01 Apr 1978

TL;DR: This paper describes results of a study of several frequency-domain processing methods for enhancing the intelligibility of speech in wideband random noise, finding that all successful techniques investigated are similar in that they are an attempt to emphasize spectral components as a function of the amount by which they exceed the noise.

Abstract: This paper describes results of a study of several frequency-domain processing methods for enhancing the intelligibility of speech in wideband random noise. Five categories of processing methods are explored. These include the INTEL technique, a technique based upon minimum mean square filtering, several techniques based upon subtraction of the estimated spectrum of the noise from the spectrum of the speech plus noise, spectrum squaring, and techniques based upon pitch frequency analysis. The results of this study have provided considerable insight into the individual processing methods and into the use of frequency-domain processing methods in general. A major conclusion of this work is that all successful techniques investigated are similar in that they are an attempt to emphasize spectral components as a function of the amount by which they exceed the noise. A second conclusion is that unless the spectral weighting within a time-window is relatively smooth, it will introduce conspicuous background distortion.

22 citations

Proceedings Article

Suppression of noise in speech using the SABER method

S. Boll¹

University of Utah¹

01 Apr 1978

TL;DR: A fundamental result is developed which shows that the spectral magnitude of speech plus noise can be effectively approximated as the sum of magnitudes of speech and noise.

Abstract: A stand-alone noise suppression algorithm is described for reducing the spectral effects of acoustically added noise in speech. A fundamental result is developed which shows that the spectral magnitude of speech plus noise can be effectively approximated as the sum of magnitudes of speech and noise. Using this simple phase-independent additive model, the noise bias present in the short-time spectrum is reduced by subtracting off the expected noise spectrum calculated during nonspeech activity. After bias removal, the time waveform is recalculated from the modified magnitude and saved phase. This Spectral Averaging for Bias Estimation and Removal (SABER) method requires only one FFT per time window for analysis and synthesis.
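
The analysis, bias removal, and resynthesis loop described above (subtract the expected noise magnitude, then rebuild the waveform from the modified magnitude and the saved phase) might look like the following sketch. The periodic Hann window at 50% overlap, the frame length, and the function name are assumptions for illustration; this window choice makes the overlapped analysis windows sum to one, so with a zero noise estimate the interior of the signal is reconstructed exactly.

```python
import numpy as np


def subtract_and_resynthesize(noisy, noise_mag, frame_len=256):
    """Magnitude subtraction with resynthesis from the saved noisy phase.

    noisy: 1-D signal; noise_mag: expected noise magnitude per rfft bin
    (length frame_len // 2 + 1), e.g. averaged over nonspeech frames.
    """
    noisy = np.asarray(noisy, dtype=float)
    hop = frame_len // 2
    n = np.arange(frame_len)
    # Periodic Hann window: copies shifted by frame_len / 2 sum to exactly 1.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)
    out = np.zeros_like(noisy)
    for start in range(0, len(noisy) - frame_len + 1, hop):
        spectrum = np.fft.rfft(noisy[start:start + frame_len] * window)
        # Remove the noise bias from the magnitude; clamp negatives to zero.
        magnitude = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
        # Reuse the saved noisy phase for synthesis.
        phase = np.angle(spectrum)
        frame = np.fft.irfft(magnitude * np.exp(1j * phase), n=frame_len)
        out[start:start + frame_len] += frame  # overlap-add
    return out
```

This sketch uses one forward and one inverse real FFT per window; samples in the first and last half-frame are covered by a single window and are therefore not fully reconstructed.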

16 citations

Journal Article

Extraction of Speech in Noise by Digital Filtering

Hideto Suzuki, Juichi Igarashi, Yasushi Ishii

01 Aug 1977, The Journal of the Acoustical Society of Japan

8 citations