
Showing papers on "Noise" published in 2002


Proceedings ArticleDOI
13 May 2002
TL;DR: This paper proposes a multi-band spectral subtraction approach which takes into account the fact that colored noise affects the speech spectrum differently at various frequencies, resulting in superior speech quality and largely reduced musical noise.
Abstract: The spectral subtraction method is a well-known noise reduction technique. Most implementations and variations of the basic technique advocate subtraction of the noise spectrum estimate over the entire speech spectrum. However, real world noise is mostly colored and does not affect the speech signal uniformly over the entire spectrum. In this paper, we propose a multi-band spectral subtraction approach which takes into account the fact that colored noise affects the speech spectrum differently at various frequencies. This method outperforms the standard power spectral subtraction method resulting in superior speech quality and largely reduced musical noise.
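
As a rough illustration of the general idea (a sketch under assumed parameters, not the authors' implementation), the Python fragment below subtracts a per-band noise estimate with an over-subtraction factor that grows as the band's segmental SNR falls; the band edges, the over-subtraction rule, and the assumption that the leading frames are noise-only are all illustrative.

import numpy as np

def multiband_spectral_subtraction(x, fs, frame_len=512, hop=256,
                                   n_bands=4, noise_frames=10, floor=0.02):
    # Crude multi-band spectral subtraction; constants are illustrative.
    win = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len, hop)
    X = np.array([np.fft.rfft(x[i:i + frame_len] * win) for i in starts])
    mag, phase = np.abs(X), np.angle(X)
    noise_mag = mag[:noise_frames].mean(axis=0)          # assume leading frames are noise-only
    edges = np.linspace(0, mag.shape[1], n_bands + 1, dtype=int)
    out = np.empty_like(mag)
    for b in range(n_bands):
        sl = slice(edges[b], edges[b + 1])
        band_snr = 10 * np.log10((mag[:, sl] ** 2).sum(axis=1) /
                                 ((noise_mag[sl] ** 2).sum() + 1e-12) + 1e-12)
        alpha = np.clip(4.0 - 0.15 * band_snr, 1.0, 5.0)  # subtract more in low-SNR bands
        out[:, sl] = np.maximum(mag[:, sl] - alpha[:, None] * noise_mag[sl],
                                floor * mag[:, sl])
    y = np.zeros(len(X) * hop + frame_len)
    for t, spec in enumerate(out * np.exp(1j * phase)):   # overlap-add resynthesis
        y[t * hop:t * hop + frame_len] += np.fft.irfft(spec, frame_len)
    return y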

554 citations


Journal ArticleDOI
TL;DR: The authors found that nightingales do not maximize song amplitude but regulate vocal intensity dependent on the level of masking noise, which may serve to maintain a specific signal-to-noise ratio that is favorable for signal production.

223 citations


Journal ArticleDOI
TL;DR: Benefit of interrupted maskers was larger for unshaped than for speech-shaped noise, consistent with AI predictions, and speech-recognition scores improved more for younger than for older subjects, particularly at the higher noise level.
Abstract: To assess age-related differences in benefit from masker modulation, younger and older adults with normal hearing but not identical audiograms listened to nonsense syllables in each of two maskers: (1) a steady-state noise shaped to match the long-term spectrum of the speech, and (2) this same noise modulated by a 10-Hz square wave, resulting in an interrupted noise. An additional low-level broadband noise was always present which was shaped to produce equivalent masked thresholds for all subjects. This minimized differences in speech audibility due to differences in quiet thresholds among subjects. An additional goal was to determine if age-related differences in benefit from modulation could be explained by differences in thresholds measured in simultaneous and forward maskers. Accordingly, thresholds for 350-ms pure tones were measured in quiet and in each masker; thresholds for 20-ms signals in forward and simultaneous masking were also measured at selected signal frequencies. To determine if benefit from modulated maskers varied with masker spectrum and to provide a comparison with previous studies, a subgroup of younger subjects also listened in steady-state and interrupted noise that was not spectrally shaped. Articulation index (AI) values were computed and speech-recognition scores were predicted for steady-state and interrupted noise; predicted benefit from modulation was also determined. Masked thresholds of older subjects were slightly higher than those of younger subjects; larger age-related threshold differences were observed for short-duration than for long-duration signals. In steady-state noise, speech recognition for older subjects was poorer than for younger subjects, which was partially attributable to older subjects' slightly higher thresholds in these maskers. In interrupted noise, although predicted benefit was larger for older than younger subjects, scores improved more for younger than for older subjects, particularly at the higher noise level. This may be related to age-related increases in thresholds in steady-state noise and in forward masking, especially at higher frequencies. Benefit of interrupted maskers was larger for unshaped than for speech-shaped noise, consistent with AI predictions.

165 citations


Journal ArticleDOI
TL;DR: Moderate levels of natural background sound reduced a female's ability to discriminate between males' calls even when she could detect them, justifying recent theoretical analyses of the importance of receivers' errors in the evolution of communication.

151 citations


01 Jan 2002
TL;DR: In this article, an integrated approach to designing a noise reduction headset for audio and communication applications is presented. The system uses a single microphone per ear cup, giving a more compact, lower-power, and cheaper solution that is easy to integrate with existing audio devices to form an integrated feedback active noise control headset.
Abstract: This paper presents an integrated approach to designing a noise reduction headset for audio and communication applications. Conventional passive headsets give good attenuation of ambient noise in the upper frequency range, while most of these devices fail below 500 Hz. Unlike the feedforward method, the adaptive feedback active noise control technique provides more accurate noise cancellation, since the microphone is placed inside the ear cup of the headset. Furthermore, the system uses a single microphone per ear cup, which gives a more compact, lower-power, and cheaper solution that is easy to integrate with existing audio and communication devices to form an integrated feedback active noise control headset. Simulations show that the integrated approach can remove the disturbing noise while allowing the desired speech or audio signal to pass through without cancellation.
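
A very small simulation can convey the adaptive feedback idea: the reference is synthesized from the error microphone through an internal model of the secondary path, and a filtered-x LMS filter drives the loudspeaker. Everything here (path models, tonal disturbance, step size) is an assumption for illustration; it is not the authors' controller.

import numpy as np

rng = np.random.default_rng(0)
N = 20000
# Tonal, predictable disturbance at the error mic (feedback ANC works best on such noise)
d = np.sin(2 * np.pi * 0.01 * np.arange(N)) + 0.05 * rng.standard_normal(N)
s = np.array([0.0, 0.8, 0.3])      # assumed secondary path: loudspeaker -> error mic
s_hat = s.copy()                   # assume an accurate secondary-path estimate
L, mu = 32, 0.002
w = np.zeros(L)                    # adaptive control filter
x_buf = np.zeros(L)                # history of the synthesized reference
xf_buf = np.zeros(L)               # history of the filtered reference
y_hist = np.zeros(len(s))          # recent loudspeaker samples
e_log = np.zeros(N)

for n in range(N):
    y = w @ x_buf                              # anti-noise sample
    y_hist = np.roll(y_hist, 1); y_hist[0] = y
    e = d[n] + s @ y_hist                      # residual picked up by the error mic
    x = e - s_hat @ y_hist                     # internal model: estimated disturbance = reference
    x_buf = np.roll(x_buf, 1); x_buf[0] = x
    xf = s_hat @ x_buf[:len(s_hat)]            # reference filtered by the path estimate
    xf_buf = np.roll(xf_buf, 1); xf_buf[0] = xf
    w -= mu * e * xf_buf                       # FxLMS update
    e_log[n] = e

print("residual power: first quarter %.4f, last quarter %.4f"
      % (np.mean(e_log[:N // 4] ** 2), np.mean(e_log[-N // 4:] ** 2)))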

118 citations


PatentDOI
Yong Rui1
TL;DR: In this article, a system and process for estimating the location of a speaker using signals output by a microphone array characterized by multiple pairs of audio sensors is described, and a consensus location for the speaker is computed from the individual location estimates associated with each pair of microphone array audio sensors taking into consideration the uncertainty of each estimate.
Abstract: A system and process is described for estimating the location of a speaker using signals output by a microphone array characterized by multiple pairs of audio sensors. The location of a speaker is estimated by first determining whether the signal data contains human speech components and filtering out noise attributable to stationary sources. The location of the person speaking is then estimated using a time-delay-of-arrival based SSL technique on those parts of the data determined to contain human speech components. A consensus location for the speaker is computed from the individual location estimates associated with each pair of microphone array audio sensors taking into consideration the uncertainty of each estimate. A final consensus location is also computed from the individual consensus locations computed over a prescribed number of sampling periods using a temporal filtering technique.
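
The patent text is high level; as a generic sketch of the two ingredients it mentions, a time-delay estimate per microphone pair and an uncertainty-weighted combination, GCC-PHAT and confidence weighting are common stand-ins (these specific choices are assumptions, not taken from the patent).

import numpy as np

def gcc_phat_delay(a, b, fs, max_tau):
    # Relative delay between two channels via the phase transform, with a
    # crude confidence score taken from the sharpness of the correlation peak.
    n = 1 << int(np.ceil(np.log2(len(a) + len(b))))
    R = np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n))
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n)
    max_shift = int(max_tau * fs)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    k = int(np.argmax(np.abs(cc)))
    tau = (k - max_shift) / fs
    confidence = np.abs(cc[k]) / (np.mean(np.abs(cc)) + 1e-12)
    return tau, confidence

def weighted_consensus(estimates, confidences):
    # Combine per-pair estimates, down-weighting the uncertain ones.
    w = np.asarray(confidences, dtype=float)
    return float(np.sum(w * np.asarray(estimates, dtype=float)) / (np.sum(w) + 1e-12))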

108 citations


Journal ArticleDOI
TL;DR: Response correlation in noise suggested that the LP population consisted of two subgroups, one whose responses appeared relatively normal, and another whose responses were severely degraded by repetition in noise.

89 citations


Journal ArticleDOI
TL;DR: The data suggest that 12 frequency channels are more than adequate for users to achieve asymptotic performance levels in clinical speech tests applied in the presence of wideband noise at moderate signal-to-noise ratios.
Abstract: OBJECTIVE: The objective of the investigation described in this paper was the determination of the number of (widely spaced) active electrodes needed for users of a COMBI 40+ cochlear implant to achieve asymptotic performance in the recognition of speech against a background of wideband noise. DESIGN: This study measured the performance in speech tests of patients using the Med-El implementation of continuous interleaved sampling with widely spaced electrode pair subsets of 2, 3, 4, 6, 8, and 10 out of a possible maximum of 12. An eight-vowel test, a 16-consonant test, and BKB sentences were presented against a background of pink noise. Additionally, AB monosyllabic words were presented both in quiet and in noise to processors with 6, 8, and 11 widely spaced electrodes. 11 subjects participated in the study. RESULTS: Using moderate signal-to-noise ratios, for these patients the curve relating percentage score to increasing numbers of active channels approached an asymptote before the 10-channel data point was reached. Asymptotic performance was achieved using four channels for consonants, and eight channels for sentences. Understanding of monosyllabic words reached a maximum value at a similar number of channels for both quiet conditions and against a background of pink noise, and the mean increase in test score between 6 and 11 channels was only 7%. CONCLUSIONS: These results are similar to those of previous experiments carried out in quiet listening conditions. The data suggest that 12 frequency channels (the number implemented by the COMBI 40+ cochlear implant) are more than adequate for users to achieve asymptotic performance levels in clinical speech tests applied in the presence of wideband noise at moderate signal-to-noise ratios.

76 citations


Proceedings ArticleDOI
13 May 2002
TL;DR: A microphone array post-filtering approach, applicable to adaptive beamformers, that differentiates non-stationary noise components from speech components is introduced, based on a Gaussian statistical model combined with an appropriate spectral enhancement technique.
Abstract: Microphone array post-filtering allows additional reduction of noise components at a beamformer output. Existing techniques are either restricted to classical delay-and-sum beamformers, or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components. In this paper, we introduce a microphone array post-filtering approach, applicable to adaptive beamformers, that differentiates non-stationary noise components from speech components. The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering. Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique, a significantly reduced level of non-stationary noise is achieved without further distorting speech components. Experimental results demonstrate the effectiveness of the proposed method.
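
The decision rule described above can be caricatured as follows: track a slowly varying (pseudo-stationary) spectral floor at both the beamformer output and the noise references, call whatever exceeds that floor "transient" power, and compare the two. The tracking rule and the threshold below are illustrative assumptions, not the paper's estimator.

import numpy as np

def transient_power(psd_frames, alpha=0.95):
    # Excess of each frame's periodogram over a slowly tracked stationary floor.
    floor = psd_frames[0].copy()
    excess = np.zeros_like(psd_frames)
    for i, p in enumerate(psd_frames):
        excess[i] = np.maximum(p - floor, 0.0)
        floor = alpha * floor + (1 - alpha) * p
    return excess

def transient_is_desired(beam_psd, ref_psd, thresh=2.0):
    # Frame-wise decision: transient energy concentrated at the beamformer
    # output rather than at the noise references suggests a desired speech transient.
    lam_y = transient_power(beam_psd).sum(axis=1)
    lam_u = transient_power(ref_psd).sum(axis=1)
    return lam_y / (lam_u + 1e-12) > thresh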

76 citations


Proceedings ArticleDOI
13 May 2002
TL;DR: A novel technique for estimating the signal power spectral density to be used in the transfer function of a microphone array post-filter is proposed, which results in significant improvement in terms of objective speech quality measures and speech recognition performance.
Abstract: This paper proposes a novel technique for estimating the signal power spectral density to be used in the transfer function of a microphone array post-filter. The technique is a modification of the existing Zelinski post-filter, which uses the auto- and cross-spectral densities of the array inputs to estimate the signal and noise spectral densities. The Zelinski technique, however, assumes zero cross-correlation between noise on different sensors. This assumption is inaccurate in real conditions, particularly at low frequencies and for arrays with closely spaced sensors. In this paper we replace this with an assumption of a theoretically diffuse noise field, which is more appropriate in a variety of realistic noise environments. In experiments using noise recordings from an office of computer workstations, the modified post-filter results in significant improvement in terms of objective speech quality measures and speech recognition performance.
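
One way to see the modification: the pairwise cross-spectrum contains signal plus coherent noise, and the diffuse-field coherence Gamma(f) = sinc(2 f d / c) predicts how much the noise contributes, so the signal PSD can be solved for per frequency. A two-microphone, single-frame sketch under that assumption (in practice the spectral densities would be smoothed over frames, which is omitted here):

import numpy as np

def diffuse_postfilter_gain(X, mic_dist, fs, c=343.0):
    # X: (2, n_bins) complex STFT frame from two sensors; mic_dist in metres.
    n_fft = 2 * (X.shape[1] - 1)
    f = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    gamma = np.sinc(2.0 * f * mic_dist / c)      # real diffuse-field coherence
    gamma = np.minimum(gamma, 0.99)              # keep the denominator away from zero
    auto = 0.5 * (np.abs(X[0]) ** 2 + np.abs(X[1]) ** 2)
    cross = np.real(X[0] * np.conj(X[1]))
    phi_ss = np.maximum((cross - gamma * auto) / (1.0 - gamma), 0.0)
    return np.clip(phi_ss / (auto + 1e-12), 0.0, 1.0)   # Wiener-like post-filter gain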

63 citations


Proceedings ArticleDOI
13 May 2002
TL;DR: The proposed endpoint detection of speech improves the SD recognition accuracy by 24% for office noise, and reduces the false rejection rates for both SI and SD by 45% for babble noise and lobby noise.
Abstract: We propose a new approach for classifying speech vs. non-speech, which proves to significantly improve speech recognition performance under noise. The proposed algorithm relies on the energy and spectral characteristics of the signal and applies a 3-level two-dimensional thresholding to determine whether an input frame is speech or non-speech. The algorithm runs in real-time, and offers better immunity to background noise, and to background speech than traditional energy-based word boundary detection. The performance of the endpoint detector is reported here in terms of improvements in speaker-independent (SI) and speaker-dependent (SD) recognition performance using 5 different simulated noise conditions and various signal-to-noise ratios (SNR). The proposed endpoint detection of speech improves the SD recognition accuracy by 24% for office noise, and reduces the false rejection rates for both SI and SD by 45% for babble noise and lobby noise.
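
A toy detector in the same spirit, combining frame energy (with two thresholds and a hangover) and a spectral-flatness check, is sketched below; the features, constants, and structure only loosely mirror the paper's 3-level two-dimensional thresholding and are assumptions for illustration.

import numpy as np

def detect_endpoints(x, fs, frame_ms=20, low_db=6.0, high_db=12.0, hangover=5):
    # Mark frames as speech relative to a noise floor taken from the first frames.
    n = int(fs * frame_ms / 1000)
    frames = x[:len(x) // n * n].reshape(-1, n)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    spec = np.abs(np.fft.rfft(frames, axis=1)) + 1e-12
    flatness = np.exp(np.mean(np.log(spec), axis=1)) / np.mean(spec, axis=1)
    floor = np.median(energy_db[:10])            # assume the first ~200 ms is non-speech
    speech = np.zeros(len(frames), dtype=bool)
    active, hang = False, 0
    for i, (e, fl) in enumerate(zip(energy_db, flatness)):
        if e > floor + high_db and fl < 0.5:     # loud and not spectrally flat: speech-like
            active, hang = True, hangover
        elif e < floor + low_db:                 # below the lower threshold: count down hangover
            hang -= 1
            if hang <= 0:
                active = False
        speech[i] = active
    return speech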

Proceedings Article
01 Jan 2002
TL;DR: It is shown that a combination of short-term noise filtering and long-term log spectral subtraction can further reduce recognition word error rates.
Abstract: Far-field microphone speech signals cause high error rates for automatic speech recognition systems, due to room reverberation and lower signal-to-noise ratios. We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from close-talking microphones. In an earlier paper, we showed improvements in far-field speech recognition performance using a long-term log spectral subtraction method to combat reverberation. This method is based on a principle similar to cepstral mean subtraction but uses a much longer analysis window (e.g., 1 s) in order to deal with reverberation. Here we show that a combination of short-term noise filtering and long-term log spectral subtraction can further reduce recognition word error rates.
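
A minimal sketch of the long-window log-spectral subtraction idea (analysis parameters assumed, not taken from the paper): subtract, per frequency bin of the log-magnitude STFT, a mean computed over a window on the order of one second rather than over the whole utterance, then resynthesize with the noisy phase.

import numpy as np

def longterm_log_spectral_subtraction(x, fs, frame_len=400, hop=160, win_s=1.0):
    w = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len, hop)
    X = np.array([np.fft.rfft(x[i:i + frame_len] * w) for i in starts])
    logmag = np.log(np.abs(X) + 1e-12)
    half = max(1, int(win_s * fs / hop) // 2)          # ~1 s running-mean half-width in frames
    mean = np.array([logmag[max(0, t - half):t + half + 1].mean(axis=0)
                     for t in range(len(logmag))])
    Y = np.exp(logmag - mean) * np.exp(1j * np.angle(X))
    y = np.zeros(len(X) * hop + frame_len)
    for t, spec in enumerate(Y):                        # overlap-add resynthesis
        y[t * hop:t * hop + frame_len] += np.fft.irfft(spec, frame_len)
    return y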

Journal Article
Alberto Behar1, Ewen MacDonald1, Jason Lee1, Jie Cui1, Hans Kunov1, Willy Wong1 
TL;DR: In this article, a survey was performed to assess the risk of hearing loss to school music teachers during the course of their activities, and limited recommendations on how to reduce the noise exposures are provided.
Abstract: A noise exposure survey was performed to assess the risk of hearing loss to school music teachers during the course of their activities. Noise exposure of 18 teachers from 15 schools was measured using noise dosimeters. The equivalent continuous noise level (Leq) of each teacher was recorded during single activities (classes) as well as for the entire day, and a normalized 8-hour exposure, termed the noise exposure level (Lex), was also computed. The measured Leq exceeded the 85-dBA limit for 78% of the teachers. Lex exceeded 85 dBA for 39% of the teachers. Limited recommendations on how to reduce the noise exposures are provided. The need for a hearing conservation program has also been emphasized.
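
The normalized 8-hour exposure mentioned above follows the usual equal-energy rule, Lex = Leq + 10*log10(T/8) for a measurement of duration T hours; a small sketch with an invented example value:

import numpy as np

def lex_8h(leq_dba, duration_h):
    # Normalize an Leq measured over duration_h hours to an 8-hour exposure level.
    return leq_dba + 10.0 * np.log10(duration_h / 8.0)

# Hypothetical example: Leq = 86 dBA measured over a 6-hour teaching day
print(round(lex_8h(86.0, 6.0), 1))   # -> 84.8 dBA, just below the 85-dBA criterion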

Proceedings ArticleDOI
26 Aug 2002
TL;DR: Simulation experiments show that the proposed method is an efficient and robust voice activity detector that retains Sohn's merit of properly tracking the noise spectrum.
Abstract: On the basis of the short-time energy of speech signals and the efficient noise statistics adaptation method proposed by Sohn et al. (1998), a new, highly robust voice activity detection (VAD) rule for any kind of environmental noise is proposed in this paper. On average, the recognition rate of the new method is about five percent higher than that of Sohn's method, and it retains the same merit of properly tracking the noise spectrum. Simulation experiments show that the new method is an efficient and robust voice activity detector.
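
A bare-bones version of the two ingredients named above, short-time energy plus a noise statistic that adapts only during non-speech, might look like this (constants and the decision rule are assumptions, not the paper's or Sohn's exact formulation).

import numpy as np

def simple_vad(x, fs, frame_ms=20, alpha=0.98, margin_db=6.0):
    # Frame-wise speech/non-speech decision from short-time energy compared
    # against a noise-energy estimate updated only on non-speech frames.
    n = int(fs * frame_ms / 1000)
    frames = x[:len(x) // n * n].reshape(-1, n)
    e_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    noise_db = e_db[:10].mean()                 # bootstrap from the first ~200 ms
    decisions = np.zeros(len(frames), dtype=bool)
    for i, e in enumerate(e_db):
        speech = e > noise_db + margin_db
        if not speech:                          # adapt the noise statistic slowly
            noise_db = alpha * noise_db + (1 - alpha) * e
        decisions[i] = speech
    return decisions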

01 Jan 2002
TL;DR: Ando et al. as mentioned in this paper investigated the characteristics of a flushing toilet noise in a bedroom on the downstairs floor in terms of the temporal and spatial factors extracted from the autocorrelation function and cross-correlation function based on the model of the human auditory-brain system.
Abstract: In a three-story apartment building located in a quiet residential area of Kobe, one resident was very annoyed by the flushing noise of an upstairs toilet that could be heard in the downstairs bedroom. The resident accused the construction company of improper construction, although the sound pressure level was only about 35 dBA. The purpose of this study is to clarify the characteristics of flushing toilet noise in a bedroom on the downstairs floor in terms of the temporal and spatial factors extracted from the autocorrelation function and cross-correlation function, based on the model of the human auditory-brain system [Ando, Y. (1998). Architectural Acoustics: Blending Sound Sources, Sound Fields, and Listeners. AIP Press/Springer-Verlag, New York]. The measurements showed that the temporal and spatial factors of the flushing toilet noise changed dramatically as a function of time. According to this model of the auditory-brain system, temporal factors of sound signals are processed by the left hemisphere, and spatial factors are processed by the right hemisphere. The flushing noise of an upstairs toilet therefore strongly stimulates both hemispheres, which are sensitive to changes in the temporal and spatial factors.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: The higher-order demixing performance, as measured by the Amari index, indicates that when the noise contamination exceeds the mixing contamination the ICA separation is reduced.
Abstract: Evaluates the performance of the extended-infomax independent component analysis (ICA) algorithm in a simulated biomedical blind source separation problem. Independent signals representing an alphawave and a heartbeat are generated and then mixed linearly in the presence of white or pink noise to simulate a one-minute recording of an electroencephalogram and electrocardiogram. The selected ICA algorithm separates the white and pink noises equally well. The maximum estimation signal-to-noise ratio of the source estimates is equivalent to the added noise level, so the separation is optimum to second-order. The higher-order demixing performance, as measured by the Amari index, indicates that when the noise contamination exceeds the mixing contamination the ICA separation is reduced. These results represent a lower bound to the performance of extended-infomax ICA in noisy, time-correlated electrophysiological conditions.
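
For reference, the Amari index mentioned above is computed from the combined mixing/unmixing matrix P = W A; a standard normalized form (0 for perfect separation up to permutation and scaling, larger values indicating worse demixing) is:

import numpy as np

def amari_index(W, A):
    # Amari performance index of the global system P = W @ A.
    P = np.abs(W @ A)
    m = P.shape[0]
    row = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return (row.sum() + col.sum()) / (2.0 * m * (m - 1.0))

# Sanity check: a permutation-and-scaling matrix gives 0
print(amari_index(np.array([[0.0, 2.0], [3.0, 0.0]]), np.eye(2)))   # -> 0.0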

Proceedings ArticleDOI
13 May 2002
TL;DR: A sigmoidal gain function for use in the analog system is introduced, and a normalized SNR is also proposed for consistent noise suppression across different input noise variances.
Abstract: In this paper, we propose a continuous-time audio noise suppression algorithm for reducing stationary background noise in a single microphone signal. The algorithm is targeted for implementation in sub-threshold analog floating-gate transistor circuits for extremely low-power systems. The noise suppression algorithm divides the noisy signal into exponentially spaced sub-bands and estimates the noisy signal envelope and the noise envelope to calculate a time-varying gain for each sub-band based on each band's a posteriori SNR. A sigmoidal gain function for use in the analog system is introduced, and a normalized SNR is also proposed for consistent noise suppression across different input noise variances. Simulations of the system show promising results.
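
A toy discrete-time analogue of the described processing, exponentially spaced bands, a fast signal-envelope tracker and a slow noise-envelope tracker per band, and a sigmoidal gain, is sketched below; every constant, the envelope trackers, and the band edges are illustrative assumptions, not the paper's analog design.

import numpy as np

def sigmoid_gain(snr_db, slope=0.5, center_db=6.0, floor=0.1):
    # Smooth map from per-band SNR (dB) to a gain in [floor, 1].
    return floor + (1.0 - floor) / (1.0 + np.exp(-slope * (snr_db - center_db)))

def subband_suppress(x, fs, n_bands=8, attack=0.01, noise_tc=0.999):
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    edges = np.geomspace(50.0, fs / 2.0, n_bands + 1)   # exponentially spaced band edges
    y = np.zeros(len(x))
    for b in range(n_bands):
        band = np.where((freqs >= edges[b]) & (freqs < edges[b + 1]), X, 0.0)
        xb = np.fft.irfft(band, len(x))
        sig_env, noise_env = 1e-6, 1e-6
        g = np.empty(len(xb))
        for n, v in enumerate(np.abs(xb)):
            sig_env += attack * (v - sig_env)                                   # fast tracker
            noise_env = min(noise_tc * noise_env + (1 - noise_tc) * v, sig_env) # slow tracker
            g[n] = sigmoid_gain(20.0 * np.log10(sig_env / noise_env))
        y += g * xb
    return y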

Proceedings ArticleDOI
13 May 2002
TL;DR: A frequency to eigendomain transformation is presented which provides a way to calculate a perceptually based upper bound for the residual noise and an easy way to generalize it to the colored noise case is provided.
Abstract: The major drawback of most noise reduction methods is what is known as musical noise. To cope with this problem, the masking properties of the human ear were used in the spectral subtraction methods. However, no similar approach is available for the signal subspace based methods. In a previous work, we presented a frequency to eigendomain transformation which provides a way to calculate a perceptually based upper bound for the residual noise. This bound, when used in the signal subspace approach, yields an improved result where better shaping of the residual noise is achieved. In this paper, we further improve this method and provide an easy way to generalize it to the colored noise case. Listening tests results are given to show the superiority of the proposed method.

PatentDOI
TL;DR: In this paper, a system and method for characterizing the contents of an input audio signal and for suppressing noise components of the input audio signals is described, where the audio signal is divided into a number of frequency domain input signals.
Abstract: A system and method for characterizing the contents of an input audio signal and for suppressing noise components of the input audio signal are described. The audio signal is divided into a number of frequency domain input signals. Each frequency domain input signal can be processed separately to determine its intensity change, modulation frequency, and time duration characteristics to characterize the frequency domain input signal as containing a desirable signal or as a type of noise. An index signal is calculated based on a combination of the determined characteristics and signals identified as noise are suppressed in comparison to signals identified as desirable to produce a set of frequency domain output signals with reduced noise. The frequency domain output signals are combined to provide an output audio signal corresponding to the input audio signal but having suppressed noise components and comparatively enhanced desirable signal components.

Patent
Cormac Herley1
22 Jun 2002
TL;DR: In this article, a repeating object extractor performs a computationally efficient joint segmentation and identification of the stream even in an environment where endpoints of audio objects are unknown or corrupted by voiceovers, cross-fading, or other noise.
Abstract: The “repeating object extractor” described herein automatically segments and identifies audio objects such as songs, station identifiers, jingles, advertisements, emergency broadcast signals, etc., in an audio stream. The repeating object extractor performs a computationally efficient joint segmentation and identification of the stream even in an environment where endpoints of audio objects are unknown or corrupted by voiceovers, cross-fading, or other noise. Parametric information is computed for a portion of the audio stream, followed by an initial comparison pass against a database of known audio objects to identify known audio objects that represent potential matches to the portion of the audio stream. A detailed comparison between the potential matches and the portion of the audio stream is then performed to confirm actual matches for segmenting and identifying the audio objects within the stream; an alignment between matches is then used to determine the extents of audio objects in the audio stream.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: The channel noise factor, which is the average ambient noise above the thermal noise at the antenna input, has been found to vary between 12.6 and 21.5.
Abstract: This paper presents the results of measurements of the cumulative effect of impulsive radio noise, conducted in an indoor environment at 900 MHz and 1800 MHz. The studies are conducted on the three floors of a multi-storey office-cum-institute building. Several sets of measurements are taken on working days, when the electric devices are in the ON state, and on holidays, when the electric devices are in the OFF state. Cumulative radio noise levels due to the operation of electric devices, including fluorescent tubes, are found to be significant at different frequencies. The channel noise factor, which is the average ambient noise above the thermal noise at the antenna input, has been found to vary between 12.6 and 21.5.

Journal ArticleDOI
TL;DR: Noise levels in Greater Cairo were higher than those set by the Egyptian noise standards and policy to protect public health and welfare in residential areas, and a social survey carried out simultaneously indicated that 73.8% of respondent residents were highly or moderately irritated by road traffic noise.

Journal Article
TL;DR: The incidence of hearing symptoms seemed to correlate with increased noise dose and age, and the noisiest leisure-time activities were going to night-clubs and pubs, using home tools, playing in a band or orchestra, shooting and attending or participating in motor sports.
Abstract: The aim of this study was to determine a statistical measure of total weekly noise exposure from leisure-time noise activities in a Finnish urban adult population. The subjects' time spent in noisy activities, and their self-evaluated loudness of the activities converted into equivalent noise levels, were used in the calculation of weekly noise exposure levels. Self-reported hearing symptoms (i.e., tinnitus, pain in the ear) and hearing loss due to noise exposure were also asked about in the questionnaire. No measurements of sound level or hearing loss were made in this study. Forty-one per cent of the subjects were estimated to be exposed to noise at levels over 75 dBA, and 9% of the subjects had a weekly exposure over 85 dBA. The incidence of hearing symptoms seemed to correlate with increased noise dose and age. The noisiest leisure-time activities were going to night-clubs and pubs, using home tools, playing in a band or orchestra, shooting, and attending or participating in motor sports.
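
The weekly exposure level behind figures like "over 75 dBA" is an energy average of the individual activities' equivalent levels over their durations, normalized to a 40-hour week; a small sketch with invented activity values (the study's exact normalization may differ):

import numpy as np

def weekly_exposure_level(levels_dba, hours):
    # Combine activity Leq values (dBA) and durations (h) into a 40-h-normalized level.
    levels = np.asarray(levels_dba, dtype=float)
    hours = np.asarray(hours, dtype=float)
    energy = np.sum(hours * 10.0 ** (levels / 10.0)) / 40.0
    return 10.0 * np.log10(energy)

# Hypothetical week: 3 h at a club (97 dBA), 2 h of home tools (88 dBA), 1 h band practice (92 dBA)
print(round(weekly_exposure_level([97, 88, 92], [3, 2, 1]), 1))   # -> about 86.5 dBA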

PatentDOI
TL;DR: In this paper, an ultrasound system is described that includes a data processor within the system that determines the presence of aliasing and estimates noise levels from the spectral lines of Doppler data, and then automatically adjusts system parameters such as pulse repetition frequency (PRF), baseline shift, and spectrum orientation.
Abstract: An ultrasound system is disclosed that includes a method and apparatus to automatically adjust certain parameters that affect visualization of a Doppler spectral image. The ultrasound system acquires spectral lines of Doppler data generated by the ultrasound system. A data processor within the ultrasound system determines the presence of aliasing and estimates noise levels from the spectral lines of Doppler data. The data processor then automatically adjusts system parameters such as pulse repetition frequency (PRF), baseline shift, and spectrum orientation in response to aliasing and noise levels. The data processor of the ultrasound system also determines positive and negative signal boundaries for each spectral line of Doppler data and a display architecture processes the signal boundary data to display a spectral trace corresponding to the edges of the spectral lines.

Proceedings ArticleDOI
13 May 2002
TL;DR: Two approaches to reducing the residual mismatch left by VTS compensation are presented: linear filtering of the time sequences of compensated acoustic parameters, and histogram equalization to cancel the distortion introduced into the probability distribution of the acoustic parameters.
Abstract: The VTS approach for noise reduction is based on a statistical formulation. It provides the expected value of the clean speech given the noisy observations and statistical models for the clean speech and the additive noise. The compensated signal is only an approximation of the clean one and retains a residual mismatch. The main objective of this work is to characterize this residual noise and to propose techniques to reduce its unwanted effects. Two different approaches to this problem are presented in this paper. The first one is based on linearly filtering the time sequences of compensated acoustic parameters; for this purpose we use LDA-based RASTA-like FIR filters. The second approach is based on canceling the distortion introduced into the probability distribution of acoustic parameters and uses the well-known technique of histogram equalization. Results reported on the AURORA database show that the proposed methods increase the recognition performance.
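
The histogram-equalization step can be sketched as a rank-based mapping of each feature dimension onto a reference distribution; a unit Gaussian is used below as a common choice, though the paper's reference distribution and implementation details may differ.

import numpy as np
from scipy.stats import norm

def histogram_equalize(features):
    # Map each feature dimension of an utterance (frames x dims) onto a unit
    # Gaussian via its empirical CDF (rank-based histogram equalization).
    T, D = features.shape
    out = np.empty((T, D), dtype=float)
    for d in range(D):
        ranks = np.argsort(np.argsort(features[:, d]))   # 0 .. T-1
        cdf = (ranks + 0.5) / T                          # avoid hitting 0 and 1 exactly
        out[:, d] = norm.ppf(cdf)                        # reference distribution: N(0, 1)
    return out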

Journal Article
TL;DR: A comprehensive study on traffic noise level at twenty four pre-selected road transaction of Calcutta Metropolis was carried out during 1993-94, finding that traffic flow density as measured along with noise data recording were compared for establishing relationship with noise level.
Abstract: A comprehensive study on traffic noise level at twenty four pre-selected road transaction of Calcutta Metropolis was carried out during 1993-94. Noise levels were measured at each of twenty four sites, based on predetermined sampling interval and altogether 2880 observations were generated by recording data continuously for 24 hours. The L eq 24, exceedence levels, L D , L N , L DN , L NP and TNI were determined. Traffic flow density as measured along with noise data recording were then compared for establishing relationship with noise level. Finally the clustering of the sites were made based on variable viz. L eq 24 and traffic follow density.
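
For reference, two of the indices listed above are simple functions of the percentile and period levels. The forms below are the standard textbook definitions (TNI from the 10th- and 90th-percentile levels, and a day-night level with a 10 dB night penalty over a 15 h / 9 h split, as in the usual Ldn definition); the study's exact day/night hours may differ.

import numpy as np

def tni(l10, l90):
    # Traffic Noise Index from the 10%- and 90%-exceedance levels (dBA).
    return 4.0 * (l10 - l90) + l90 - 30.0

def ldn(ld, ln):
    # Day-night average level: 15 daytime hours at Ld, 9 night hours at Ln + 10 dB penalty.
    return 10.0 * np.log10((15.0 * 10.0 ** (ld / 10.0) +
                            9.0 * 10.0 ** ((ln + 10.0) / 10.0)) / 24.0)

print(round(tni(78.0, 62.0), 1), round(ldn(72.0, 65.0), 1))   # -> 96.0 and about 73.4 (invented inputs)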

Proceedings ArticleDOI
07 Aug 2002
TL;DR: A technique that uses a linear prediction error filter (LPEF) and an adaptive digital filter (ADF) to achieve noise reduction in speech degraded by additive background noise is proposed.
Abstract: A technique that uses a linear prediction error filter (LPEF) and an adaptive digital filter (ADF) to achieve noise reduction in speech degraded by additive background noise is proposed. Since a speech signal can be treated as stationary over a short interval of time, most of the speech signal can be predicted by the LPEF. On the other hand, when the input signal of the LPEF is background noise, the prediction error signal becomes white. Assuming that the background noise is generated by exciting a linear system with white noise, the background noise can be reconstructed from the prediction error signal by estimating the transfer function of the noise-generating system. This estimation is performed by the ADF, which is used for system identification. Noise reduction is achieved by subtracting the reconstructed noise from the speech degraded by additive background noise.

Patent
15 Aug 2002
TL;DR: In this article, a camera with audio noise attenuation capability is described, in which the camera comprises a microphone and an audio attenuation system configured to attenuate undesired noise captured by the microphone when recording audio.
Abstract: A camera having audio noise attenuation capability is disclosed. In one embodiment, the camera comprises a microphone and an audio noise attenuation system configured to attenuate undesired noise captured by the microphone when recording audio. In some embodiments, the audio noise attenuation system facilitates attenuation of noise generated by a motor of the camera. In other embodiments, the noise attenuation system facilitates attenuation of audio noise from the environment in which the camera is used.

Proceedings ArticleDOI
13 May 2002
TL;DR: Preliminary results are presented from an algorithm using time-varying adaptive filters that appears to be robust in the presence of white, Gaussian noise or a single competing speaker over a large range of signal-to-noise ratios.
Abstract: While many algorithms exist for accurate extraction of formant frequencies from a speech waveform, these algorithms are not typically shown to be robust in the presence of highly-transient background noise such as competing speech waveforms. Preliminary results are presented from an algorithm using time-varying adaptive filters that appears to be robust in the presence of white, Gaussian noise or a single competing speaker over a large range of signal-to-noise ratios (quiet to −6 dB). Use of a synthesized sentence, for which the actual formant frequencies are known, permits quantitative assessment of the algorithm's accuracy as a function of signal-to-noise ratio.

Journal ArticleDOI
TL;DR: Although experimental results indicate some inconsistency between objective and subjective measures, it can still be concluded that spectral smoothing and comb filtering contribute to improving speech quality for speech corrupted by additive white Gaussian noise.