
Showing papers on "Background noise published in 2007"


Journal ArticleDOI
TL;DR: To evaluate the validity of different approaches to determine the signal‐to‐noise ratio (SNR) in MRI experiments with multi‐element surface coils, parallel imaging, and different reconstruction filters, a large number of experiments were conducted with single‐element coils and parallel imaging.
Abstract: Purpose To evaluate the validity of different approaches to determine the signal-to-noise ratio (SNR) in MRI experiments with multi-element surface coils, parallel imaging, and different reconstruction filters. Materials and Methods Four different approaches of SNR calculation were compared in phantom measurements and in vivo based on: 1) the pixel-by-pixel standard deviation (SD) in multiple repeated acquisitions; 2) the signal statistics in a difference image; and 3) and 4) the statistics in two separate regions of a single image employing either the mean value or the SD of background noise. Different receiver coil systems (with one and eight channels), acquisitions with and without parallel imaging, and five different reconstruction filters were compared. Results Averaged over all phantom measurements, the deviations from the reference value provided by the multiple-acquisitions method are 2.7% (SD 1.6%) for the difference method, 37.7% (25.9%) for the evaluation of the mean value of background noise, and 34.0% (38.1%) for the evaluation of the SD of background noise. Conclusion The conventionally determined SNR based on separate signal and noise regions in a single image will in general not agree with the true SNR measured in images after the application of certain reconstruction filters, multichannel reconstruction, or parallel imaging. J. Magn. Reson. Imaging 2007. © 2007 Wiley-Liss, Inc.
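Approach 2 above (the difference method) is simple enough to sketch. The following is a hedged numpy illustration, not the authors' code; the function name and the slice-tuple `roi` argument are my own:

```python
import numpy as np

def snr_difference_method(img_a, img_b, roi):
    """Estimate SNR from two identically acquired images: the signal is
    the ROI mean of the averaged image, the noise is the ROI standard
    deviation of the difference image divided by sqrt(2), since
    subtracting two independent acquisitions doubles the noise variance."""
    a, b = img_a[roi], img_b[roi]
    signal = 0.5 * (a + b).mean()
    noise = (a - b).std(ddof=1) / np.sqrt(2)
    return signal / noise
```

The sqrt(2) factor is what distinguishes this from naively reading the noise off a single image; without it the SNR would be underestimated by about 41%.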

846 citations


Book ChapterDOI
01 Jun 2007
TL;DR: This chapter gives a comprehensive overview of the main challenges in voice activity detection, a complete review of the solutions reported in the state of the art, and the evaluation frameworks that are normally used.
Abstract: An important drawback affecting most of the speech processing systems is the environmental noise and its harmful effect on the system performance. Examples of such systems are the new wireless communications voice services or digital hearing aid devices. In speech recognition, there are still technical barriers inhibiting such systems from meeting the demands of modern applications. Numerous noise reduction techniques have been developed to palliate the effect of the noise on the system performance and often require an estimate of the noise statistics obtained by means of a precise voice activity detector (VAD). Speech/non-speech detection is an unsolved problem in speech processing and affects numerous applications including robust speech recognition (Karray and Marting, 2003; Ramirez et al. 2003), discontinuous transmission (ITU, 1996; ETSI, 1999), real-time speech transmission on the Internet (Sangwan et al., 2002) or combined noise reduction and echo cancellation schemes in the context of telephony (Basbug et al., 2004; Gustafsson et al., 2002). The speech/non-speech classification task is not as trivial as it appears, and most of the VAD algorithms fail when the level of background noise increases. During the last decade, numerous researchers have developed different strategies for detecting speech on a noisy signal (Sohn et al., 1999; Cho and Kondoz, 2001; Gazor and Zhang, 2003, Armani et al., 2003) and have evaluated the influence of the VAD effectiveness on the performance of speech processing systems (Bouquin-Jeannes and Faucon, 1995). Most of the approaches have focussed on the development of robust algorithms with special attention being paid to the derivation and study of noise robust features and decision rules (Woo et al., 2000; Li et al., 2002; Marzinzik and Kollmeier, 2002). 
The different VAD methods include those based on energy thresholds (Woo et al., 2000), pitch detection (Chengalvarayan, 1999), spectrum analysis (Marzinzik and Kollmeier, 2002), zero-crossing rate (ITU, 1996), periodicity measure (Tucker, 1992), higher order statistics in the LPC residual domain (Nemer et al., 2001) or combinations of different features (ITU, 1993; ETSI, 1999; Tanyer and Ozer, 2000). This chapter gives a comprehensive overview of the main challenges in voice activity detection, a complete review of the solutions reported in the state of the art, and the evaluation frameworks that are normally used. The application of VADs for speech coding, speech enhancement and robust speech recognition systems is shown and discussed. Three different VAD methods are described and compared to standardized and
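The energy-threshold family of detectors mentioned above (e.g. Woo et al., 2000) reduces, in its simplest form, to comparing frame energy against a noise-floor estimate. A minimal sketch, assuming the first few frames are speech-free; all names and defaults are illustrative:

```python
import numpy as np

def energy_vad(x, frame_len=256, noise_frames=10, margin_db=6.0):
    """Frame-level energy VAD: a frame is declared speech if its energy
    exceeds the noise-floor estimate (taken from the first few frames,
    assumed to contain no speech) by a fixed margin in dB."""
    n = len(x) // frame_len
    frames = x[:n * frame_len].reshape(n, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    noise_floor = energy_db[:noise_frames].mean()
    return energy_db > noise_floor + margin_db
```

As the chapter notes, exactly this kind of fixed-threshold rule is what fails first when the background noise level rises or fluctuates, which motivates the more robust features and decision rules cited above.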

256 citations


DOI
01 Jan 2007
TL;DR: Novel single- and multimicrophone speech dereverberation algorithms are developed that aim at the suppression of late reverberation, i.e., signal processing techniques to reduce the detrimental effects of reflections.
Abstract: In speech communication systems, such as voice-controlled systems, hands-free mobile telephones, and hearing aids, the received microphone signals are degraded by room reverberation, background noise, and other interferences. This signal degradation may lead to total unintelligibility of the speech and decreases the performance of automatic speech recognition systems. In the context of this work, reverberation is the process of multi-path propagation of an acoustic sound from its source to one or more microphones. The received microphone signal generally consists of a direct sound, reflections that arrive shortly after the direct sound (commonly called early reverberation), and reflections that arrive after the early reverberation (commonly called late reverberation). Reverberant speech can be described as sounding distant with noticeable echo and colouration. These detrimental perceptual effects are primarily caused by late reverberation, and generally increase with increasing distance between the source and microphone. Conversely, early reverberation tends to improve the intelligibility of speech; in combination with the direct sound it is sometimes referred to as the early speech component. Reduction of the detrimental effects of reflections is evidently of considerable practical importance, and is the focus of this dissertation. More specifically, the dissertation deals with dereverberation techniques, i.e., signal processing techniques to reduce the detrimental effects of reflections. In the dissertation, novel single- and multimicrophone speech dereverberation algorithms are developed that aim at the suppression of late reverberation, i.e., at estimation of the early speech component. This is done via so-called spectral enhancement techniques that require a specific measure of the late reverberant signal. This measure, called spectral variance, can be estimated directly from the received (possibly noisy) reverberant signal(s) using a statistical reverberation model and a limited amount of a priori knowledge about the acoustic channel(s) between the source and the microphone(s). In our work, an existing single-channel statistical reverberation model serves as a starting point. The model is characterized by one parameter that depends on the acoustic characteristics of the environment. We show that the spectral variance estimator based on this model can only be used when the source-microphone distance is larger than the so-called critical distance, which is, crudely speaking, the distance where the direct sound power is equal to the total reflective power. A generalization of the statistical reverberation model in which the direct sound is incorporated is developed. This model requires one additional parameter that is related to the ratio between the direct sound energy and the sound energy of all reflections. The generalized model is used to derive a novel spectral variance estimator. When the novel estimator is used for dereverberation rather than the existing estimator, and the source-microphone distance is smaller than the critical distance, the dereverberation performance is significantly increased. Single-microphone systems only exploit the temporal and spectral diversity of the received signal. Reverberation, of course, also induces spatial diversity. To additionally exploit this diversity, multiple microphones must be used, and their outputs must be combined by a suitable spatial processor such as the so-called delay-and-sum beamformer. It is not a priori evident whether spectral enhancement is best done before or after the spatial processor. For this reason we investigate both possibilities, as well as a merge of the spatial processor and the spectral enhancement technique. An advantage of the latter option is that the spectral variance estimator can be further improved. Our experiments show that the use of multiple microphones affords a significant improvement of the perceptual speech quality. The applicability of the theory developed in this dissertation is demonstrated using a hands-free communication system. Since hands-free systems are often used in noisy and reverberant environments, the received microphone signal contains not only the desired signal but also interferences such as room reverberation caused by the desired source, background noise, and a far-end echo signal that results from sound produced by the loudspeaker. Usually an acoustic echo canceller is used to cancel the far-end echo, and a post-processor is used to suppress background noise and residual echo, i.e., echo which could not be cancelled by the echo canceller. In this work, a novel structure and post-processor for an acoustic echo canceller are developed. The post-processor suppresses late reverberation caused by the desired source, residual echo, and background noise. The late reverberation and late residual echo are estimated using the generalized statistical reverberation model. Experimental results convincingly demonstrate the benefits of the proposed system for suppressing late reverberation, residual echo and background noise. The proposed structure and post-processor have a low computational complexity and a highly modular structure, can be seamlessly integrated into existing hands-free communication systems, and afford a significant increase in listening comfort and speech intelligibility.
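The kind of spectral variance estimator described above can be illustrated with a simple exponential-decay (Polack-style) statistical model. This is a simplified sketch, not the dissertation's generalized estimator; the parameter names and the frames-by-bins PSD layout are assumptions:

```python
import numpy as np

def late_reverb_psd(reverberant_psd, t60, hop, fs, delay_frames):
    """Late-reverberant spectral variance under an exponential-decay
    statistical model: reverberant energy decays as exp(-2*delta*t)
    with delta = 3*ln(10)/T60, so the late part is modelled as a
    delayed, exponentially attenuated copy of the reverberant-signal
    PSD. `reverberant_psd` is a (frames x bins) short-time PSD."""
    delta = 3.0 * np.log(10.0) / t60
    decay = np.exp(-2.0 * delta * delay_frames * hop / fs)
    late = np.zeros_like(reverberant_psd)
    late[delay_frames:] = decay * reverberant_psd[:-delay_frames]
    return late
```

The estimate would then feed a spectral enhancement gain (e.g. spectral subtraction or a Wiener gain) applied to the short-time spectrum, exactly the role the spectral variance plays in the dissertation.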

239 citations


Journal ArticleDOI
TL;DR: Evidence of a behavioral change in sound production of right whales that is correlated with increased noise levels is provided and it is indicated that right whales may shift call frequency to compensate for increased band-limited background noise.
Abstract: The impact of anthropogenic noise on marine mammals has been an area of increasing concern over the past two decades. Most low-frequency anthropogenic noise in the ocean comes from commercial shipping which has contributed to an increase in ocean background noise over the past 150 years. The long-term impacts of these changes on marine mammals are not well understood. This paper describes both short- and long-term behavioral changes in calls produced by the endangered North Atlantic right whale (Eubalaena glacialis) and South Atlantic right whale (Eubalaena australis) in the presence of increased low-frequency noise. Right whales produce calls with a higher average fundamental frequency and they call at a lower rate in high noise conditions, possibly in response to masking from low-frequency noise. The long-term changes have occurred within the known lifespan of individual whales, indicating that a behavioral change, rather than selective pressure, has resulted in the observed differences. This study provides evidence of a behavioral change in sound production of right whales that is correlated with increased noise levels and indicates that right whales may shift call frequency to compensate for increased band-limited background noise.

238 citations


Journal ArticleDOI
TL;DR: In this paper, the authors cross-correlated ten hours of seismic background noise data acquired in a desert area and interpreted these coherent events as reflections, which align very well with reflections from an active survey at the same location.
Abstract: The retrieval of the earth's reflection response from cross-correlations of seismic noise recordings can provide valuable information, which may otherwise not be available due to the limited spatial distribution of seismic sources. We cross-correlated ten hours of seismic background-noise data acquired in a desert area. The cross-correlation results show several coherent events, which align very well with reflections from an active survey at the same location. Therefore, we interpret these coherent events as reflections. Retrieving seismic reflections from background-noise measurements has a wide range of applications in regional seismology, frontier exploration and long-term monitoring of processes in the earth's subsurface.
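The core of the technique is cross-correlating simultaneous noise records at pairs of receivers and stacking over time windows. A generic numpy sketch under the usual interferometry assumptions, not the authors' processing flow:

```python
import numpy as np

def noise_correlation_stack(traces_a, traces_b, max_lag):
    """Stack cross-correlations of simultaneous background-noise
    records at two receivers; coherent arrivals in the stack
    correspond to the inter-receiver response (e.g. reflections),
    while incoherent noise averages out over the stack."""
    stack = np.zeros(2 * max_lag + 1)
    for a, b in zip(traces_a, traces_b):
        full = np.correlate(a, b, mode="full")
        mid = len(b) - 1                      # index of zero lag
        stack += full[mid - max_lag: mid + max_lag + 1]
    return stack / len(traces_a)
```

In practice each window would also be normalized (e.g. spectral whitening) before correlation; that step is omitted here for brevity.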

213 citations


Journal ArticleDOI
TL;DR: The hypothesis that road traffic noise can mask a female's perception of male signals in the grey treefrog, Hyla chrysoscelis, is tested by comparing the effects of traffic noise and the background noise of a breeding chorus on female responses to advertisement calls.

211 citations


Journal ArticleDOI
TL;DR: A simple and effective post-processing technique to estimate echosounder background-noise levels and signal-to-noise ratios (SNRs) during active pinging is developed; it provides repeated noise estimates over short intervals of time without user intervention, which is beneficial in cases where background noise changes over time.
Abstract: A simple and effective post-processing technique to estimate echosounder background-noise levels and signal-to-noise ratios (SNRs) during active pinging is developed. Similar to other methods of noise estimation during active pinging, this method assumes that some portion of the sampled acoustic signal is dominated by background noise, with a negligible contribution from the backscattered transmit signal. If this assumption is met, the method will provide robust and accurate estimates of background noise equivalent to that measured by the receiver if the transmitter were disabled. It provides repeated noise estimates over short intervals of time without user intervention, which is beneficial in cases where background noise changes over time. In situations where background noise is dominant in a portion of the recorded signal, it is straightforward to make first-order corrections for the effects of noise and to estimate the SNR to evaluate the effects of background noise on acoustic measurements. Noise correction and signal-to-noise-based thresholds have the potential to improve inferences from acoustic measurements in lower signal-to-noise situations, such as when surveying from noisy vessels, using multifrequency techniques, surveying at longer ranges, and when working with weak acoustic targets such as invertebrates and fish lacking swimbladders.
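The underlying idea, that the quietest part of each ping is noise-dominated, can be sketched as follows. This is a simplified illustration that omits details of the published method such as time-varied-gain removal; the names and bin size are my own:

```python
import numpy as np

def background_noise_db(power_db, bin_size=50):
    """Passive-noise estimate from an actively pinging echosounder:
    average received power within range bins in the linear domain,
    then take the quietest bin, assuming that at least one bin has a
    negligible backscatter contribution and is noise-dominated."""
    p = np.asarray(power_db, dtype=float)
    n = len(p) // bin_size
    binned = p[:n * bin_size].reshape(n, bin_size)
    bin_means_db = 10 * np.log10(np.mean(10 ** (binned / 10), axis=1))
    return bin_means_db.min()
```

Repeating this per ping yields the time-resolved noise track the abstract describes, from which per-sample SNR and noise corrections follow directly.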

202 citations


Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the benefit of the two-microphone adaptive beamformer BEAM in the Nucleus Freedom cochlear implant (CI) system for speech understanding in background noise by CI users.
Abstract: Objective:This paper evaluates the benefit of the two-microphone adaptive beamformer BEAM™ in the Nucleus Freedom™ cochlear implant (CI) system for speech understanding in background noise by CI users.Design:A double-blind evaluation of the two-microphone adaptive beamformer BEAM and a hardware dire

171 citations


Journal ArticleDOI
TL;DR: Singular value decomposition (SVD) is a coherency-based technique that provides both signal enhancement and noise suppression as discussed by the authors, which has been implemented in a variety of seismic applications, mostly on a global scale.
Abstract: Singular value decomposition (SVD) is a coherency-based technique that provides both signal enhancement and noise suppression. It has been implemented in a variety of seismic applications — mostly on a global scale. In this paper, we use SVD to improve the signal-to-noise ratio of unstacked and stacked seismic sections, but apply it locally to cope with coherent events that vary with both time and offset. The local SVD technique is compared with f-x deconvolution and median filtering on a set of synthetic and real-data sections. Local SVD is better than f-x deconvolution and median filtering in removing background noise, but it performs less well in enhancing weak events or events with conflicting dips. Combining f-x deconvolution or median filtering with local SVD overcomes the main weaknesses associated with each individual method and leads to the best results.
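Per local window, SVD filtering reduces to a rank-truncated reconstruction. A minimal numpy sketch; window extraction, dip steering, and the f-x or median-filter combinations discussed above are omitted:

```python
import numpy as np

def local_svd_filter(window, rank=1):
    """Rank-reduced reconstruction of a local data window
    (traces x samples): laterally coherent, roughly flat events
    concentrate in the leading singular components, while incoherent
    background noise spreads across the remainder, so truncating the
    singular spectrum suppresses the noise."""
    u, s, vt = np.linalg.svd(window, full_matrices=False)
    s = s.copy()
    s[rank:] = 0.0
    return (u * s) @ vt
```

The "local" aspect of the paper amounts to sliding such windows over the section so that events only need to be approximately flat within each window.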

147 citations


28 Jan 2007

144 citations


Journal ArticleDOI
Sangkeun Lee1
TL;DR: The main advantage of the proposed algorithm is that it enhances detail in both dark and bright areas at low computational cost, without boosting noise or affecting the compressibility of the original image, since it operates directly in the compressed domain.
Abstract: The object of this paper is to present a simple and efficient algorithm for dynamic range compression and contrast enhancement of digital images under noisy conditions in the compressed domain. First, an image is separated into illumination and reflectance components. Next, the illumination component is manipulated adaptively for image dynamics by using a new content measure. Then, the reflectance component is manipulated for image contrast based on a measure of the spectral contents of the image. The spectral content measure is computed from the energy distribution across different spectral bands in a discrete cosine transform (DCT) block. The proposed approach also introduces a simple scheme for estimating and reducing noise directly in the DCT domain. The main advantage of the proposed algorithm is that it enhances detail in both dark and bright areas at low computational cost, without boosting noise or affecting the compressibility of the original image, since it operates directly in the compressed domain. In order to evaluate the proposed scheme, several baseline approaches are described and compared using enhancement quality measures.
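A spectral content measure of the kind described can be illustrated as the fraction of a DCT block's energy outside the DC coefficient. This is a hedged stand-in for the paper's measure (the paper's exact band partitioning may differ):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix."""
    k, m = np.arange(n)[:, None], np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def spectral_content(block):
    """Fraction of a block's energy in the AC (non-DC) DCT
    coefficients: near zero for flat (pure illumination) blocks and
    close to one for heavily textured blocks, which is the kind of
    cue used to steer local contrast manipulation."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T
    total = np.sum(coeffs ** 2) + 1e-12
    return (total - coeffs[0, 0] ** 2) / total
```

Because JPEG-style codecs already store the DCT coefficients, such a measure can be evaluated without decoding to the pixel domain, which is the source of the compressed-domain efficiency claimed above.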

Journal ArticleDOI
TL;DR: An extensive overview of the available estimators is presented, and a theoretical estimator is derived to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method.
Abstract: The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, and on the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition (ASR) experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
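The subspace filtering idea reviewed above can be sketched for the white-noise case. This is a generic eigendomain Wiener-type estimator, not any specific estimator from the paper:

```python
import numpy as np

def subspace_enhance(noisy_frames, noise_var):
    """Minimal signal-subspace enhancement for white noise:
    eigen-decompose the empirical covariance of the noisy frames,
    recover clean-speech eigenvalues by subtracting the noise
    variance, and apply Wiener-type gains lam/(lam + noise_var) in
    the eigendomain. Under the low-rank speech model, most gains are
    near zero and the corresponding noise subspace is suppressed."""
    cov = noisy_frames.T @ noisy_frames / len(noisy_frames)
    eigval, eigvec = np.linalg.eigh(cov)
    lam = np.maximum(eigval - noise_var, 0.0)
    gain = lam / (lam + noise_var + 1e-12)
    return (noisy_frames @ eigvec * gain) @ eigvec.T
```

Note that this soft gain performs no explicit hard rank truncation; the paper's finding that ASR works best without explicit rank reduction of the noisy Hankel matrix points in the same direction.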

Journal ArticleDOI
Stéphane Moreau1, Michel Roger1
TL;DR: In this paper, the authors compared two broadband noise mechanisms, the trailing edge noise or self-noise, and the leading-edge noise or turbulence-ingestion noise, in several blade technologies.
Abstract: This paper compares two broadband noise mechanisms, the trailing-edge noise or self-noise, and the leading-edge noise or turbulence-ingestion noise, in several blade technologies. Two previously developed analytical models for these broadband contributions are first validated with well-defined measurements on several airfoils embedded in a homogeneous flow at low Mach number. Each instrumented airfoil is placed at the exit of an open-jet anechoic wind tunnel with or without a grid generating turbulence upstream of it. Sound is measured in the far field at the same time as the wall-pressure fluctuation statistics close to the airfoil trailing edge and the inlet velocity fluctuation statistics impacting the airfoil leading edge. The models are then compared in some practical cases representative of airframes, wind turbines, and automotive engine cooling modules. The airfoil models of the two mechanisms are then extended to a full rotating machine in open space. The model predictions of both mechanisms are compared with in-flight helicopter measurements and automotive engine cooling module measurements. In both instances, the turbulence-ingestion noise is found to be a dominant source over most of the frequency range. The self-noise only becomes a significant contributor at high angles of attack close to flow separation.

Journal ArticleDOI
TL;DR: LIMPIC preprocessing proves to be superior to other classical preprocessing techniques, allowing for a reliable removal of the background noise and the baseline drift from MALDI-TOF mass spectra, and provides a lower coefficient of variation in peak intensity, improving the reliability of the information that can be extracted from single spectra.
Abstract: Mass spectrometry protein profiling is a promising tool for biomarker discovery in clinical proteomics. However, the development of a reliable approach for the separation of protein signals from noise is required. In this paper, LIMPIC, a computational method for the detection of protein peaks from linear-mode MALDI-TOF data, is proposed. LIMPIC is based on novel techniques for background noise reduction and baseline removal. Peak detection is performed considering the presence of a non-homogeneous noise level in the mass spectrum. A comparison of the peaks collected from multiple spectra is used to classify them on the basis of a detection rate parameter, and hence to separate the protein signals from other disturbances. LIMPIC preprocessing proves to be superior to other classical preprocessing techniques, allowing for a reliable removal of the background noise and the baseline drift from the MALDI-TOF mass spectra. It provides a lower coefficient of variation in peak intensity, improving the reliability of the information that can be extracted from single spectra. Our results show that LIMPIC peak-picking is effective even in low protein-concentration regimes. The analytical comparison with commercial and freeware peak-picking algorithms demonstrates its superior performance in terms of sensitivity and specificity, both on in-vitro purified protein samples and human plasma samples. The quantitative information on peak intensity extracted with LIMPIC could be used for the recognition of significant protein profiles by means of advanced statistical tools: LIMPIC might be valuable in the perspective of biomarker discovery.
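Peak picking against a non-homogeneous noise level can be illustrated with a moving robust noise estimate. This generic sketch is not the LIMPIC algorithm itself; the window size and SNR threshold are illustrative:

```python
import numpy as np

def pick_peaks(spectrum, window=50, snr_min=3.0):
    """Generic SNR-based peak picking with a locally varying noise
    level: estimate the local noise scale with a moving median
    absolute deviation (MAD, scaled to match a Gaussian sigma), and
    keep local maxima that rise above the local median by at least
    snr_min times that scale."""
    x = np.asarray(spectrum, dtype=float)
    peaks = []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            seg = x[max(0, i - window): i + window]
            med = np.median(seg)
            noise = 1.4826 * np.median(np.abs(seg - med)) + 1e-12
            if x[i] - med > snr_min * noise:
                peaks.append(i)
    return peaks
```

Using a local (rather than global) noise scale is what lets a threshold of this kind track the non-homogeneous noise floor that the abstract emphasizes.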

Journal ArticleDOI
TL;DR: This letter proposes a spatial variation to the traditional temporal framework that allows statistical motion detection with methods trained on one background frame instead of a series of frames as is usually the case.
Abstract: Most statistical background subtraction techniques are based on the analysis of temporal color/intensity distribution. However, learning statistics on a series of time frames can be problematic, especially when no frame absent of moving objects is available or when the available memory is not sufficient to store the series of frames needed for learning. In this letter, we propose a spatial variation to the traditional temporal framework. The proposed framework allows statistical motion detection with methods trained on one background frame instead of a series of frames as is usually the case. Our framework includes two spatial background subtraction approaches suitable for different applications. The first approach is meant for scenes having a nonstatic background due to noise, camera jitter or animation in the scene (e.g., waving trees, fluttering leaves). This approach models each pixel with two PDFs: one unimodal PDF and one multimodal PDF, both trained on one background frame. In this way, the method can handle backgrounds with static and nonstatic areas. The second spatial approach is designed to use as little processing time and memory as possible. Based on the assumption that neighboring pixels often share similar temporal distribution, this second approach models the background with one global mixture of Gaussians.
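The spatial idea, learning each pixel's background statistics from its neighbourhood in a single background frame, can be reduced to a minimal sketch. The letter's actual models (per-pixel unimodal/multimodal PDFs, a global mixture of Gaussians) are richer than this:

```python
import numpy as np

def spatial_bg_subtract(bg_frame, frame, half_win=2, k=2.5):
    """Single-background-frame motion detection: each pixel's
    background mean and standard deviation are estimated from its
    spatial neighbourhood in ONE background frame (instead of a
    temporal history), and a pixel of a new frame is flagged as
    foreground when it deviates by more than k standard deviations."""
    h, w = bg_frame.shape
    fg = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            win = bg_frame[max(0, i - half_win): i + half_win + 1,
                           max(0, j - half_win): j + half_win + 1]
            mu, sigma = win.mean(), win.std() + 1e-6
            fg[i, j] = abs(frame[i, j] - mu) > k * sigma
    return fg
```

The appeal is exactly what the abstract states: no object-free video sequence and no frame buffer are needed, only one background image.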

Journal ArticleDOI
TL;DR: The technical feasibility of a transparent, sound-absorbing panel for outdoor antinoise devices is investigated and an optimized three-layer configuration can achieve sound- absorption properties similar to nontransparent products with only a limited loss of visual transparency and appropriate mechanical strength.
Abstract: Sound absorption and optical transparency are among the most useful properties of noise barriers. While the latter is required to reduce visual impact and for aesthetical reasons, the former is required whenever conditions of multiple reflections and presence of close, high receivers occur. The technical feasibility of a transparent, sound-absorbing panel for outdoor antinoise devices is investigated in this paper. An analysis of acoustical performance of multiple perforated plates is performed employing an existing theory for microperforated absorbers under normal incidence and diffused sound field. An optimization of the geometrical parameters is carried out on the basis of the European classification criteria of noise barriers for roadways. An optimized three-layer configuration can achieve sound-absorption properties similar to nontransparent products with only a limited loss of visual transparency and appropriate mechanical strength. Experimental data obtained with an impedance tube on small test sam...

Journal ArticleDOI
TL;DR: This study examined the source for this SBN‐induced masking effect of the blood oxygenation level‐dependent (BOLD) response by directly comparing two experimental sessions with the same auditory stimulation, which was presented either with or without recorded scanner background noise (RecSBN).
Abstract: Several studies reported decreased signal intensities within auditory areas for experimental designs employing continuous scanner background noise (SBN) in comparison to designs with less or no SBN. This study examined the source for this SBN-induced masking effect of the blood oxygenation level-dependent (BOLD) response by directly comparing two experimental sessions with the same auditory stimulation, which was presented either with or without recorded scanner background noise (RecSBN). Ten subjects listened to a series of four one-syllable words and had to decide whether two of the words were identical. The words were either presented with a silent background or with added RecSBN. This was then contrasted with either silence or RecSBN. A sparse temporal sampling method was used in both sessions, which enabled us to directly assess the influence of RecSBN without varying scanning parameters, acquisition quantities, or auditory stimulations. Our results suggest that previously reported SBN-induced masking of the BOLD response in experimental designs with SBN might be caused by an interaction between increased baseline levels and nonlinearity effects within auditory cortices. Adding SBN to an experimental condition does not enhance signal intensities to the same degree that SBN does when presented with a silent background, and therefore contrasting an experimental and baseline condition that both have SBN may lead to signal decreases. In addition, our study shows this effect is greatest in Heschl's gyrus, but can also be observed in higher-order auditory areas.

Journal ArticleDOI
TL;DR: The small reduction in scores for amplitude-modulated compared to steady noise and lack of age interaction suggests that the substantial deficit seen with age in multitalker babble for previous studies was due to some effect not elicited here, such as informational masking.
Abstract: The extent to which audibility determines speech recognition depends on a number of signal and listener factors. This study focused on three factors: age, background noise modulation, and linear versus wide-dynamic compression amplification. Three audiometrically matched groups of older listeners with hearing loss were tested to determine at what age performance declined relative to that expected on the basis of audibility. Recognition fell below predicted scores by greater amounts as age increased. Scores were higher for steady versus amplitude-modulated noise. Scores for WDRC-amplified speech were slightly lower than for linearly amplified speech across all groups and noise conditions. We found no interaction between age and type of noise. The small reduction in scores for amplitude-modulated compared to steady noise and lack of age interaction suggests that the substantial deficit seen with age in multitalker babble for previous studies was due to some effect not elicited here, such as informational masking.

Proceedings ArticleDOI
04 Dec 2007
TL;DR: A new approach to sparseness-based BSS is proposed that iteratively estimates the DOA and the time-frequency mask for each source through the EM algorithm under the sparseness assumption.
Abstract: In this paper, we propose a new approach to sparseness-based BSS, which iteratively estimates the DOA and the time-frequency mask for each source through the EM algorithm under the sparseness assumption. Our method has the following characteristics: 1) it enables the introduction of physical observation models such as the diffuse sound field, because the likelihood is defined in the original signal domain and not in the feature domain; 2) the power of the background noise does not necessarily have to be known in advance, since it is also a parameter that can be estimated from the observed signal; 3) it requires little computation time; 4) a common objective function is iteratively increased in the localization and separation steps, which correspond to the E-step and M-step, respectively. Although our framework is applicable to general N-channel BSS, we concentrate on the formulation of the problem in the particular case where two sensory inputs are available, and we show some numerical simulation results.

Patent
Che-Ming Lin1
15 Nov 2007
TL;DR: In this paper, a plurality of microphones, a sound inspecting unit, a direction estimating unit, and a background noise removing unit are used to collect sounds around a user, and an alerting unit is used to inform the user of the detected sound via an alert message.
Abstract: An apparatus for detecting sound includes a plurality of microphones, a sound inspecting unit, a direction estimating unit, a background noise removing unit, and an alerting unit. The microphones are used to collect sounds around a user. The sound inspecting unit is used to calculate the feature values of a background noise within a preset time interval, and to determine if a latest collected sound satisfies a preset condition. When the preset condition is satisfied, the direction estimating unit is used to estimate the occurrence direction of the latest collected sound, and to determine if the occurrence direction is within a preset range behind the user. When the preset range is satisfied, the background noise removing unit is used to remove the background noise in the latest collected sound so as to obtain a detected sound. The alerting unit is used to inform the user of the detected sound via an alert message. A method for detecting sound is also disclosed.

Journal ArticleDOI
TL;DR: Comparisons between the habitat noise types presented here and prior data on auditory masking indicate that fishes with enhanced hearing abilities are only moderately masked in stagnant, quiet habitats, whereas they would be considerably masked in fast-flowing habitats.
Abstract: The detectability of acoustic signals depends on the hearing abilities of receivers and the prevailing ambient noise in a given habitat. Ambient noise is inherent in all terrestrial and aquatic habitats and has the potential to severely mask relevant acoustic signals. In order to assess the detectability of sounds to fishes, the linear equivalent sound pressure levels (LLeq) of twelve European freshwater habitats were measured and spectra of the ambient noise recordings analyzed. Stagnant habitats such as lakes and backwaters are quiet, with noise levels below 100 dB re 1 μPa (LLeq) under no-wind conditions. Typically, most environmental noise is concentrated in the lower frequency range below 500 Hz. Noise levels in fast-flowing waters were typically above 110 dB and peaked at 135 dB (Danube River in a free-flowing area). Contrary to stagnant habitats, high amounts of sound energy were present in the high frequency range above 1 kHz, leaving a low-energy “noise window” below 1 kHz. Comparisons between the habit...
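A linear equivalent level like those reported above can be computed from a calibrated pressure trace; this is a minimal sketch (the function name and the assumption of a pascal-domain input are mine), using the underwater reference pressure of 1 μPa.

```python
import numpy as np

def leq_db(p, p_ref=1e-6):
    """Linear equivalent sound pressure level in dB re 1 uPa.

    p: calibrated sound pressure samples in pascals (Pa);
    p_ref: underwater reference pressure, 1 uPa = 1e-6 Pa.
    """
    return 10.0 * np.log10(np.mean(np.square(p)) / p_ref**2)
```

For example, a steady pressure of 1e-4 Pa evaluates to 40 dB re 1 μPa.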

Journal ArticleDOI
TL;DR: This work jointly model the dynamics of both the raw speech signal and the noise, using a switching linear dynamical system (SLDS), which is comparatively noise robust and also significantly outperforms a state-of-the-art feature-based HMM.
Abstract: Real-world applications such as hands-free dialling in cars may have to deal with potentially very noisy environments. Existing state-of-the-art solutions to this problem use feature-based HMMs, with a preprocessing stage to clean the noisy signal. However, the effect that raw signal noise has on the induced HMM features is poorly understood and limits the performance of the HMM system. An alternative to feature-based HMMs is to model the raw signal, which has the potential advantage that including an explicit noise model is straightforward. Here we jointly model the dynamics of both the raw speech signal and the noise, using a switching linear dynamical system (SLDS). The new model was tested on isolated digit utterances corrupted by Gaussian noise. Contrary to the autoregressive HMM and its derivatives, which provide a model of uncorrupted raw speech, the SLDS is comparatively noise robust and also significantly outperforms a state-of-the-art feature-based HMM. The computational complexity of the SLDS scales exponentially with the length of the time series. To counter this we use expectation correction, which provides a stable and accurate linear-time approximation for this important class of models, aiding their further application in acoustic modeling.
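The single-regime building block of such a model — a linear dynamical system over the raw waveform — can be sketched as a Kalman filter for an AR(2) signal observed in white noise. The full SLDS would switch among several AR regimes with a discrete Markov chain (hence the need for approximations like expectation correction); the function name and parameters here are illustrative.

```python
import numpy as np

def kalman_denoise_ar2(y, a1, a2, q, r):
    """Kalman filter estimate of an AR(2) signal from noisy observations.

    State x_t = [s_t, s_{t-1}] with s_t = a1*s_{t-1} + a2*s_{t-2} + w_t,
    w_t ~ N(0, q); observation y_t = s_t + v_t, v_t ~ N(0, r).
    """
    A = np.array([[a1, a2], [1.0, 0.0]])
    Q = np.array([[q, 0.0], [0.0, 0.0]])
    H = np.array([1.0, 0.0])
    x = np.zeros(2)
    P = np.eye(2)
    s_hat = np.empty(len(y))
    for t, yt in enumerate(y):
        # predict through the linear dynamics
        x = A @ x
        P = A @ P @ A.T + Q
        # update with the noisy raw-signal observation
        S = H @ P @ H + r
        K = P @ H / S
        x = x + K * (yt - H @ x)
        P = P - np.outer(K, H @ P)
        s_hat[t] = x[0]
    return s_hat
```

With the correct model parameters, the filtered estimate has lower mean-squared error than the raw noisy observations.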

Journal ArticleDOI
TL;DR: Algorithms for detecting pedestrians in videos acquired by infrared sensors, based on gait, are described; the second method converts the cyclic pattern into a binary sequence by Maximal Principal Gait Angle (MPGA) fitting.
Abstract: We describe algorithms for detecting pedestrians in videos acquired by infrared (and color) sensors. Two approaches are proposed based on gait. The first employs computationally efficient periodicity measurements. Unlike other methods, it estimates a periodic motion frequency using two cascading hypothesis testing steps to filter out non-cyclic pixels so that it works well for both radial and lateral walking directions. The extraction of the period is efficient and robust with respect to sensor noise and cluttered background. In order to integrate shape and motion, we convert the cyclic pattern into a binary sequence by Maximal Principal Gait Angle (MPGA) fitting in the second method. It does not require alignment and continuously estimates the period using a Phase-locked Loop. Both methods are evaluated by experimental results that measure performance as a function of size, movement direction, frame rate and sequence length.
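The periodicity measurement at the heart of such gait detectors can be illustrated with a simple autocorrelation-based period estimator over a per-pixel (or binarized MPGA) sequence. This is a generic sketch, not the paper's cascading hypothesis tests or phase-locked loop; the function name and `min_lag` parameter are assumptions.

```python
import numpy as np

def estimate_period(seq, min_lag=2):
    """Dominant period (in samples) of a roughly periodic sequence,
    found as the strongest autocorrelation peak after lag 0."""
    x = np.asarray(seq, dtype=float)
    x = x - x.mean()                               # remove DC offset
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    ac = ac / ac[0]                                # normalize by lag-0 energy
    # search lags from min_lag up to half the sequence length
    return int(np.argmax(ac[min_lag:len(x) // 2]) + min_lag)
```

For a binary gait-like sequence repeating every 5 frames, the estimator returns 5.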

Journal ArticleDOI
Li Xu1, Yunfang Zheng
TL;DR: There was a trade-off between temporal and spectral cues for phoneme recognition in noise, and there was no further improvement in performance for consonant recognition when the number of channels was ≥12 in any of the three conditions.
Abstract: Cochlear implant users receive limited spectral and temporal information. Their speech recognition deteriorates dramatically in noise. The aim of the present study was to determine the relative contributions of spectral and temporal cues to speech recognition in noise. Spectral information was manipulated by varying the number of channels from 2 to 32 in a noise-excited vocoder. Temporal information was manipulated by varying the low-pass cutoff frequency of the envelope extractor from 1 to 512 Hz. Ten normal-hearing, native speakers of English participated in tests of phoneme recognition using vocoder-processed consonants and vowels under three conditions (quiet, and +6 and 0 dB signal-to-noise ratios). The number of channels required for vowel-recognition performance to plateau increased from 12 in quiet to 16–24 in the two noise conditions. However, for consonant recognition, no further improvement in performance was evident when the number of channels was ⩾12 in any of the three conditions. The contributi...
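A noise-excited envelope vocoder of the kind used in this study can be sketched as follows: band-pass the speech into channels, extract each channel's envelope, low-pass the envelope at the chosen cutoff, and use it to modulate band-limited noise. Band edges, filter orders, and names here are illustrative choices, not the study's exact processing.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocoder(x, fs, n_channels=8, env_cutoff=50.0,
                  fmin=100.0, fmax=4000.0, seed=0):
    """Noise-excited envelope vocoder (sketch).

    n_channels controls spectral resolution; env_cutoff (Hz) controls
    how much temporal envelope detail survives.
    """
    rng = np.random.default_rng(seed)
    edges = np.geomspace(fmin, fmax, n_channels + 1)   # log-spaced bands
    lp_b, lp_a = butter(2, env_cutoff / (fs / 2))      # envelope low-pass
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype='band')
        band = filtfilt(b, a, x)
        env = filtfilt(lp_b, lp_a, np.abs(hilbert(band)))  # smoothed envelope
        env = np.maximum(env, 0.0)
        carrier = filtfilt(b, a, rng.standard_normal(len(x)))  # band noise
        out += env * carrier
    return out
```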

Journal ArticleDOI
TL;DR: MRAN's performance is compared with the conventional MUSIC algorithm and with the radial basis function neural network scheme developed by A. H. El Zooghby under normal and failed cases; results indicate the superior performance of the MRAN-based DoA estimation scheme.
Abstract: This paper presents the use of a minimal resource allocation network (MRAN) for direction of arrival (DoA) estimation under array sensor failure in a noisy environment. MRAN is a sequential learning algorithm in which hidden neurons are added or removed based on the input data, producing a compact network. The training for MRAN is done under the no-failure, no-noise case, and the trained network is then used when there is a failure. Thus, the need to know the failed element and the time of its failure, as required in other methods, is eliminated. MRAN's performance is compared with the conventional MUSIC algorithm and also with the radial basis function neural network scheme developed by A. H. El Zooghby under normal and failed cases. In the normal case, different antenna effects like mutual coupling, nonuniform arrays, and unequal source power have been studied under different signal-to-noise ratio (SNR) values. Results indicate the superior performance of the MRAN-based DoA estimation scheme under different antenna effects, failure conditions, and noise levels.
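For reference, the conventional MUSIC baseline compared against here can be computed for a uniform linear array as follows — a standard textbook sketch, with argument names and the half-wavelength spacing chosen for illustration.

```python
import numpy as np

def music_spectrum(R, n_sources, angles_deg, d=0.5):
    """MUSIC pseudospectrum for a uniform linear array.

    R: (M, M) sample covariance of array snapshots;
    d: element spacing in wavelengths;
    angles_deg: candidate directions of arrival.
    """
    M = R.shape[0]
    _, V = np.linalg.eigh(R)           # eigenvalues in ascending order
    En = V[:, :M - n_sources]          # noise subspace (smallest eigenvalues)
    m = np.arange(M)
    P = np.empty(len(angles_deg))
    for i, th in enumerate(angles_deg):
        a = np.exp(-2j * np.pi * d * m * np.sin(np.deg2rad(th)))
        # pseudospectrum peaks where the steering vector is orthogonal
        # to the noise subspace
        P[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
    return P
```

Peaks of `P` over the angle grid give the DoA estimates; sensor failure corrupts `R`, which is exactly the condition the MRAN scheme is designed to handle.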

Patent
22 Jun 2007
TL;DR: In this paper, a system for facilitating conversational communications in an environment with background noise, including a microphone for sensing the background noise and a signal processor configured to process the microphone output and produce an anti-noise electrical output, was presented.
Abstract: A system for facilitating conversational communications in an environment with background noise, the system including a microphone for sensing the background noise, a signal processor configured to process the microphone output and produce an anti-noise electrical output, and a directional speaker array configured to receive the anti-noise electrical output and directionally broadcast anti-noise audio output, the anti-noise audio output destructively interfering with the environmental background noise.
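The signal processing in active noise control systems of this kind is typically built on an adaptive LMS-style filter. A minimal sketch (ignoring the acoustic secondary path, i.e. plain LMS rather than FxLMS; all names are assumptions) that predicts the noise in a primary signal from a reference microphone and subtracts it:

```python
import numpy as np

def lms_cancel(reference, primary, n_taps=16, mu=0.01):
    """LMS adaptive noise canceller.

    Adapts an FIR filter so that filtered `reference` matches the
    correlated noise in `primary`; returns the residual (error) signal.
    """
    w = np.zeros(n_taps)
    e = np.zeros(len(primary))
    for n in range(n_taps, len(primary)):
        x = reference[n - n_taps:n][::-1]   # most recent sample first
        y = w @ x                           # noise estimate
        e[n] = primary[n] - y               # residual after cancellation
        w += 2 * mu * e[n] * x              # LMS weight update
    return e
```

Once the filter converges, the residual power is far below the original noise power.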

Journal ArticleDOI
TL;DR: Presents an analysis of the transmission of intracellular signals from neurons to an extracellular electrode, and a set of MATLAB functions, based on this analysis, that generate realistic but controllable synthetic signals, including realistic (non-Gaussian) background noise.

Journal ArticleDOI
TL;DR: Experimental results demonstrate the advantage of using the proposed simultaneous detection and estimation approach with the proposed a priori SNR estimator, which facilitate suppression of transient noise with a controlled level of speech distortion.
Abstract: In this paper, we present a simultaneous detection and estimation approach for speech enhancement. A detector for speech presence in the short-time Fourier transform domain is combined with an estimator, which jointly minimizes a cost function that takes into account both detection and estimation errors. Cost parameters control the tradeoff between speech distortion, caused by missed detection of speech components, and residual musical noise resulting from false detection. Furthermore, a modified decision-directed a priori signal-to-noise ratio (SNR) estimation is proposed for transient-noise environments. Experimental results demonstrate the advantage of using the proposed simultaneous detection and estimation approach with the proposed a priori SNR estimator, which facilitates suppression of transient noise with a controlled level of speech distortion.
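The classic decision-directed recursion that this paper modifies can be sketched as follows — this is the standard Ephraim–Malah form combined with a simple Wiener gain, not the paper's modified estimator, and all names are illustrative.

```python
import numpy as np

def decision_directed_gains(Y2, noise_psd, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR estimation with Wiener gains.

    Y2: (frames, bins) noisy power spectrogram |Y|^2;
    noise_psd: (bins,) noise power spectral density estimate;
    alpha: smoothing factor weighting the previous clean-speech estimate.
    """
    gains = np.empty_like(Y2)
    xi = np.maximum(Y2[0] / noise_psd - 1.0, xi_min)   # initial a priori SNR
    for t in range(Y2.shape[0]):
        gamma = Y2[t] / noise_psd                      # a posteriori SNR
        if t > 0:
            # decision-directed recursion: previous clean estimate
            # |G*Y|^2 / noise_psd, blended with the ML term max(gamma-1, 0)
            xi = (alpha * (gains[t-1]**2 * Y2[t-1]) / noise_psd
                  + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
            xi = np.maximum(xi, xi_min)
        gains[t] = xi / (1.0 + xi)                     # Wiener gain
    return gains
```

Multiplying each noisy spectral frame by `gains[t]` gives the enhanced spectrum; the smoothing in `alpha` is what reduces musical noise, at the cost of slower tracking of transients — the weakness the paper's modification targets.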

Journal ArticleDOI
TL;DR: An adaptive nonlinear filtering approach in the orthogonal transform domain is proposed and analyzed for several typical noise environments in the DCT domain and is found to be competing with the state-of-the-art methods on pure additive noise corrupted images.
Abstract: This work addresses the problem of signal-dependent noise removal in images. An adaptive nonlinear filtering approach in the orthogonal transform domain is proposed and analyzed for several typical noise environments in the DCT domain. Applied locally, that is, within a window of small support, the DCT is expected to approximate the Karhunen-Loève decorrelating transform, which enables effective suppression of noise components. The detail-preservation ability of the filter, which avoids destroying useful image content, is especially emphasized and considered. A local adaptive DCT filtering is formulated for the two cases in which signal-dependent noise can and cannot be mapped into additive uncorrelated noise with a homomorphic transform. Although the main focus is signal-dependent and pure multiplicative noise, the proposed filtering approach is also found to be competitive with state-of-the-art methods on images corrupted by pure additive noise.
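The core local-DCT idea can be illustrated with blockwise hard-thresholding for the simplest (additive, signal-independent) case; the block size, threshold factor, and non-overlapping tiling are simplifying assumptions, not the paper's adaptive scheme.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_denoise(img, sigma, block=8, k=2.7):
    """Blockwise DCT hard-thresholding for additive white noise.

    img: 2-D grayscale image; sigma: noise standard deviation;
    k*sigma: hard threshold on DCT coefficients.
    """
    h, w = img.shape
    out = img.astype(float).copy()
    thr = k * sigma
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            c = dctn(out[i:i+block, j:j+block], norm='ortho')
            dc = c[0, 0]
            c[np.abs(c) < thr] = 0.0     # suppress small (noise) coefficients
            c[0, 0] = dc                 # always keep the local mean
            out[i:i+block, j:j+block] = idctn(c, norm='ortho')
    return out
```

Because the local DCT approximately decorrelates natural-image content, most signal energy survives in a few large coefficients while the noise, spread evenly, falls below the threshold.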

Journal ArticleDOI
TL;DR: A multidimensional analysis indicates that several dimensions are needed to describe the factors used by subjects to judge the effects of the three distortion types, and predicts the quality judgments of normal-hearing listeners and listeners with mild-to-moderate hearing loss.
Abstract: Noise and distortion reduce speech intelligibility and quality in audio devices such as hearing aids. This study investigates the perception and prediction of sound quality by both normal-hearing and hearing-impaired subjects for conditions of noise and distortion related to those found in hearing aids. Stimuli were sentences subjected to three kinds of distortion (additive noise, peak clipping, and center clipping), with eight levels of degradation for each distortion type. The subjects performed paired comparisons for all possible pairs of 24 conditions. A one-dimensional coherence-based metric was used to analyze the quality judgments. This metric was an extension of a speech intelligibility metric presented in Kates and Arehart (2005) [J. Acoust. Soc. Am. 117, 2224–2237] and is based on dividing the speech signal into three amplitude regions, computing the coherence for each region, and then combining the three coherence values across frequency in a calculation based on the speech intelligibility inde...
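The coherence computation underlying such a metric can be sketched with Welch-averaged magnitude-squared coherence over the full band (rather than the paper's three amplitude regions and frequency-weighted combination); the function name and parameters are assumptions.

```python
import numpy as np
from scipy.signal import coherence

def mean_msc(clean, degraded, fs, nperseg=256):
    """Mean magnitude-squared coherence between a clean reference and a
    degraded signal: near 1 for faithful reproduction, lower under added
    noise or nonlinear distortion such as clipping."""
    _, msc = coherence(clean, degraded, fs=fs, nperseg=nperseg)
    return float(np.mean(msc))
```

Additive noise and clipping both reduce the coherence between input and output, which is why a coherence-based index can track perceived quality across distortion types.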