
Showing papers on "Noise published in 2019"


Proceedings ArticleDOI
12 May 2019
TL;DR: Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data, and it is shown that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
Abstract: As sound event classification moves towards larger datasets, issues of label noise become inevitable. Web sites can supply large volumes of user-contributed audio and metadata, but inferring labels from this metadata introduces errors due to unreliable inputs, and limitations in the mapping. There is, however, little research into the impact of these errors. To foster the investigation of label noise in sound event classification we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data. We characterize the label noise empirically, and provide a CNN baseline system. Experiments suggest that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data. We also show that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
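One widely used noise-robust objective of the kind the abstract mentions is the generalized cross-entropy (L_q) loss, which interpolates between categorical cross-entropy and the label-noise-robust mean absolute error. A minimal NumPy sketch (function name and toy values are illustrative, not taken from the paper):

```python
import numpy as np

def lq_loss(probs, labels, q=0.7):
    """Generalized cross-entropy (L_q) loss: (1 - p_y^q) / q.
    As q -> 0 it approaches cross-entropy; at q = 1 it equals mean
    absolute error, which bounds the penalty on mislabeled examples.
    probs: (N, C) class probabilities; labels: (N,) integer targets."""
    p_y = probs[np.arange(len(labels)), labels]
    return float(np.mean((1.0 - p_y ** q) / q))

# A confidently "wrong" prediction (second row, possibly a corrupted
# label) contributes a bounded penalty instead of an unbounded log-loss.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
labels = np.array([0, 0])
loss = lq_loss(probs, labels)
```

Because the penalty saturates as p_y → 0, mislabeled examples contribute a bounded gradient, which is what makes training on large noisy sets viable.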

83 citations


Journal ArticleDOI
TL;DR: Despite past controversies, increasing evidence has led to acceptance that white matter activity is detectable using functional magnetic resonance imaging (fMRI); in spite of this, advanced analytic methods continue to be published that reinforce a historic bias against white matter activation by using it as a nuisance regressor.
Abstract: Despite past controversies, increasing evidence has led to acceptance that white matter activity is detectable using functional magnetic resonance imaging (fMRI). In spite of this, advanced analytic methods continue to be published that reinforce a historic bias against white matter activation by using it as a nuisance regressor. It is important that contemporary analyses overcome this blind spot in whole brain functional imaging, both to ensure that newly developed noise regression techniques are accurate, and to ensure that white matter, a vital and understudied part of the brain, is not ignored in functional neuroimaging studies.

69 citations


Journal ArticleDOI
TL;DR: It is demonstrated that interruption of functional behaviors in some cases coincides with high‐level vessel noise, and concomitant long‐term continuous broadband on‐animal sound and movement recordings may be an important tool in future quantification of disturbance effects of anthropogenic activities at sea and assessment of long-term population impacts on pinnipeds.
Abstract: The impact of anthropogenic noise on marine fauna is of increasing conservation concern with vessel noise being one of the major contributors. Animals that rely on shallow coastal habitats may be especially vulnerable to this form of pollution. Very limited information is available on how much noise from ship traffic individual animals experience, and how they may react to it due to a lack of suitable methods. To address this, we developed long-duration audio and 3D-movement tags (DTAGs) and deployed them on three harbor seals and two gray seals in the North Sea during 2015-2016. These tags recorded sound, accelerometry, magnetometry, and pressure continuously for up to 21 days. GPS positions were also sampled for one seal continuously throughout the recording period. A separate tag, combining a camera and an accelerometer logger, was deployed on two harbor seals to visualize specific behaviors that helped interpret accelerometer signals in the DTAG data. Combining data from depth, accelerometer, and audio sensors, we found that animals spent 6.6%-42.3% of the time hauled out (either on land or partly submerged), and 5.3%-12.4% of their at-sea time resting at the sea bottom, while the remaining time was used for traveling, resting at surface, and foraging. Animals were exposed to audible vessel noise 2.2%-20.5% of their time when in water, and we demonstrate that interruption of functional behaviors (e.g., resting) in some cases coincides with high-level vessel noise. Two-thirds of the ship noise events were traceable by the AIS vessel tracking system, while one-third comprised vessels without AIS. This preliminary study demonstrates how concomitant long-term continuous broadband on-animal sound and movement recordings may be an important tool in future quantification of disturbance effects of anthropogenic activities at sea and assessment of long-term population impacts on pinnipeds.

34 citations


Proceedings ArticleDOI
06 Nov 2019
TL;DR: This paper proposes a method to detect audio adversarial examples by adding a new low level distortion using audio modification, so that the classification result of the adversarial example changes sensitively.
Abstract: Deep neural networks (DNNs) perform well in the fields of image recognition, speech recognition, pattern analysis, and intrusion detection. However, DNNs are vulnerable to adversarial examples that add a small amount of noise to the original samples. These adversarial examples have mainly been studied in the field of images, but their effect on the audio field is currently of great interest. For example, adding a small distortion that is difficult for humans to identify can create audio adversarial examples that sound unaltered to a listener but are misrecognized by the machine. A defense method is therefore needed, because such examples are a real threat in the audio domain. In this paper, we propose a method to detect audio adversarial examples. The key idea is to add a new low-level distortion using audio modification: the classification result of an adversarial example changes sensitively under this distortion, whereas the classification result of an original sample changes very little. Using this property, we detect audio adversarial examples. To verify the proposed method, we used the Mozilla Common Voice dataset and the DeepSpeech model as the target model. Based on the experimental results, the accuracy on adversarial examples decreased to 6.21% at approximately 12 dB, allowing audio adversarial examples to be distinguished from the original audio samples.
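The detection idea can be sketched as: perturb the input at a given SNR and flag inputs whose transcription changes disproportionately. Here `transcribe` is a stand-in for an ASR model such as DeepSpeech, and the 12 dB level and CER threshold are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def char_error_rate(ref, hyp):
    """Levenshtein distance between strings, normalized by reference length."""
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[len(ref), len(hyp)] / max(len(ref), 1)

def add_noise_at_snr(audio, snr_db, rng=None):
    """Add white noise so the result has the given SNR (dB) w.r.t. the input."""
    rng = np.random.default_rng(rng)
    noise = rng.standard_normal(len(audio))
    scale = np.sqrt(np.mean(audio ** 2) /
                    (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return audio + scale * noise

def is_adversarial(audio, transcribe, snr_db=12.0, max_cer=0.2):
    """Flag the input as adversarial if a small added distortion changes
    the transcription substantially."""
    before = transcribe(audio)
    after = transcribe(add_noise_at_snr(audio, snr_db))
    return char_error_rate(before, after) > max_cer
```

The asymmetry the paper exploits is that benign inputs keep (roughly) the same transcription under this distortion, while adversarial perturbations are fragile to it.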

29 citations


Journal ArticleDOI
22 Mar 2019-Sensors
TL;DR: The entire network infrastructure is outlined, including the operation of the sensors, followed by an analysis of its data yield, the development of the fault detection approach, and future plans for system integration.
Abstract: Noise pollution is one of the topmost quality of life issues for urban residents in the United States. Continued exposure to high levels of noise has proven effects on health, including acute effects such as sleep disruption, and long-term effects such as hypertension, heart disease, and hearing loss. To investigate and ultimately aid in the mitigation of urban noise, a network of 55 sensor nodes has been deployed across New York City for over two years, collecting sound pressure level (SPL) and audio data. This network has cumulatively amassed over 75 years of calibrated, high-resolution SPL measurements and 35 years of audio data. In addition, high-frequency telemetry data have been collected that provide an indication of each sensor's health. These telemetry data were analyzed over an 18-month period across 31 of the sensors and used to develop a prototype model for pre-failure detection, which is able to identify sensors in a pre-fail state 69.1% of the time. The entire network infrastructure is outlined, including the operation of the sensors, followed by an analysis of its data yield, the development of the fault detection approach, and future plans for system integration.

23 citations


Journal ArticleDOI
TL;DR: Children with hearing loss were able to derive a substantial benefit for listening in fluctuating noise when measured in instrumental music compared to 2-talker babble, and speech recognition is more sensitive to the effects of hearing loss when measured in fluctuating compared to steady-state noise.
Abstract: Purpose Speech recognition deteriorates with hearing loss, particularly in fluctuating background noise. This study examined how hearing loss affects speech recognition in different types of noise to clarify how characteristics of the noise interact with the benefits listeners receive when listening in fluctuating compared to steady-state noise. Method Speech reception thresholds were measured for a closed set of spondee words in children (ages 5–17 years) in quiet, speech-spectrum noise, 2-talker babble, and instrumental music. Twenty children with normal hearing and 43 children with hearing loss participated; children with hearing loss were subdivided into cochlear implant (18 children) and hearing aid (25 children) groups. A cohort of adults with normal hearing was included for comparison. Results Hearing loss had a large effect on speech recognition for each condition, but the effect of hearing loss was largest in 2-talker babble and smallest in speech-spectrum noise. Children with normal ...

21 citations


Journal ArticleDOI
31 Jul 2019-PLOS ONE
TL;DR: Although noise did not affect intruder detection, noise affected some aspects of singing and aggressive responses, which may be related to the challenge of discriminating and assessing territorial threats under elevated noise.
Abstract: Anthropogenic noise decreases signal active space, or the area over which male bird song can be detected in the environment. For territorial males, noise may make it more difficult to detect and assess territorial challenges, which in turn may increase defense costs and influence whether males maintain territory ownership. We tested the hypothesis that noise affects the ability of male house wrens (Troglodytes aedon) near active nests to detect intruders and alters responses to them. We broadcast pre-recorded male song and pink noise on territories to simulate intrusions with and without noise, as well as to noise alone. We measured detection by how long males took to sing or approach the speaker after the start of a playback. To measure whether playbacks changed male behavior, we compared their vocal responses before and during treatments, as well as compared mean vocal responses and the number of flyovers and attacks on the speaker during treatments. Noise did not affect a male’s ability to detect an intruder on his territory. Males altered their responses to simulated intruders with and without noise compared to the noise-only treatment by singing longer songs at faster rates. Males increased peak frequency of songs during intrusions without noise compared to noise-only treatments, but frequency during intruder plus noise treatments did not differ from either. When confronting simulated intruders in noise, males increased the number of attacks on the speaker compared to intruders without noise, possibly because they were less able to assess intruders via songs and relied on close encounters for information. Although noise did not affect intruder detection, noise affected some aspects of singing and aggressive responses, which may be related to the challenge of discriminating and assessing territorial threats under elevated noise.

19 citations


Journal ArticleDOI
TL;DR: The field test demonstrated the successful measurement of high-level impulse waveforms with the on-body and in-ear recording system, and the device worked as intended in terms of hearing protection and noise dosimetry.
Abstract: Accurate quantification of noise exposure in military environments is challenging due to movement of listeners and noise sources, spectral and temporal noise characteristics, and varied use of hearing protection. This study evaluates a wearable recording device designed to measure on-body and in-ear noise exposure, specifically in an environment with significant impulse noise resulting from firearms. A commercial audio recorder was augmented to obtain simultaneous measurements inside the ear canal behind an integrated hearing protector, and near the outer ear. Validation measurements, conducted with an acoustic test fixture and shock tube, indicated high impulse peak insertion loss with a proper fit of the integrated hearing protector. The recording devices were worn by five subjects during a live-fire data collection at Marine Corps Base Quantico where Marines fired semi-automatic rifles. The field test demonstrated the successful measurement of high-level impulse waveforms with the on-body and in-ear recording system. Dual channels allowed for instantaneous fit estimates for the hearing protection component, and the device worked as intended in terms of hearing protection and noise dosimetry. Accurate measurements of noise exposure and hearing protector fit should improve the ability to model and assess the risks of noise-induced hearing loss.
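The dual-channel fit estimate described above reduces to a difference of peak levels between the outer-ear and in-ear channels. A sketch assuming calibrated pressure signals in pascals (function names and values are illustrative):

```python
import numpy as np

def peak_spl_db(pressure_pa, p_ref=20e-6):
    """Peak sound pressure level in dB re 20 uPa."""
    return 20.0 * np.log10(np.max(np.abs(pressure_pa)) / p_ref)

def peak_insertion_loss_db(outside, inside):
    """Peak insertion loss: peak SPL at the outer-ear microphone minus
    peak SPL behind the hearing protector (in-ear microphone)."""
    return peak_spl_db(outside) - peak_spl_db(inside)

# Toy impulse attenuated by a factor of 100 in pressure -> 40 dB insertion loss
outside = np.array([2.0, -1.5, 0.3])
inside = outside / 100.0
il = peak_insertion_loss_db(outside, inside)
```

With simultaneous channels, this quantity can be computed per impulse, which is what enables the instantaneous fit estimates mentioned in the abstract.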

19 citations


Journal ArticleDOI
TL;DR: A novel noise PSD estimation algorithm based on minimum mean-square error (MMSE) is proposed, which exhibits superior noise-tracking capability under various nonstationary noise environments and SNR conditions.
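The TL;DR gives no algorithmic details, but MMSE-family noise trackers typically smooth a per-frame MMSE estimate of the noise periodogram, with the a priori SNR obtained by the decision-directed rule. A simplified sketch in that spirit (all parameter values and the crude speech estimate are our assumptions, not the paper's algorithm):

```python
import numpy as np

def mmse_noise_psd(frames_psd, alpha=0.8, beta=0.98, init_frames=5):
    """Track the noise PSD from per-frame periodograms |Y|^2 (frames x bins).
    Per frame: estimate the a priori SNR xi with the decision-directed rule,
    form the MMSE estimate of the noise periodogram under a Gaussian model,
        E[|N|^2 | Y] = |Y|^2 / (1 + xi)^2 + xi / (1 + xi) * sigma_n,
    and smooth sigma_n recursively. Assumes the first frames are noise-only."""
    sigma_n = np.mean(frames_psd[:init_frames], axis=0)
    speech_pow = np.zeros_like(sigma_n)
    track = []
    for y2 in frames_psd:
        xi = beta * speech_pow / sigma_n + (1 - beta) * np.maximum(y2 / sigma_n - 1, 0)
        e_n2 = y2 / (1 + xi) ** 2 + xi / (1 + xi) * sigma_n  # MMSE noise periodogram
        sigma_n = alpha * sigma_n + (1 - alpha) * e_n2
        speech_pow = (xi / (1 + xi)) ** 2 * y2               # Wiener speech estimate
        track.append(sigma_n.copy())
    return np.array(track)
```

When speech dominates a bin, xi grows and the update essentially holds the previous noise estimate; when the bin is noise-like, the estimate tracks the periodogram, which is the tracking behavior the TL;DR refers to.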

18 citations


Journal ArticleDOI
Yusuke Hioka, Michael Kingan, Gian Schmid, Ryan McKay, Karl Stol
TL;DR: Results of subjective listening tests suggest that the quality of recordings made by the designed UAV-mounted system for recording sound from a targeted source or direction is significantly better than that of recordings made by a shotgun microphone.

17 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a method for spectral decomposition of multispectral and hyperspectral images to remove stripes from the scene, followed by a combination of spectral and spatial smoothing to further increase the SNR and remove non-Lambertian features.

Journal ArticleDOI
TL;DR: In this article, the authors analyzed variations in cyclists' levels of noise exposure in Ho Chi Minh City (Vietnam) by integrating three dimensions: that is, the characteristics of the trip, neighbourhood effects, and the temporal dimension.

Journal ArticleDOI
TL;DR: In this paper, the research problem for optimizing the audio steganography technique is laid down, a methodology is proposed that effectively resolves the stated research problem and the implementation results are analyzed to ensure the effectiveness of the given solution.
Abstract: Being easy to understand and simple to implement, the substitution technique of performing steganography has gained wide popularity among users as well as attackers. Steganography is categorized into different types based on the carrier file being used for embedding data; this paper focuses on hiding data in audio files. Humans are acutely sensitive to additive random noise: an individual can detect noise in an audio file as low as one part in 10 million. Given this limitation, concealing information within audio files might seem a pointless exercise. However, the human auditory system (HAS) exhibits a phenomenon known as the masking effect, whereby the threshold of hearing for one type of sound is affected by the presence of another. Because of this property, it is possible to hide some data inside an audio file without it being noticed. In this paper, the research problem of optimizing the audio steganography technique is laid down. A methodology is then proposed that effectively resolves the stated research problem, and the implementation results are analyzed to confirm the effectiveness of the given solution.
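The substitution technique referred to above is typically least-significant-bit (LSB) embedding: the at-most-1-LSB error it introduces in 16-bit PCM lies far below audibility, and masking hides it further. A minimal sketch (not the paper's optimized scheme):

```python
import numpy as np

def embed_lsb(samples, bits):
    """Write one message bit into the least-significant bit of each of the
    first len(bits) 16-bit PCM samples (distortion is at most 1 LSB)."""
    out = samples.copy()
    bits = np.asarray(bits, dtype=samples.dtype)
    out[:len(bits)] &= ~1          # clear the LSB
    out[:len(bits)] |= bits        # write the message bit
    return out

def extract_lsb(samples, n_bits):
    """Read the hidden bits back from the LSBs."""
    return (samples[:n_bits] & 1).astype(np.uint8)
```

At one bit per sample, 44.1 kHz mono audio carries about 5.5 kB of payload per second; optimized schemes trade some of this capacity for robustness.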

Journal ArticleDOI
25 Mar 2019-Fractals
TL;DR: A novel audio magnetotelluric (AMT) signal-noise identification and separation method based on multifractal spectrum and matching pursuit that can effectively identify interference in the EMTF mathematical model and measured AMT data is proposed.
Abstract: To avoid the blindness of the overall de-noising method and retain useful low frequency signals that are not over processed, we proposed a novel audio magnetotelluric (AMT) signal-noise identificat...

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This research proposes to simulate the removal of equipment and environmental noise in an audio using Digital Lowpass Chebyshev type II filter, which exhibits the lowest value of mean and median error.
Abstract: The audio signal is one of the most widely used signals, especially in the development of communication technology. One common audio signal is a human voice recording. Such a signal contains segments of unvoiced recording, which usually consist of noise produced in the background or by the recording device. This research proposes to simulate the removal of equipment and environmental noise in audio using a digital lowpass Chebyshev type II filter. Audio signal samples were gathered using a voice recorder device. The saved files were then filtered using GNU Octave software, and the processed sound signals were amplified for higher power. The designed lowpass filter was specified with a passband edge of 200 Hz at 10 dB attenuation and stopband edges of 250 Hz, 275 Hz, and 300 Hz at 40 dB attenuation. Among the three filter specifications, the filter with a stopband frequency of 300 Hz exhibits the lowest mean and median error.
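The comparison of the three stopband edges can be understood through the standard minimum-order formula for Chebyshev filters: a wider transition band needs a lower order. A sketch assuming an 8 kHz sampling rate (the abstract does not state one), with bilinear-transform prewarping:

```python
import math

def cheby2_min_order(f_pass, f_stop, a_pass_db, a_stop_db, fs):
    """Minimum Chebyshev (type I or II) order meeting a lowpass spec.
    Edge frequencies are prewarped for the bilinear transform."""
    wp = math.tan(math.pi * f_pass / fs)
    ws = math.tan(math.pi * f_stop / fs)
    d = math.sqrt((10 ** (a_stop_db / 10) - 1) / (10 ** (a_pass_db / 10) - 1))
    return math.ceil(math.acosh(d) / math.acosh(ws / wp))

# Orders for the three specifications above (fs = 8000 Hz assumed)
orders = {f_stop: cheby2_min_order(200, f_stop, 10, 40, 8000)
          for f_stop in (250, 275, 300)}
```

The 300 Hz stopband yields the lowest-order filter; in practice the coefficients would then be obtained with, e.g., Octave's `cheby2` or SciPy's `signal.cheb2ord`/`signal.cheby2`, which implement the same specification.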

Posted Content
TL;DR: A data-driven approach for predicting the behavior of a given non-linear audio signal processing effect (henceforth "audio effect") using a deep auto-encoder model that is conditioned on both time-domain samples and the control parameters of the target audio effect.
Abstract: In this work we present a data-driven approach for predicting the behavior of (i.e., profiling) a given non-linear audio signal processing effect (henceforth "audio effect"). Our objective is to learn a mapping function from the unprocessed audio to the corresponding audio processed by the effect to be profiled, using time-domain samples. To that aim, we employ a deep auto-encoder model that is conditioned on both time-domain samples and the control parameters of the target audio effect. As a test-case study, we focus on the offline profiling of two dynamic range compression audio effects, one software-based and the other analog. Compressors were chosen because they are a widely used and important set of effects and because their parameterized nonlinear time-dependent nature makes them a challenging problem for a system aiming to profile "general" audio effects. Results from our experimental procedure show that the primary functional and auditory characteristics of the compressors can be captured; however, there is still sufficient audible noise to merit further investigation before such methods are applied to real-world audio processing workflows.
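As a reference point for the kind of effect being profiled, a basic feed-forward dynamic range compressor can be sketched in a few lines; the threshold, ratio, and time constants here are illustrative defaults, not the paper's target effects:

```python
import numpy as np

def compress(x, fs, threshold_db=-20.0, ratio=4.0, attack_ms=5.0, release_ms=50.0):
    """Feed-forward compressor: a smoothed level detector (fast attack,
    slow release) drives gain reduction above the threshold."""
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    y = np.empty_like(x)
    for n, s in enumerate(x):
        level = abs(s)
        a = a_att if level > env else a_rel
        env = a * env + (1 - a) * level              # envelope follower
        level_db = 20.0 * np.log10(max(env, 1e-9))
        over_db = max(level_db - threshold_db, 0.0)
        gain_db = -over_db * (1.0 - 1.0 / ratio)     # static compression curve
        y[n] = s * 10.0 ** (gain_db / 20.0)
    return y
```

The nonlinearity (the static curve) combined with the state of the envelope follower is exactly the "parameterized nonlinear time-dependent nature" that makes compressors a hard profiling target.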

Journal ArticleDOI
TL;DR: A three-way spectral decomposition quantifies the contribution of turbulent mixing noise and broadband shock-associated noise to high-performance military aircraft noise.
Abstract: High-performance military aircraft noise contains large- and fine-scale turbulent mixing noise and broadband shock-associated noise. A three-way spectral decomposition quantifies the contribution f...

Journal ArticleDOI
TL;DR: Closing patient doors was the most effective strategy for reducing noise and increasing patient unit satisfaction; the use of visual cues and staff education was also effective in reducing noise levels.
Abstract: Purpose:To determine whether using existing noise reduction strategies improves patients’ overall satisfaction level during hospitalization on an adult outpatient cardiology unit and to assess whet...

Journal ArticleDOI
TL;DR: This work develops a generalization of the leaky stochastic accumulator model using a Langevin equation whose non-linear noise term allows for varying levels of autocorrelation in the time course of the decision variable.
Abstract: Integration-to-bound models are among the most widely used models of perceptual decision-making due to their simplicity and power in accounting for behavioral and neurophysiological data. They involve temporal integration over an input signal (“evidence”) plus Gaussian white noise. However, brain data shows that noise in the brain is long-term correlated, with a spectral density of the form 1/f^α (with typically 1 < α < 2), also known as pink noise or ‘1/f’ noise. Surprisingly, the adequacy of the spectral properties of drift-diffusion models to electrophysiological data has received little attention in the literature. Here we propose a model of accumulation of evidence for decision-making that takes into consideration the spectral properties of brain signals. We develop a generalization of the leaky stochastic accumulator model using a Langevin equation whose non-linear noise term allows for varying levels of autocorrelation in the time course of the decision variable. We derive this equation directly from magnetoencephalographic data recorded while subjects performed a spontaneous movement-initiation task. We then propose a nonlinear model of accumulation of evidence that accounts for the ‘1/f’ spectral properties of brain signals, and the observed variability in the power spectral properties of brain signals. Furthermore, our model outperforms the standard drift-diffusion model at approximating the empirical waiting time distribution.
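The base model being generalized is the leaky stochastic accumulator, dx = (drift − leak·x) dt + σ dW, integrated to a decision bound. A minimal Euler–Maruyama sketch with white noise (the article replaces the noise term with a non-linear one producing 1/f^α autocorrelation; all parameter values here are illustrative):

```python
import numpy as np

def leaky_accumulator(drift, leak, sigma, dt=1e-3, bound=1.0,
                      max_steps=100000, rng=None):
    """Euler-Maruyama simulation of dx = (drift - leak*x) dt + sigma dW,
    returning the trajectory up to the first bound crossing (a 'decision')."""
    rng = np.random.default_rng(rng)
    x = 0.0
    traj = [x]
    for _ in range(max_steps):
        x += (drift - leak * x) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        traj.append(x)
        if x >= bound:
            break
    return np.array(traj)
```

With σ = 0 the dynamics are deterministic and converge to drift/leak, so a bound is reached only if drift/leak exceeds it; the noise term is what produces the variable waiting times whose distribution the article models.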

Journal ArticleDOI
TL;DR: This paper focused on the 2014 Hokkolorob (Let there be noise) movement at Jadavpur University, Kolkata, in the state of West Bengal, a student agitation that ultimately led to the forced resig...
Abstract: This article focuses on the 2014 Hokkolorob (‘Let there be noise’) movement at Jadavpur University, Kolkata, in the state of West Bengal, a student agitation that ultimately led to the forced resig...

Patent
25 Oct 2019
TL;DR: In this paper, sound source positioning and directional enhancement are carried out after audio data is obtained through a microphone array; noise data is acquired through a single-direction microphone; and then the audio data subjected to the directional enhancement is filtered through the noise data to obtain denoised audio data.
Abstract: According to an embodiment of the invention, sound source positioning and directional enhancement are carried out after audio data is obtained through a microphone array; noise data is acquired through a unidirectional microphone; and the directionally enhanced audio data is then filtered using the noise data to obtain denoised audio data. Noise from non-source directions is well suppressed, ambient noise in the sound source is effectively filtered, and clearer audio data is output to the back end, improving the accuracy of back-end voice recognition.
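The array stage of such a pipeline is commonly a delay-and-sum beamformer; a toy sketch with integer sample delays (real systems use fractional delays derived from the estimated source position):

```python
import numpy as np

def delay_and_sum(channels, delays_samples):
    """Align each microphone channel by its sample delay toward the source
    direction, then average. Coherent source samples add in phase while
    uncorrelated noise partially cancels."""
    n = min(len(ch) - d for ch, d in zip(channels, delays_samples))
    aligned = [ch[d:d + n] for ch, d in zip(channels, delays_samples)]
    return np.mean(aligned, axis=0)
```

Averaging M aligned channels reduces the power of uncorrelated noise by a factor of M while leaving the source intact, which is the directional enhancement step the noise-reference filtering then builds on.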

Proceedings ArticleDOI
15 Sep 2019
TL;DR: The results show that, in quiet listening conditions, pupil dilation does not reflect listening effort but rather attention and engagement; in noisy conditions, increased pupil dilation indicates that listening effort increases as signal-to-noise ratio decreases, under all conditions tested.
Abstract: With increased use of text-to-speech (TTS) systems in real-world applications, evaluating how such systems influence the human cognitive processing system becomes important. Particularly in situations where cognitive load is high, there may be negative implications such as fatigue. For example, noisy situations generally require the listener to exert increased mental effort. A better understanding of this could eventually suggest new ways of generating synthetic speech that demands low cognitive load. In our previous study, pupil dilation was used as an index of cognitive effort. Pupil dilation was shown to be sensitive to the quality of synthetic speech, but there were some uncertainties regarding exactly what was being measured. The current study resolves some of those uncertainties. Additionally, we investigate how the pupil dilates when listening to synthetic speech in the presence of speech-shaped noise. Our results show that, in quiet listening conditions, pupil dilation does not reflect listening effort but rather attention and engagement. In noisy conditions, increased pupil dilation indicates that listening effort increases as signal-to-noise ratio decreases, under all conditions tested.
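Speech-shaped noise of the kind used as a masker here is commonly generated by keeping a speech signal's long-term magnitude spectrum and randomizing its phases; a sketch (the function name is ours, not from the paper):

```python
import numpy as np

def speech_shaped_noise(speech, rng=None):
    """Noise with the same long-term magnitude spectrum as `speech`:
    keep the FFT magnitudes, randomize the phases."""
    rng = np.random.default_rng(rng)
    mag = np.abs(np.fft.rfft(speech))
    phase = rng.uniform(0.0, 2.0 * np.pi, len(mag))
    phase[0] = 0.0                 # keep the DC bin real ...
    if len(speech) % 2 == 0:
        phase[-1] = 0.0            # ... and the Nyquist bin, so irfft is exactly real
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(speech))
```

Because only the phases change, the masker has the same spectral energy distribution as speech but no intelligible content, isolating energetic masking from informational masking.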

Journal ArticleDOI
TL;DR: In this article, an LSTM network was used to estimate the noise power spectral density (PSD) of single-channel audio signals represented in the short-time Fourier transform (STFT) domain.
Abstract: We propose a method using a long short-term memory (LSTM) network to estimate the noise power spectral density (PSD) of single-channel audio signals represented in the short-time Fourier transform (STFT) domain. An LSTM network common to all frequency bands is trained, which processes each frequency band individually by mapping the noisy STFT magnitude sequence to its corresponding noise PSD sequence. Unlike deep-learning-based speech-enhancement methods, which learn the full-band spectral structure of speech segments, the proposed method exploits the sub-band STFT magnitude evolution of noise with long time dependence, in the spirit of the unsupervised noise estimators described in the literature. Speaker- and speech-independent experiments with different types of noise show that the proposed method outperforms the unsupervised estimators, and it generalizes well to noise types that are not present in the training set.

Patent
Park Jeheon, Kim Ki-Won
31 Oct 2019
TL;DR: In this paper, the authors present an electronic device and method for cancelling (or suppressing) a noise of an audio signal of an unmanned aerial vehicle, the electronic device comprising: a movement module comprising a motor; an audio module comprising a first noise suppression module; a memory module for storing control data corresponding to driving data (revolutions per minute, RPM) of the motor; and a processor functionally coupled to the audio module, the movement module and the memory module, wherein the processor sets control data according to the driving data of the UAV, and applies the set control data to the
Abstract: Various embodiments of the present invention relate to an electronic device and method for cancelling (or suppressing) a noise of an audio signal of an unmanned aerial vehicle, the electronic device comprising: a movement module comprising a motor; an audio module comprising a first noise suppression module; a memory module for storing control data corresponding to driving data (revolutions per minute, RPM) of the motor; and a processor functionally coupled to the audio module, the movement module and the memory module, wherein the processor sets control data according to the driving data of the motor, and applies the set control data to the audio module so that the first noise suppression module suppresses or cancels a noise in an audio signal inputted to the audio module based on the control data. Other embodiments are also applicable.

Journal ArticleDOI
TL;DR: The spectrally interleaved and tonal maskers produce a much larger difference in performance between normal-hearing listeners and CI users than do traditional speech-in-noise measures, and thus provide a more sensitive test of speech perception abilities for current and future implantable devices.
Abstract: Poor spectral resolution contributes to the difficulties experienced by cochlear implant (CI) users when listening to speech in noise. However, correlations between measures of spectral resolution and speech perception in noise have not always been found to be robust. It may be that the relationship between spectral resolution and speech perception in noise becomes clearer in conditions where the speech and noise are not spectrally matched, so that improved spectral resolution can assist in separating the speech from the masker. To test this prediction, speech intelligibility was measured with noise or tone maskers that were presented either in the same spectral channels as the speech or in interleaved spectral channels. Spectral resolution was estimated via a spectral ripple discrimination task. Results from vocoder simulations in normal-hearing listeners showed increasing differences in speech intelligibility between spectrally overlapped and interleaved maskers as well as improved spectral ripple discrimination with increasing spectral resolution. However, no clear differences were observed in CI users between performance with spectrally interleaved and overlapped maskers, or between tone and noise maskers. The results suggest that spectral resolution in current CIs is too poor to take advantage of the spectral separation produced by spectrally interleaved speech and maskers. Overall, the spectrally interleaved and tonal maskers produce a much larger difference in performance between normal-hearing listeners and CI users than do traditional speech-in-noise measures, and thus provide a more sensitive test of speech perception abilities for current and future implantable devices.

Journal ArticleDOI
01 Jan 2019
TL;DR: A new engineering solution is proposed that reduces the effect of noise at the track receiver input in the intervals between signal current pulses and allows the signal-to-noise ratio at the track receiver input to be increased by 8% to 30%, depending on the interference parameters and the level of the useful signal.
Abstract: In connection with the influence of electromagnetic interference on track circuits, the purpose of this research is to find means of increasing the noise immunity of an audio frequency track circuit. The authors propose a new engineering solution that reduces the effect of noise at the track receiver input in the intervals between signal current pulses. The proposed noise-immune audio frequency track circuit is based on inserting a delay line, an adjustable single-pulse generator, and a controlled electronic switch into the existing audio frequency track circuit equipment. To analyze its efficiency, the operation of the audio frequency track circuit was simulated under conditions of traction current disturbances and impulse and fluctuation interference with known parameters. The results show that the proposed device for railway transport allows the signal-to-noise ratio at the track receiver input to be increased by 8% to 30%, depending on the interference parameters and the level of the useful signal.

Journal ArticleDOI
TL;DR: Results showed that the high-density advantage observed for children under the quiet listening condition was significantly reduced as noise increased, which implies an adverse impact of noise on long-term outcomes of word learning.
Abstract: Many studies have addressed the effect of neighborhood density (phonological similarity among words) on word learning in quiet listening conditions. We explored how noise influences the effect of neighborhood density on children's word learning. One-hundred-and-two preschoolers learned nonwords varied in neighborhood density in one of four listening conditions: quiet, +15 dB signal-to-noise ratio (SNR), +6 dB SNR, and 0 dB SNR. Results showed that the high-density advantage observed for children under the quiet listening condition was significantly reduced as noise increased. This finding implies an adverse impact of noise on long-term outcomes of word learning.

Posted Content
Gaurav Mittal, Baoyuan Wang
TL;DR: This work proposes an explicit audio representation learning framework that disentangles audio sequences into various factors such as phonetic content, emotional tone, background noise and others and demonstrates that when conditioned on disentangled content representation, the generated mouth movement by the model is significantly more accurate than previous approaches in the presence of noise and emotional variations.
Abstract: All previous methods for audio-driven talking head generation assume the input audio to be clean with a neutral tone. As we show empirically, one can easily break these systems by simply adding certain background noise to the utterance or changing its emotional tone (to such as sad). To make talking head generation robust to such variations, we propose an explicit audio representation learning framework that disentangles audio sequences into various factors such as phonetic content, emotional tone, background noise and others. We conduct experiments to validate that conditioned on disentangled content representation, the generated mouth movement by our model is significantly more accurate than previous approaches (without disentangled learning) in the presence of noise and emotional variations. We further demonstrate that our framework is compatible with current state-of-the-art approaches by replacing their original audio learning component with ours. To our best knowledge, this is the first work which improves the performance of talking head generation from disentangled audio representation perspective, which is important for many real-world applications.

Journal ArticleDOI
TL;DR: It was found that subjects fed more in the silence following playback than during the playback itself for all types of stimuli, suggesting that chickadees may shift their feeding behaviour to avoid feeding during periods of noise.