Sound Texture Perception via Statistics of the Auditory Periphery: Evidence from Sound Synthesis
TLDR
The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations, and the synthesis methodology offers a powerful tool for their further investigation.
About
This article was published in Neuron on 2011-09-08 and is currently open access. It has received 342 citations to date. The article focuses on the topics: Auditory scene analysis & Auditory perception.
Citations
Journal Article
Deep Scattering Spectrum
Joakim Andén, Stéphane Mallat
TL;DR: A scattering transform defines a locally translation invariant representation which is stable to time-warping deformation and extends MFCC representations by computing modulation spectrum coefficients of multiple orders, through cascades of wavelet convolutions and modulus operators.
Book Chapter
Ambient Sound Provides Supervision for Visual Learning
Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
TL;DR: This work trains a convolutional neural network to predict a statistical summary of the sound associated with a video frame, and shows that this representation is comparable to that of other state-of-the-art unsupervised learning methods.
Journal Article
A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy
Alexander J. E. Kell, Daniel L. K. Yamins, Erica N. Shook, Sam V. Norman-Haignere, Josh H. McDermott
TL;DR: Toward the core goal of auditory neuroscience, building quantitative models that predict cortical responses to natural sounds, hierarchical neural networks were optimized to solve ecologically relevant tasks, namely speech and music recognition.
Journal Article
A functional and perceptual signature of the second visual area in primates
Jeremy Freeman, Corey M. Ziemba, David J. Heeger, Eero P. Simoncelli, J. Anthony Movshon
TL;DR: Synthetic stimuli replicating the higher-order statistical dependencies found in natural texture images were constructed and used to stimulate macaque V1 and V2 neurons, revealing a particular functional role for V2 in the representation of natural image structure.
Proceedings Article
Visually Indicated Sounds
Andrew Owens, Phillip Isola, Josh H. McDermott, Antonio Torralba, Edward H. Adelson, William T. Freeman
TL;DR: This paper presents an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick, using a recurrent neural network to predict sound features from videos and then producing a waveform from these features with an example-based synthesis procedure.
References
Journal Article
A 2dvEv- bit distributed algorithm for the directed Euler trail problem
Wen-Huei Chen, Chuan Yi Tang
TL;DR: The algorithm can be used as a building block for solving other distributed graph problems, and can be slightly modified to run on a strongly connected digraph, either generating an Euler trail if one exists or reporting that no Euler trail exists.
Journal Article
Emergence of simple-cell receptive field properties by learning a sparse code for natural images
TL;DR: It is shown that a learning algorithm that attempts to find sparse linear codes for natural scenes will develop a complete family of localized, oriented, bandpass receptive fields, similar to those found in the primary visual cortex.
Journal Article
Spatiotemporal energy models for the perception of motion
TL;DR: In this article, the first stage consists of linear filters that are oriented in space-time and tuned in spatial frequency, and the outputs of quadrature pairs of such filters are squared and summed to give a measure of motion energy.
Journal Article
Relations between the statistics of natural images and the response properties of cortical cells.
TL;DR: The results obtained with six natural images suggest that the orientation and the spatial-frequency tuning of mammalian simple cells are well suited for coding the information in such images if the goal of the code is to convert higher-order redundancy into first-order redundancy.
Journal Article
Speech recognition with primarily temporal cues.
TL;DR: Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information; the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
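Illustration: the "Spatiotemporal energy models for the perception of motion" entry above describes squaring and summing the outputs of a quadrature pair of space-time oriented filters. The following is a minimal sketch of that one step, not the authors' implementation. It assumes Gabor-shaped filters on a small (time, space) grid; the function names, the grid size, the Gaussian width, and the frequencies are all hypothetical choices made for illustration.

```python
import numpy as np

def space_time_grid(size=21):
    """Integer (t, x) coordinates centred on zero; axis 0 is time, axis 1 is space."""
    r = np.arange(size) - size // 2
    return np.meshgrid(r, r, indexing="ij")

def quadrature_pair(fx, ft, size=21, sigma=4.0):
    """Even (cosine) and odd (sine) space-time Gabor filters (assumed filter shape)."""
    t, x = space_time_grid(size)
    envelope = np.exp(-(x**2 + t**2) / (2.0 * sigma**2))
    phase = 2.0 * np.pi * (fx * x + ft * t)
    return envelope * np.cos(phase), envelope * np.sin(phase)

def motion_energy(stimulus, fx, ft):
    """Square and sum the two filter responses to a square (time, space) patch."""
    even, odd = quadrature_pair(fx, ft, size=stimulus.shape[0])
    return np.sum(even * stimulus) ** 2 + np.sum(odd * stimulus) ** 2

def drifting_grating(fx, ft, phase=0.0, size=21):
    """Grating that drifts toward +x over time when fx and ft are positive."""
    t, x = space_time_grid(size)
    return np.cos(2.0 * np.pi * (fx * x - ft * t) + phase)

grating = drifting_grating(0.1, 0.1)
rightward = motion_energy(grating, 0.1, -0.1)  # filter oriented like the grating
leftward = motion_energy(grating, 0.1, 0.1)    # filter oriented the opposite way
print(rightward > leftward)
```

Because the even and odd responses are squared and summed, the energy for the matched filter barely changes when the grating's phase is shifted, while the oppositely oriented filter responds only weakly.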