
Sound Texture Perception via Statistics of the Auditory Periphery: Evidence from Sound Synthesis

TLDR
The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations, and the synthesis methodology offers a powerful tool for their further investigation.
About
This article was published in Neuron on 2011-09-08 and is currently open access. It has received 342 citations to date. The article focuses on the topics: Auditory scene analysis & Auditory perception.

Citations
Journal ArticleDOI

Deep Scattering Spectrum

TL;DR: A scattering transform defines a locally translation invariant representation which is stable to time-warping deformation and extends MFCC representations by computing modulation spectrum coefficients of multiple orders, through cascades of wavelet convolutions and modulus operators.
Book ChapterDOI

Ambient Sound Provides Supervision for Visual Learning

TL;DR: This work trains a convolutional neural network to predict a statistical summary of the sound associated with a video frame, and shows that this representation is comparable to that of other state-of-the-art unsupervised learning methods.
Journal ArticleDOI

A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy

TL;DR: A core goal of auditory neuroscience is to build quantitative models that predict cortical responses to natural sounds, and hierarchical neural networks for speech and music recognition were optimized to solve ecologically relevant tasks.
Journal ArticleDOI

A functional and perceptual signature of the second visual area in primates

TL;DR: Synthetic stimuli replicating the higher-order statistical dependencies found in natural texture images were constructed and used to stimulate macaque V1 and V2 neurons, revealing a particular functional role for V2 in the representation of natural image structure.
Proceedings ArticleDOI

Visually Indicated Sounds

TL;DR: This paper presents an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick, using a recurrent neural network to predict sound features from videos and then producing a waveform from these features with an example-based synthesis procedure.
References
Journal ArticleDOI

A 2dvEv- bit distributed algorithm for the directed Euler trail problem

TL;DR: The algorithm can be used as a building block for solving other distributed graph problems, and can be slightly modified to run on a strongly connected digraph to generate an existing Euler trail or to report that no Euler trail exists.
Journal ArticleDOI

Emergence of simple-cell receptive field properties by learning a sparse code for natural images

TL;DR: It is shown that a learning algorithm that attempts to find sparse linear codes for natural scenes will develop a complete family of localized, oriented, bandpass receptive fields, similar to those found in the primary visual cortex.
Journal ArticleDOI

Spatiotemporal energy models for the perception of motion

TL;DR: This article describes a model in which the first stage consists of linear filters that are oriented in space-time and tuned in spatial frequency; the outputs of quadrature pairs of such filters are squared and summed to give a measure of motion energy.
Journal ArticleDOI

Relations between the statistics of natural images and the response properties of cortical cells.

TL;DR: The results obtained with six natural images suggest that the orientation and the spatial-frequency tuning of mammalian simple cells are well suited for coding the information in such images if the goal of the code is to convert higher-order redundancy into first-order redundancy.
Journal ArticleDOI

Speech recognition with primarily temporal cues.

TL;DR: Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information; the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.