Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Proceedings ArticleDOI
22 May 2011
TL;DR: A new technique for monaural source separation in musical mixtures that uses knowledge of the musical score to initialize a parametric decomposition of the spectrogram based on non-negative matrix factorization (NMF).
Abstract: In this paper we present a new technique for monaural source separation in musical mixtures, which uses the knowledge of the musical score. This information is used to initialize an algorithm which computes a parametric decomposition of the spectrogram based on non-negative matrix factorization (NMF). This algorithm provides time-frequency masks which are used to separate the sources with Wiener filtering.
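The decomposition-and-masking idea in the abstract can be sketched generically. The following is a minimal NumPy illustration (not the paper's score-initialized parametric model) of NMF on a toy magnitude spectrogram, followed by Wiener-style time-frequency masks; all sizes and data are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy magnitude spectrogram: two "sources" with disjoint dominant regions.
F, T, K = 64, 100, 2
V = np.abs(rng.normal(size=(F, T))) * 0.01
V[:20, :50] += 1.0   # source 1 energy (low band, first half)
V[40:, 50:] += 1.0   # source 2 energy (high band, second half)

# NMF with multiplicative updates (Euclidean cost):
# V ~ W @ H, with W holding spectral templates and H their activations.
W = rng.uniform(0.1, 1.0, size=(F, K))
H = rng.uniform(0.1, 1.0, size=(K, T))
eps = 1e-9
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

# Wiener-style masks: each component's share of the total model energy.
masks = [np.outer(W[:, k], H[k]) / (W @ H + eps) for k in range(K)]
S = [m * V for m in masks]   # separated magnitude spectrograms
```

In a real system, each masked spectrogram would be combined with the mixture phase and inverted back to a waveform.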

100 citations

Posted Content
20 Sep 2018
TL;DR: TasNet, as discussed by the authors, uses a convolutional encoder to create a representation of the signal optimized for extracting individual speakers; separation is achieved by applying a weighting function (mask) to the encoder output.
Abstract: Robust speech processing in multitalker acoustic environments requires automatic speech separation. While single-channel, speaker-independent speech separation methods have recently seen great progress, the accuracy, latency, and computational cost of speech separation remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of spectrogram representations for speech separation, and the long latency in calculating the spectrogram. To address these shortcomings, we propose the time-domain audio separation network (TasNet), which is a deep learning autoencoder framework for time-domain speech separation. TasNet uses a convolutional encoder to create a representation of the signal that is optimized for extracting individual speakers. Speaker extraction is achieved by applying a weighting function (mask) to the encoder output. The modified encoder representation is then inverted to the sound waveform using a linear decoder. The masks are found using a temporal convolutional network consisting of dilated convolutions, which allow the network to model the long-term dependencies of the speech signal. This end-to-end speech separation algorithm significantly outperforms previous time-frequency methods in terms of separating speakers in mixed audio, even when compared to the separation accuracy achieved with the ideal time-frequency mask of the speakers. In addition, TasNet has a smaller model size and a shorter minimum latency, making it a suitable solution for both offline and real-time speech separation applications. This study therefore represents a major step toward actualizing speech separation for real-world speech processing technologies.
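The encoder-mask-decoder pipeline described in the abstract can be sketched structurally. The NumPy sketch below uses random, untrained weights and a random stand-in for the temporal convolutional separator, so it only illustrates the data flow and tensor shapes, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

# waveform -> conv encoder -> per-speaker masks -> linear decoder -> waveforms
L, N, C = 40, 256, 2          # frame length, basis size, number of speakers
x = rng.normal(size=(8000,))  # 0.5 s mock mixture at 16 kHz

# 1. Encoder: non-overlapping frames projected onto N basis filters + ReLU.
frames = x[: len(x) // L * L].reshape(-1, L)      # (T, L)
U = rng.normal(size=(L, N)) / np.sqrt(L)          # encoder basis
w = np.maximum(frames @ U, 0.0)                   # (T, N)

# 2. Separator: per-speaker masks in [0, 1] that sum to 1 across speakers
#    (softmax over the speaker axis; random logits stand in for the TCN).
logits = rng.normal(size=(C, w.shape[0], N))
m = np.exp(logits) / np.exp(logits).sum(axis=0)   # (C, T, N)

# 3. Decoder: masked encoder weights mapped back to waveform frames.
B = rng.normal(size=(N, L)) / np.sqrt(N)          # decoder basis
sources = ((m * w) @ B).reshape(C, -1)            # per-speaker waveforms
```

Training would learn U, B, and the separator end-to-end with a source-separation loss; here they are random, so the "sources" are meaningless but have the right shapes.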

100 citations

Posted Content
TL;DR: In this article, the authors review various audio representations and the issues that arise when using them with neural networks, focusing particularly on spectrograms for generating audio with neural networks for style transfer.
Abstract: One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than it seems to be for visual images, and a variety of representations have been used for different applications including the raw digitized sample stream, hand-crafted features, machine discovered features, MFCCs and variants that include deltas, and a variety of spectral representations. This paper reviews some of these representations and issues that arise, focusing particularly on spectrograms for generating audio using neural networks for style transfer.
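As a concrete example of the spectral representations the review discusses, a magnitude spectrogram can be computed with a short-time Fourier transform. A minimal NumPy sketch (the window and hop sizes are illustrative choices, not values from the paper):

```python
import numpy as np

def spectrogram(x, n_fft=512, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (n_fft // 2 + 1, n_frames)

# A 1 kHz tone sampled at 16 kHz: energy concentrates in one frequency bin.
sr = 16000
t = np.arange(sr) / sr
S = spectrogram(np.sin(2 * np.pi * 1000 * t))
peak_bin = S.mean(axis=1).argmax()
print(peak_bin * sr / 512)   # bin centre frequency in Hz (1000 for this tone)
```

Generating audio from such a representation then requires inverting it, which is where the phase issues discussed in the paper arise: the magnitude alone does not determine the waveform.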

99 citations

Book
06 Dec 2012
TL;DR: A book covering the recording, field measurement, analysis, and description of animal acoustic signals; digital signal acquisition, analysis, editing, and synthesis; and the application of analog and digital filters, including antialiasing and antiimaging filters, in bioacoustics.
Abstract:
Chapter 1. Acoustic Signals of Animals: Recording, Field Measurements, Analysis and Description (H. C. Gerhardt). 1 Introduction. 2 Field Recordings and Measurements. 2.1 Equipment. 2.2 On-Site Measurements. 2.3 Signal Amplitude, Directionality, and Background Noise Levels. 2.4 Patterns of Sound Propagation in Natural Habitats. 3 Laboratory Analysis of Animal Sounds. 3.1 Terminology. 3.2 Temporal and Spectral Analysis: Some General Principles. 4 Examples of Descriptions and Analyses. 4.1 Temporal Properties of Pulsatile Calls. 4.2 Amplitude-Time Envelopes. 4.3 Relationships between Fine-Scale Temporal and Spectral Properties. 4.4 Spectrally Complex Calls. 5 Summary. References.
Chapter 2. Digital Signal Acquisition and Representation (M. Clements). 1 Introduction. 2 Digital Signal Processing. 2.1 Major Applications of DSP. 2.2 Definition of Digital Systems. 2.3 Difference Equations. 3 Digital Filter Frequency Response. 3.1 Unit-Sample Response Characterization. 3.2 Frequency-Domain Interpretation of Systems. 3.3 Frequency-Domain Interpretation of Signals. 4 Conversion Between Analog and Digital Data Forms. 4.1 The Sampling Theorem. 4.2 Signal Recovery by Filtering. 4.3 Fourier Transform Relations. 4.4 Effects of Sampling Rates. 4.5 Reconstruction. 5 Fundamental Digital Processing Techniques. 5.1 Power Spectra. 5.2 Time and Frequency Resolution. 5.3 Windows. 5.4 Spectral Smoothing. 5.5 The Discrete Fourier Transform. 5.6 Correlation. 5.7 Autocorrelation. 5.8 Cross-correlation. 5.9 Spectrograms. 6 An Introduction to Some Advanced Topics. 6.1 Digital Filtering. 6.2 Linear Prediction. 6.3 Homomorphic Analysis. 7 Summary.
Chapter 3. Digital Signal Analysis, Editing, and Synthesis (K. Beeman). 1 Introduction. 2 Temporal and Spectral Measurements. 3 Time-Varying Amplitude Analysis. 3.1 Amplitude Envelopes. 3.2 Gate Functions. 4 Spectral Analysis. 4.1 Power Spectrum Features. 4.2 Measuring Similarity Among Power Spectra. 4.3 Other Spectral Analysis Techniques. 5 Spectrographic Analysis. 5.1 Spectrogram Generation. 5.2 Spectrogram Display. 5.3 Spectrogram Parameter Measurements. 6 Classification of Naturally Occurring Animal Sounds. 6.1 Properties of Ideal Signals. 6.1.1 Periodicity. 6.1.2 Amplitude Modulation. 6.1.3 Frequency Modulation. 6.1.4 Biologically Relevant Sound Types. 7 Time-Varying Frequency Analysis. 7.1 Deriving Spectral Contours. 7.2 Sound-Similarity Comparison. 8 Digital Sound Synthesis. 8.1 Editing. 8.2 Arithmetic Manipulation and Generation of Sound. 8.3 Synthesis Models. 8.3.1 Tonal Model. 8.4 Sources of and A Functions. 8.4.1 Mathematically Based Functions. 8.4.2 Functions Derived from Natural Sounds. 9 Sound Manipulation and Generation Techniques. 9.1 Duration Scaling. 9.2 Amplitude-Envelope Manipulations. 9.3 Spectral Manipulations. 9.3.1 Frequency Shifting and Scaling. 9.3.2 Frequency Modulation. 9.4 Synthesis of Biological Sound Types. 9.4.1 Tonal and Polytonal Signals. 9.4.2 Pulse-Repetition Signals. 9.4.3 Harmonic Signals. 9.4.4 Noisy Signals. 9.5 Miscellaneous Synthesis Topics. 9.5.1 Template Sounds. 9.5.2 Noise Removal. 10 Summary. References.
Chapter 4. Application of Filters in Bioacoustics (P. K. Stoddard). 1 Introduction. 2 General Uses of Filters and Some Cautions. 3 Anatomy and Performance of a Filter. 4 Properties of Various Analog Filters. 5 Antialiasing and Antiimaging Filters. 5.1 A/D Conversion Requires an Analog Lowpass Filter. 5.2 Choosing an Antialiasing Filter. 5.3 D/A Conversion also Requires an Analog Lowpass Filter. 5.4 Analog Filters: Passive Versus Active Components. 6 Analog Versus Digital Filters.

98 citations

Journal ArticleDOI
TL;DR: To validate conclusions drawn from visual examination of spectrograms, or, more generally, to determine the stimulus correlates of perceived speech, it will often be necessary to make controlled modifications in the spectrogram, and then to evaluate the effects of those modifications on the sound as heard.
Abstract: of contexts, an investigator can arrive at a description of the acoustic features common to all of the samples, and in this way make progress toward defining the so-called invariants of speech, that is, the essential information-bearing sound elements on which the listener's identifications critically depend. The investigator can also take account of the variations among spectrograms, and by correlating these with the observed variations in pronunciation, he can begin to sort out the several acoustic features in relation to the several aspects of the perception. There are, however, many questions about the relation between acoustic stimulus and auditory perception which cannot be answered merely by an inspection of spectrograms, no matter how numerous and varied these may be. For any given unit characteristic of the auditory perception, such as the simple identification of a phoneme, the spectrogram will very often exhibit several features which are distinctive to the eye, and the information which can be obtained from the spectrogram is, accordingly, ambiguous. Even when only one feature or pattern is strikingly evident, one cannot be certain about its auditory significance, unless he assumes that those aspects of the spectrogram which appear most prominently on visual examination are, in fact, of greatest importance to the ear. That assumption, as we shall try to point out later in this paper, is itself extremely interesting, but it has not been directly tested, nor, indeed, has it always been made fully explicit. To validate conclusions drawn from visual examination of spectrograms, or, more generally, to determine the stimulus correlates of perceived speech, it will often be necessary to make controlled modifications in the spectrogram, and then to evaluate the effects of those modifications on the sound as heard. For these purposes, we have constructed an instrument, called

98 citations


Network Information
Related Topics (5):
Deep learning: 79.8K papers, 2.1M citations (79% related)
Convolutional neural network: 74.7K papers, 2M citations (78% related)
Feature extraction: 111.8K papers, 2.1M citations (77% related)
Wavelet: 78K papers, 1.3M citations (76% related)
Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593