Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Proceedings ArticleDOI
22 May 2011
TL;DR: A new technique for monaural source separation in musical mixtures that uses knowledge of the musical score to initialize a parametric decomposition of the spectrogram based on non-negative matrix factorization (NMF).
Abstract: In this paper we present a new technique for monaural source separation in musical mixtures, which uses the knowledge of the musical score. This information is used to initialize an algorithm which computes a parametric decomposition of the spectrogram based on non-negative matrix factorization (NMF). This algorithm provides time-frequency masks which are used to separate the sources with Wiener filtering.
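The decomposition-and-masking idea in the abstract can be sketched generically. The following is a minimal NumPy illustration (not the paper's score-initialized parametric model) of NMF on a toy magnitude spectrogram, followed by Wiener-style time-frequency masks; all sizes and data are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy magnitude spectrogram: two "sources" with disjoint dominant regions.
F, T, K = 64, 100, 2
V = np.abs(rng.normal(size=(F, T))) * 0.01
V[:20, :50] += 1.0   # source 1 energy (low band, first half)
V[40:, 50:] += 1.0   # source 2 energy (high band, second half)

# NMF with multiplicative updates (Euclidean cost):
# V ~ W @ H, with W holding spectral templates and H their activations.
W = rng.uniform(0.1, 1.0, size=(F, K))
H = rng.uniform(0.1, 1.0, size=(K, T))
eps = 1e-9
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

# Wiener-style masks: each component's share of the total model energy.
masks = [np.outer(W[:, k], H[k]) / (W @ H + eps) for k in range(K)]
S = [m * V for m in masks]   # separated magnitude spectrograms
```

In a real system, each masked spectrogram would be combined with the mixture phase and inverted back to a waveform.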

100 citations

Posted Content
20 Sep 2018
TL;DR: TasNet, as discussed by the authors, uses a convolutional encoder to create a representation of the signal optimized for extracting individual speakers; separation is achieved by applying a weighting function (mask) to the encoder output.
Abstract: Robust speech processing in multitalker acoustic environments requires automatic speech separation. While single-channel, speaker-independent speech separation methods have recently seen great progress, the accuracy, latency, and computational cost of speech separation remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of spectrogram representations for speech separation, and the long latency in calculating the spectrogram. To address these shortcomings, we propose the time-domain audio separation network (TasNet), which is a deep learning autoencoder framework for time-domain speech separation. TasNet uses a convolutional encoder to create a representation of the signal that is optimized for extracting individual speakers. Speaker extraction is achieved by applying a weighting function (mask) to the encoder output. The modified encoder representation is then inverted to the sound waveform using a linear decoder. The masks are found using a temporal convolutional network consisting of dilated convolutions, which allow the network to model the long-term dependencies of the speech signal. This end-to-end speech separation algorithm significantly outperforms previous time-frequency methods in terms of separating speakers in mixed audio, even when compared to the separation accuracy achieved with the ideal time-frequency mask of the speakers. In addition, TasNet has a smaller model size and a shorter minimum latency, making it a suitable solution for both offline and real-time speech separation applications. This study therefore represents a major step toward actualizing speech separation for real-world speech processing technologies.
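The encoder-mask-decoder pipeline described in the abstract can be sketched structurally. The NumPy sketch below uses random, untrained weights and a random stand-in for the temporal convolutional separator, so it only illustrates the data flow and tensor shapes, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

# waveform -> conv encoder -> per-speaker masks -> linear decoder -> waveforms
L, N, C = 40, 256, 2          # frame length, basis size, number of speakers
x = rng.normal(size=(8000,))  # 0.5 s mock mixture at 16 kHz

# 1. Encoder: non-overlapping frames projected onto N basis filters + ReLU.
frames = x[: len(x) // L * L].reshape(-1, L)      # (T, L)
U = rng.normal(size=(L, N)) / np.sqrt(L)          # encoder basis
w = np.maximum(frames @ U, 0.0)                   # (T, N)

# 2. Separator: per-speaker masks in [0, 1] that sum to 1 across speakers
#    (softmax over the speaker axis; random logits stand in for the TCN).
logits = rng.normal(size=(C, w.shape[0], N))
m = np.exp(logits) / np.exp(logits).sum(axis=0)   # (C, T, N)

# 3. Decoder: masked encoder weights mapped back to waveform frames.
B = rng.normal(size=(N, L)) / np.sqrt(N)          # decoder basis
sources = ((m * w) @ B).reshape(C, -1)            # per-speaker waveforms
```

Training would learn U, B, and the separator end-to-end with a source-separation loss; here they are random, so the "sources" are meaningless but have the right shapes.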

100 citations

Posted Content
TL;DR: In this article, the authors review various audio representations and the issues that arise when using them with neural networks, focusing particularly on spectrograms for generating audio with neural networks for style transfer.
Abstract: One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than it seems to be for visual images, and a variety of representations have been used for different applications including the raw digitized sample stream, hand-crafted features, machine discovered features, MFCCs and variants that include deltas, and a variety of spectral representations. This paper reviews some of these representations and issues that arise, focusing particularly on spectrograms for generating audio using neural networks for style transfer.
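As a concrete example of the spectral representations the review discusses, a magnitude spectrogram can be computed with a short-time Fourier transform. A minimal NumPy sketch (the window and hop sizes are illustrative choices, not values from the paper):

```python
import numpy as np

def spectrogram(x, n_fft=512, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (n_fft // 2 + 1, n_frames)

# A 1 kHz tone sampled at 16 kHz: energy concentrates in one frequency bin.
sr = 16000
t = np.arange(sr) / sr
S = spectrogram(np.sin(2 * np.pi * 1000 * t))
peak_bin = S.mean(axis=1).argmax()
print(peak_bin * sr / 512)   # bin centre frequency in Hz (1000 for this tone)
```

Generating audio from such a representation then requires inverting it, which is where the phase issues discussed in the paper arise: the magnitude alone does not determine the waveform.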

99 citations

Book
06 Dec 2012
TL;DR: A book covering the recording, field measurement, analysis, and description of animal acoustic signals; digital signal acquisition, analysis, editing, and synthesis; and the application of analog and digital filters, including antialiasing and antiimaging filters, in bioacoustics.
Abstract:
Chapter 1. Acoustic Signals of Animals: Recording, Field Measurements, Analysis and Description (H. C. Gerhardt). 1 Introduction. 2 Field Recordings and Measurements. 2.1 Equipment. 2.2 On-Site Measurements. 2.3 Signal Amplitude, Directionality, and Background Noise Levels. 2.4 Patterns of Sound Propagation in Natural Habitats. 3 Laboratory Analysis of Animal Sounds. 3.1 Terminology. 3.2 Temporal and Spectral Analysis: Some General Principles. 4 Examples of Descriptions and Analyses. 4.1 Temporal Properties of Pulsatile Calls. 4.2 Amplitude-Time Envelopes. 4.3 Relationships between Fine-Scale Temporal and Spectral Properties. 4.4 Spectrally Complex Calls. 5 Summary. References.
Chapter 2. Digital Signal Acquisition and Representation (M. Clements). 1 Introduction. 2 Digital Signal Processing. 2.1 Major Applications of DSP. 2.2 Definition of Digital Systems. 2.3 Difference Equations. 3 Digital Filter Frequency Response. 3.1 Unit-Sample Response Characterization. 3.2 Frequency-Domain Interpretation of Systems. 3.3 Frequency-Domain Interpretation of Signals. 4 Conversion Between Analog and Digital Data Forms. 4.1 The Sampling Theorem. 4.2 Signal Recovery by Filtering. 4.3 Fourier Transform Relations. 4.4 Effects of Sampling Rates. 4.5 Reconstruction. 5 Fundamental Digital Processing Techniques. 5.1 Power Spectra. 5.2 Time and Frequency Resolution. 5.3 Windows. 5.4 Spectral Smoothing. 5.5 The Discrete Fourier Transform. 5.6 Correlation. 5.7 Autocorrelation. 5.8 Cross-correlation. 5.9 Spectrograms. 6 An Introduction to Some Advanced Topics. 6.1 Digital Filtering. 6.2 Linear Prediction. 6.3 Homomorphic Analysis. 7 Summary.
Chapter 3. Digital Signal Analysis, Editing, and Synthesis (K. Beeman). 1 Introduction. 2 Temporal and Spectral Measurements. 3 Time-Varying Amplitude Analysis. 3.1 Amplitude Envelopes. 3.2 Gate Functions. 4 Spectral Analysis. 4.1 Power Spectrum Features. 4.2 Measuring Similarity Among Power Spectra. 4.3 Other Spectral Analysis Techniques. 5 Spectrographic Analysis. 5.1 Spectrogram Generation. 5.2 Spectrogram Display. 5.3 Spectrogram Parameter Measurements. 6 Classification of Naturally Occurring Animal Sounds. 6.1 Properties of Ideal Signals. 6.1.1 Periodicity. 6.1.2 Amplitude Modulation. 6.1.3 Frequency Modulation. 6.1.4 Biologically Relevant Sound Types. 7 Time-Varying Frequency Analysis. 7.1 Deriving Spectral Contours. 7.2 Sound-Similarity Comparison. 8 Digital Sound Synthesis. 8.1 Editing. 8.2 Arithmetic Manipulation and Generation of Sound. 8.3 Synthesis Models. 8.3.1 Tonal Model. 8.4 Sources of and A Functions. 8.4.1 Mathematically Based Functions. 8.4.2 Functions Derived from Natural Sounds. 9 Sound Manipulation and Generation Techniques. 9.1 Duration Scaling. 9.2 Amplitude-Envelope Manipulations. 9.3 Spectral Manipulations. 9.3.1 Frequency Shifting and Scaling. 9.3.2 Frequency Modulation. 9.4 Synthesis of Biological Sound Types. 9.4.1 Tonal and Polytonal Signals. 9.4.2 Pulse-Repetition Signals. 9.4.3 Harmonic Signals. 9.4.4 Noisy Signals. 9.5 Miscellaneous Synthesis Topics. 9.5.1 Template Sounds. 9.5.2 Noise Removal. 10 Summary. References.
Chapter 4. Application of Filters in Bioacoustics (P. K. Stoddard). 1 Introduction. 2 General Uses of Filters and Some Cautions. 3 Anatomy and Performance of a Filter. 4 Properties of Various Analog Filters. 5 Antialiasing and Antiimaging Filters. 5.1 A/D Conversion Requires an Analog Lowpass Filter. 5.2 Choosing an Antialiasing Filter. 5.3 D/A Conversion also Requires an Analog Lowpass Filter. 5.4 Analog Filters: Passive Versus Active Components. 6 Analog Versus Digital Filters.

98 citations

Journal ArticleDOI
TL;DR: To validate conclusions drawn from visual examination of spectrograms, or, more generally, to determine the stimulus correlates of perceived speech, it will often be necessary to make controlled modifications in the spectrogram, and then to evaluate the effects of those modifications on the sound as heard.
Abstract: of contexts, an investigator can arrive at a description of the acoustic features common to all of the samples, and in this way make progress toward defining the so-called invariants of speech, that is, the essential information-bearing sound elements on which the listener's identifications critically depend. The investigator can also take account of the variations among spectrograms, and by correlating these with the observed variations in pronunciation, he can begin to sort out the several acoustic features in relation to the several aspects of the perception. There are, however, many questions about the relation between acoustic stimulus and auditory perception which cannot be answered merely by an inspection of spectrograms, no matter how numerous and varied these may be. For any given unit characteristic of the auditory perception, such as the simple identification of a phoneme, the spectrogram will very often exhibit several features which are distinctive to the eye, and the information which can be obtained from the spectrogram is, accordingly, ambiguous. Even when only one feature or pattern is strikingly evident, one cannot be certain about its auditory significance, unless he assumes that those aspects of the spectrogram which appear most prominently on visual examination are, in fact, of greatest importance to the ear. That assumption, as we shall try to point out later in this paper, is itself extremely interesting, but it has not been directly tested, nor, indeed, has it always been made fully explicit. To validate conclusions drawn from visual examination of spectrograms, or, more generally, to determine the stimulus correlates of perceived speech, it will often be necessary to make controlled modifications in the spectrogram, and then to evaluate the effects of those modifications on the sound as heard. For these purposes, we have constructed an instrument, called

98 citations


Network Information
Related Topics (5):
Deep learning: 79.8K papers, 2.1M citations (79% related)
Convolutional neural network: 74.7K papers, 2M citations (78% related)
Feature extraction: 111.8K papers, 2.1M citations (77% related)
Wavelet: 78K papers, 1.3M citations (76% related)
Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593