Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have appeared within this topic, receiving 81,547 citations.


Papers
Journal ArticleDOI
TL;DR: The authors present a method to combine the two spectrograms by evaluating the geometric mean of the corresponding short-time Fourier transform magnitudes, and the combined spectrogram preserves the desirable visual features of the originals.
Abstract: Existing speech spectrograms, namely the wideband spectrogram and the narrowband spectrogram, are deficient in either time or frequency resolution. The authors present a method to combine the two spectrograms by evaluating the geometric mean of the corresponding short-time Fourier transform magnitudes. The combined spectrogram preserves the desirable visual features of the originals.

29 citations
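
The geometric-mean combination described in the entry above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the authors' implementation: the Hann window, the window lengths, the hop size, the zero-padded DFT length, and the function names `stft_magnitude` and `combined_spectrogram` are all assumptions made for the example. Both transforms share the same hop and DFT length so that their magnitudes line up on one time-frequency grid.

```python
import cmath
import math


def stft_magnitude(signal, win_len, hop, n_fft):
    """Magnitude STFT with a Hann window, zero-padded to n_fft points.

    Frames are centred on multiples of `hop`, so short and long windows
    produce magnitudes on the same time-frequency grid.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    gain = sum(window)  # normalise so both window lengths have comparable scale
    half = win_len // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    frames = []
    for centre in range(0, len(signal), hop):
        seg = [padded[centre + n] * window[n] for n in range(win_len)]
        row = []
        for k in range(n_fft // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(win_len))
            row.append(abs(acc) / gain)
        frames.append(row)
    return frames


def combined_spectrogram(signal, short_win, long_win, hop, n_fft):
    """Element-wise geometric mean of wideband and narrowband STFT magnitudes."""
    wide = stft_magnitude(signal, short_win, hop, n_fft)    # short window: good time resolution
    narrow = stft_magnitude(signal, long_win, hop, n_fft)   # long window: good frequency resolution
    return [[math.sqrt(w * n) for w, n in zip(w_row, n_row)]
            for w_row, n_row in zip(wide, narrow)]


# Hypothetical test tone: 8 cycles per 64 samples, i.e. DFT bin 8 when n_fft = 64.
tone = [math.cos(2 * math.pi * 8 * n / 64) for n in range(128)]
combined = combined_spectrogram(tone, short_win=16, long_win=64, hop=16, n_fft=64)
```

The naive DFT loop keeps the sketch self-contained; for real audio an FFT-based STFT would be the practical choice. Because the geometric mean is a product, a cell stays bright only where both the wideband and the narrowband magnitudes are large.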

Proceedings Article
Kainan Peng1, Wei Ping1, Zhao Song1, Kexin Zhao1
12 Jul 2020
TL;DR: This paper proposes ParaNet, a fully convolutional, non-autoregressive seq2seq model that converts text to spectrogram, and separately explores a VAE-based approach to train the inverse autoregressive flow (IAF) based parallel vocoder from scratch, which avoids the distillation from a separately trained WaveNet required in previous work.
Abstract: In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and brings 46.7 times speed-up over the lightweight Deep Voice 3 at synthesis, while obtaining reasonably good speech quality. ParaNet also produces stable alignment between text and speech on the challenging test sentences by iteratively improving the attention in a layer-by-layer manner. Furthermore, we build the parallel text-to-speech system and test various parallel neural vocoders, which can synthesize speech from text through a single feed-forward pass. We also explore a novel VAE-based approach to train the inverse autoregressive flow (IAF) based parallel vocoder from scratch, which avoids the need for distillation from a separately trained WaveNet as previous work.

29 citations

Proceedings ArticleDOI
01 May 2017
TL;DR: This paper presents a framework that spots the presence of acoustic events, such as horns and sirens, using a two-stage approach, and shows an improvement of up to 31% in the classification rate.
Abstract: Urban environments are characterised by the presence of distinctive audio signals which alert the drivers to events that require prompt action. The detection and interpretation of these signals would be highly beneficial for smart vehicle systems, as it would provide them with complementary information to navigate safely in the environment. In this paper, we present a framework that spots the presence of acoustic events, such as horns and sirens, using a two-stage approach. We first model the urban soundscape and use anomaly detection to identify the presence of an anomalous sound, and later determine the nature of this sound. As the audio samples are affected by copious non-stationary and unstructured noise, which can degrade classification performance, we propose a noise-removal technique to obtain a clean representation of the data we can use for classification and waveform reconstruction. The method is based on the idea of analysing the spectrograms of the incoming signals as images and applying spectrogram segmentation to isolate and extract the alerting signals from the background noise. We evaluate our framework on four hours of urban sounds collected driving around urban Oxford on different kinds of road and in different traffic conditions. When compared to traditional feature representations, such as Mel-frequency cepstrum coefficients, our framework shows an improvement of up to 31% in the classification rate.

29 citations

Patent
14 Apr 2008
TL;DR: In this patent, an audio signal produced by playing a plurality of musical instruments is separated into sound sources according to respective instrument sounds, and each time a separation process is performed, the updated model parameter estimation/storage section 114 estimates parameters respectively contained in updated model parameters.
Abstract: An audio signal produced by playing a plurality of musical instruments is separated into sound sources according to respective instrument sounds. Each time a separation process is performed, the updated model parameter estimation/storage section 114 estimates parameters respectively contained in updated model parameters such that updated power spectrograms gradually change from a state close to initial power spectrograms to a state close to a plurality of power spectrograms most recently stored in a power spectrogram separation/storage section. Respective sections including the power spectrogram separation/storage section 112 and an updated distribution function computation/storage section 118 repeatedly perform process operations until the updated power spectrograms change from the state close to the initial power spectrograms to the state close to the plurality of power spectrograms most recently stored in the power spectrogram separation/storage section 112. The final updated power spectrograms are close to the power spectrograms of single tones of one musical instrument contained in the input audio signal formed to contain harmonic and inharmonic models.

29 citations

Journal ArticleDOI
G. Jones1, Boualem Boashash
TL;DR: The usefulness of the generalized instantaneous parameters is demonstrated in their application to optimal selection of windows for spectrograms through window matching in the time-frequency plane.
Abstract: The concept of instantaneous parameters, which has previously been associated exclusively with 1-D measures like the instantaneous frequency and the group delay, is extended to the 2-D time-frequency plane. Such generalized instantaneous parameters are associated with the short-time Fourier transform. They may also be interpreted as local moments of certain time-frequency distributions. It is shown that these measures enable local signal behavior to be characterized in the time-frequency plane for nonstationary deterministic signals. The usefulness of the generalized instantaneous parameters is demonstrated in their application to optimal selection of windows for spectrograms. This is achieved through window matching in the time-frequency plane. An algorithm is provided that illustrates the performance of this window matching. Results based on simulated and real data are presented.

29 citations
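
The entry above mentions interpreting instantaneous parameters as local moments of a time-frequency distribution. As a rough illustration of that idea only, the sketch below computes the conventional first-order frequency moment (a spectral centroid) and the second central moment (a local bandwidth) of a single spectrogram frame. It is not the generalized parameters or the window-matching algorithm of the paper; the function names and the `bin_hz` parameter (frequency spacing between bins) are assumptions made for the example.

```python
import math


def local_frequency_moment(power_frame, bin_hz):
    """First-order frequency moment (centroid, in Hz) of one spectrogram frame."""
    total = sum(power_frame)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful centroid, return 0 by convention
    return bin_hz * sum(k * p for k, p in enumerate(power_frame)) / total


def local_bandwidth(power_frame, bin_hz):
    """Square root of the second central frequency moment (in Hz) of one frame."""
    total = sum(power_frame)
    if total == 0.0:
        return 0.0
    mean_bin = sum(k * p for k, p in enumerate(power_frame)) / total
    variance = sum(p * (k - mean_bin) ** 2 for k, p in enumerate(power_frame)) / total
    return bin_hz * math.sqrt(variance)
```

Applied frame by frame to a power spectrogram, the first function traces a smoothed estimate of how the dominant frequency moves over time, and the second indicates how widely energy is spread around it.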


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
79% related
Convolutional neural network
74.7K papers, 2M citations
78% related
Feature extraction
111.8K papers, 2.1M citations
77% related
Wavelet
78K papers, 1.3M citations
76% related
Support vector machine
73.6K papers, 1.7M citations
75% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593