Topic
Spectrogram
About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.
Papers published on a yearly basis
Papers
TL;DR: The authors present a method to combine the two spectrograms by evaluating the geometric mean of the corresponding short-time Fourier transform magnitudes, and the combined spectrogram preserves the desirable visual features of the originals.
Abstract: Existing speech spectrograms, the wideband spectrogram and the narrowband spectrogram, are deficient in either time or frequency resolution. The authors present a method to combine the two spectrograms by evaluating the geometric mean of the corresponding short-time Fourier transform magnitudes. The combined spectrogram preserves the desirable visual features of the originals.
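The geometric-mean combination described above is simple to sketch. The following is a minimal illustration, not the paper's implementation: the window lengths, test signal, and FFT size are assumptions. The two STFTs share one hop size and one FFT length so their magnitude arrays align bin for bin.

```python
import numpy as np
from scipy.signal import stft

# Test signal (assumed): two concurrent tones.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Wideband = short window (good time resolution); narrowband = long
# window (good frequency resolution). Equal hop (32 samples) and equal
# nfft give both STFTs the same time-frequency grid.
nfft = 512
_, _, Z_wide = stft(x, fs, nperseg=64, noverlap=32, nfft=nfft)
_, _, Z_narr = stft(x, fs, nperseg=512, noverlap=480, nfft=nfft)

# Combined spectrogram: geometric mean of the two STFT magnitudes.
S_combined = np.sqrt(np.abs(Z_wide) * np.abs(Z_narr))
```

A bin is large in the combined picture only when it is large in both inputs, which is why the combination keeps the sharp time structure of the wideband view and the sharp harmonics of the narrowband view.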
29 citations
12 Jul 2020
TL;DR: This paper proposes ParaNet, a non-autoregressive text-to-spectrogram model, along with a VAE-based approach to train the inverse autoregressive flow (IAF) based parallel vocoder from scratch, avoiding the distillation from a separately trained WaveNet that previous work required.
Abstract: In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and brings 46.7 times speed-up over the lightweight Deep Voice 3 at synthesis, while obtaining reasonably good speech quality. ParaNet also produces stable alignment between text and speech on the challenging test sentences by iteratively improving the attention in a layer-by-layer manner. Furthermore, we build the parallel text-to-speech system and test various parallel neural vocoders, which can synthesize speech from text through a single feed-forward pass. We also explore a novel VAE-based approach to train the inverse autoregressive flow (IAF) based parallel vocoder from scratch, which avoids the need for distillation from a separately trained WaveNet as previous work.
29 citations
01 May 2017
TL;DR: This paper presents a framework that spots acoustic events, such as horns and sirens, using a two-stage approach, and shows an improvement of up to 31% in the classification rate.
Abstract: Urban environments are characterised by the presence of distinctive audio signals which alert the drivers to events that require prompt action. The detection and interpretation of these signals would be highly beneficial for smart vehicle systems, as it would provide them with complementary information to navigate safely in the environment. In this paper, we present a framework that spots the presence of acoustic events, such as horns and sirens, using a two-stage approach. We first model the urban soundscape and use anomaly detection to identify the presence of an anomalous sound, and later determine the nature of this sound. As the audio samples are affected by copious non-stationary and unstructured noise, which can degrade classification performance, we propose a noise-removal technique to obtain a clean representation of the data we can use for classification and waveform reconstruction. The method is based on the idea of analysing the spectrograms of the incoming signals as images and applying spectrogram segmentation to isolate and extract the alerting signals from the background noise. We evaluate our framework on four hours of urban sounds collected driving around urban Oxford on different kinds of road and in different traffic conditions. When compared to traditional feature representations, such as Mel-frequency cepstrum coefficients, our framework shows an improvement of up to 31% in the classification rate.
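The core idea of treating the spectrogram as an image and segmenting the alerting signal out of background noise can be sketched with a simple thresholding mask. This is a toy stand-in, not the paper's segmentation method: the siren is modelled as a single tone, and the threshold rule (keep bins well above the median magnitude) is an assumption.

```python
import numpy as np
from scipy.signal import stft, istft

# Synthetic scene (assumed): a 700 Hz "siren" tone buried in broadband noise.
fs = 8000
t = np.arange(2 * fs) / fs
clean = np.sin(2 * np.pi * 700 * t)
rng = np.random.default_rng(0)
noisy = clean + 0.5 * rng.standard_normal(clean.size)

# Magnitude spectrogram, viewed as an image.
f, frames, Z = stft(noisy, fs, nperseg=256)
mag = np.abs(Z)

# Crude segmentation: keep only time-frequency pixels well above the
# median level, which isolates the tonal alerting signal.
mask = mag > 4 * np.median(mag)
_, denoised = istft(Z * mask, fs, nperseg=256)

# Frequency bin carrying the most retained energy.
peak_bin = np.argmax((mag * mask).sum(axis=1))
```

After masking, almost all retained energy sits in the bins around 700 Hz, and the inverse STFT gives a cleaned waveform suitable for the classification stage.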
29 citations
14 Apr 2008
TL;DR: An audio signal produced by playing several musical instruments is separated into sound sources according to the respective instrument sounds; after each separation pass, the updated-model-parameter estimation/storage section 114 re-estimates the parameters contained in the updated model parameters.
Abstract: An audio signal produced by playing a plurality of musical instruments is separated into sound sources according to respective instrument sounds. Each time a separation process is performed, the updated model parameter estimation/storage section 114 estimates parameters respectively contained in updated model parameters such that updated power spectrograms gradually change from a state close to initial power spectrograms to a state close to a plurality of power spectrograms most recently stored in a power spectrogram separation/storage section. Respective sections including the power spectrogram separation/storage section 112 and an updated distribution function computation/storage section 118 repeatedly perform process operations until the updated power spectrograms change from the state close to the initial power spectrograms to the state close to the plurality of power spectrograms most recently stored in the power spectrogram separation/storage section 112. The final updated power spectrograms are close to the power spectrograms of single tones of one musical instrument contained in the input audio signal formed to contain harmonic and inharmonic models.
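The patent's harmonic/inharmonic model and its staged parameter updates are elaborate; as a simpler, commonly used stand-in for the same general task, the sketch below separates a power spectrogram into per-source components with non-negative matrix factorization (NMF), whose multiplicative updates likewise refine the separated power spectrograms over repeated iterations. All spectra and activations here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy power spectrogram: two "instruments", each with a fixed spectral
# shape and its own activation pattern over 50 frames (values assumed).
spec_a = np.array([1.0, 0.2, 0.05, 0.0])
spec_b = np.array([0.0, 0.1, 0.3, 1.0])
act_a = rng.random(50)
act_b = rng.random(50)
V = np.outer(spec_a, act_a) + np.outer(spec_b, act_b) + 1e-6

# Factorize V ~= W @ H with multiplicative updates (Euclidean cost);
# columns of W are learned spectra, rows of H their activations.
W = rng.random((4, 2)) + 0.1
H = rng.random((2, 50)) + 0.1
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)

# Relative reconstruction error after the iterative refinement.
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each outer product `np.outer(W[:, k], H[k])` is one separated power spectrogram, mirroring the patent's idea of iteratively driving the separated spectrograms toward single-instrument components.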
29 citations
TL;DR: The usefulness of the generalized instantaneous parameters is demonstrated in their application to optimal selection of windows for spectrograms through window matching in the time-frequency plane.
Abstract: The concept of instantaneous parameters, which has previously been associated exclusively with 1-D measures like the instantaneous frequency and the group delay, are extended to the 2-D time-frequency plane. Such generalized instantaneous parameters are associated with the short-time Fourier transform. They may also be interpreted as local moments of certain time-frequency distributions. It is shown that these measures enable local signal behavior to be characterized in the time-frequency plane for nonstationary deterministic signals. The usefulness of the generalized instantaneous parameters is demonstrated in their application to optimal selection of windows for spectrograms. This is achieved through window matching in the time-frequency plane. An algorithm is provided that illustrates the performance of this window matching. Results based on simulated and real data are presented.
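Window selection for spectrograms can be illustrated with a cruder criterion than the paper's instantaneous-parameter matching: pick, from a few candidate window lengths, the one whose spectrogram is most concentrated in the time-frequency plane, measured here (as an assumption) by Shannon entropy of the normalized energy distribution. The chirp signal and candidate lengths are also assumptions.

```python
import numpy as np
from scipy.signal import stft

# Nonstationary test signal (assumed): a linear chirp from 200 to 800 Hz.
fs = 4000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * (200 * t + 300 * t**2))

def spectrogram_entropy(nperseg):
    """Entropy of the normalized spectrogram energy; lower = more concentrated."""
    _, _, Z = stft(x, fs, nperseg=nperseg)
    p = np.abs(Z) ** 2
    p /= p.sum()
    return -(p * np.log(p + 1e-12)).sum()

# Pick the candidate window whose spectrogram is sharpest.
candidates = [32, 64, 128, 256, 512]
best = min(candidates, key=spectrogram_entropy)
```

Too short a window smears frequency; too long a window smears the chirp's changing frequency in time; the entropy criterion trades the two off, in the same spirit as matching the window to local signal behavior.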
29 citations