Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Proceedings ArticleDOI
12 May 2019
TL;DR: SubSpectralNet as discussed by the authors uses band-wise crops of the input time-frequency representations and trains a convolutional neural network (CNN) on the same to capture discriminative features by incorporating frequency band-level differences.
Abstract: Acoustic Scene Classification (ASC) is one of the core research problems in the field of Computational Sound Scene Analysis. In this work, we present SubSpectralNet, a novel model which captures discriminative features by incorporating frequency band-level differences to model soundscapes. Using mel-spectrograms, we propose the idea of using band-wise crops of the input time-frequency representations and train a convolutional neural network (CNN) on the same. We also propose a modification in the training method for more efficient learning of the CNN models. We first give a motivation for using sub-spectrograms by giving intuitive and statistical analyses and finally we develop a sub-spectrogram based CNN architecture for ASC. The system is evaluated on the public ASC development dataset provided for the "Detection and Classification of Acoustic Scenes and Events" (DCASE) 2018 Challenge. Our best model achieves an improvement of +14% in terms of classification accuracy with respect to the DCASE 2018 baseline system. Code and figures are available at https://github.com/ssrp/SubSpectralNet

57 citations
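
The band-wise cropping idea in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `band_crops`, the 40-bin input, and the band size and hop values are assumptions chosen for the example, and random numbers stand in for a real mel-spectrogram.

```python
import numpy as np

def band_crops(mel_spec, band_size, hop):
    """Split a (n_mels, n_frames) mel-spectrogram into overlapping frequency bands."""
    n_mels = mel_spec.shape[0]
    starts = range(0, n_mels - band_size + 1, hop)
    # Each crop keeps all time frames but only a contiguous slice of mel bins.
    return np.stack([mel_spec[s:s + band_size, :] for s in starts])

# Placeholder input: random values standing in for a 40-bin, 500-frame mel-spectrogram.
rng = np.random.default_rng(0)
mel_spec = rng.random((40, 500))

crops = band_crops(mel_spec, band_size=20, hop=10)
print(crops.shape)  # (3, 20, 500)
```

Each crop could then be fed to its own CNN branch; how the branches are combined is described in the paper and is not shown here.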

Journal ArticleDOI
TL;DR: The paper outlines a real-time automatic detection system for regional phase arrivals on the NORSAR array and demonstrates how stable and accurate slowness and azimuth estimates can be obtained for quite marginal signals.
Abstract: Seismic arrays are employed in the global monitoring of earthquakes and explosions because of their superior ability to detect and estimate the direction of incident seismic arrivals. Traditional beamforming and f–k analysis require waveform semblance over the full array aperture and cannot be applied in many situations where signals are incoherent between sensors. The NORSAR and MJAR arrays are two primary IMS stations where this is the case for high-frequency regional phases. Large intersite distances and significant geological heterogeneity at these arrays result in waveform dissimilarity which precludes coherent array processing in the frequency bands with optimal SNR. Multitaper methods provide low variance spectral estimates over short time-windows and seismic arrivals can be detected on single channels using a non-linear spectrogram transformation which attains local maxima at times and frequencies characterized by an energy increase. This detection procedure requires very little a priori knowledge of the spectral content of the signal. The transformed spectrograms can be beamformed over large-aperture arrays or networks according to theoretical time-delays resulting in an incoherent detection system which does not require waveform semblance at any frequencies. We outline a real-time automatic detection system for regional phase arrivals on the NORSAR array and demonstrate how stable and accurate slowness and azimuth estimates can be obtained for quite marginal signals. In the case of partially coherent arrays, the procedure described may provide stable, if low resolution, estimates which can subsequently be refined using coherent processing over subsets of sensors. In particular, we illustrate how the spectrogram beamforming method facilitates a stable and accurate slowness estimate for the incoherent high-frequency Pn arrival at the MJAR array in Japan from the 2006 October 9 underground nuclear test in North Korea.

57 citations
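
The incoherent beamforming step described in the abstract above, stacking per-channel detection traces after compensating for theoretical time delays, can be sketched roughly as below. This is a toy illustration under stated assumptions: the function name `incoherent_beam`, the integer sample delays, and the synthetic single-spike traces are all hypothetical, and the paper's multitaper estimation and non-linear spectrogram transformation are not reproduced.

```python
import numpy as np

def incoherent_beam(traces, delays):
    """Average per-channel detection traces after removing integer sample delays."""
    beam = np.zeros(traces.shape[1])
    for trace, delay in zip(traces, delays):
        # np.roll wraps around at the edges; acceptable for this toy example only.
        beam += np.roll(trace, -delay)
    return beam / len(traces)

# Synthetic non-negative detection traces: one spike per channel,
# arriving later on channels with a larger (assumed) delay.
delays = [0, 7, 15]
traces = np.zeros((3, 200))
for channel, delay in enumerate(delays):
    traces[channel, 100 + delay] = 1.0

beam = incoherent_beam(traces, delays)
print(int(np.argmax(beam)))  # 100
```

Because only non-negative detection traces are stacked, the waveforms themselves do not need to be similar between sensors, which is the point the abstract makes about incoherent arrays.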

Journal ArticleDOI
TL;DR: In this paper, the authors used a nonlinear decomposition technique, the empirical mode decomposition (EMD) method with the Hilbert transform, to obtain more reliable very low frequency electromagnetic (VLF-EM) data.
Abstract: Geologic noise and background electromagnetic (EM) waves often degrade the quality of very low frequency electromagnetic (VLF-EM) data. To retrieve signals with significant geologic information, we used a new nonlinear decomposition technique called the empirical mode decomposition (EMD) method with the Hilbert transform. We conducted a 2D resistivity model study that included inversion of the synthetic data to test the accuracy and capabilities of this method. Next, we applied this method to real data obtained from a field experiment and a geologic example. The filtering procedure for real data starts with applying the EMD method to decompose the VLF data into a series of intrinsic mode functions that admit a well-behaved Hilbert transform. With the Hilbert transform, the intrinsic mode functions yielded a spectrogram that presents an energy-wavenumber-distance distribution of the VLF data. We then examined the decomposed data and their spectrogram to determine the noise components, which we eliminated to obtain more reliable VLF data. The EMD-filtered data and their associated spectrograms indicated the successful application of this method. Because VLF data are recorded as a complex function of the real variable distance, the in-phase and quadrature parts are complementary components of each other and could be a Hilbert transform pair if the data are analytical and noise free. Therefore, by comparing the original data set with the one obtained from the Hilbert transform, we could evaluate data quality and could even replace the original with its Hilbert transform counterpart with acceptable accuracy. By application of both this technique and conventional methods to real data in this study, we have shown the superiority of this new method and have obtained a more reliable earth model by inverting the EMD-filtered data.

56 citations
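
The Hilbert-pair check mentioned in the abstract above (comparing one component with the Hilbert transform of the other) can be sketched with an FFT-based Hilbert transform. This is a minimal sketch, not the authors' workflow: the EMD step is omitted, the profile is a synthetic noise-free cosine, and the function name `hilbert_transform` and the sign convention relating in-phase and quadrature parts are assumptions made for illustration.

```python
import numpy as np

def hilbert_transform(x):
    """Hilbert transform of a real 1-D array via the FFT-based analytic signal."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies, zero the rest.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights)
    return analytic.imag

# Synthetic, noise-free profile along distance (placeholder for real VLF data).
distance = np.arange(256)
in_phase = np.cos(2 * np.pi * 8 * distance / 256)
quadrature = np.sin(2 * np.pi * 8 * distance / 256)

predicted = hilbert_transform(in_phase)
rms_mismatch = np.sqrt(np.mean((quadrature - predicted) ** 2))
print(rms_mismatch < 1e-9)  # True for this noise-free example
```

With real data, a large mismatch between the measured component and its Hilbert-transform counterpart would suggest noise contamination, which is the data-quality use the abstract describes.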

Journal ArticleDOI
TL;DR: A convergence-guaranteed algorithm for supervised determined source separation that consists of iteratively estimating the power spectrograms of the underlying sources, as well as the separation matrices is developed.
Abstract: This letter proposes a multichannel source separation technique, the multichannel variational autoencoder (MVAE) method, which uses a conditional VAE (CVAE) to model and estimate the power spectrograms of the sources in a mixture. By training the CVAE using the spectrograms of training examples with source-class labels, we can use the trained decoder distribution as a universal generative model capable of generating spectrograms conditioned on a specified class index. By treating the latent space variables and the class index as the unknown parameters of this generative model, we can develop a convergence-guaranteed algorithm for supervised determined source separation that consists of iteratively estimating the power spectrograms of the underlying sources, as well as the separation matrices. In experimental evaluations, our MVAE produced better separation performance than a baseline method.

56 citations

Journal ArticleDOI
TL;DR: The experimental results show that the proposed singing voice enhancement technique considerably improved the performance of a simple pitch estimation technique, which supports the effectiveness of the proposed method.
Abstract: We propose a novel singing voice enhancement technique for monaural music audio signals, which is a quite challenging problem. Many singing voice enhancement techniques have been proposed recently. However, our approach is based on a quite different idea from these existing methods. We focused on the fluctuation of a singing voice and considered to detect it by exploiting two differently resolved spectrograms, one has rich temporal resolution and poor frequency resolution, while the other has rich frequency resolution and poor temporal resolution. On such two spectrograms, the shapes of fluctuating components are quite different. Based on this idea, we propose a singing voice enhancement technique that we call two-stage harmonic/percussive sound separation (HPSS). In this paper, we describe the details of two-stage HPSS and evaluate the performance of the method. The experimental results show that SDR, a commonly-used criterion on the task, was improved by around 4 dB, which is a considerably higher level than existing methods. In addition, we also evaluated the performance of the method as a preprocessing for melody estimation in music. The experimental results show that our singing voice enhancement technique considerably improved the performance of a simple pitch estimation technique. These results prove the effectiveness of the proposed method.

56 citations
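
The "two differently resolved spectrograms" in the abstract above can be illustrated by computing a short-window and a long-window magnitude spectrogram of the same signal. This sketch only shows the resolution trade-off; it does not implement the two-stage HPSS separation itself. The helper `magnitude_spectrogram`, the window lengths, the sample rate, and the synthetic vibrato tone are all assumptions chosen for the example.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len, hop):
    """Hann-windowed STFT magnitude with shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Placeholder signal: a 440 Hz tone with 6 Hz vibrato, one second at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t + 5 * np.sin(2 * np.pi * 6 * t))

# Short window: fine temporal resolution, coarse frequency resolution.
short_win = magnitude_spectrogram(signal, frame_len=256, hop=128)
# Long window: fine frequency resolution, coarse temporal resolution.
long_win = magnitude_spectrogram(signal, frame_len=4096, hop=2048)

print(short_win.shape, long_win.shape)  # (129, 124) (2049, 6)
```

The short-window spectrogram has many frames but few frequency bins, and the long-window one has the opposite, so a fluctuating component such as vibrato takes a different shape in each.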


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 79% related
Convolutional neural network: 74.7K papers, 2M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Wavelet: 78K papers, 1.3M citations, 76% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2024  1
2023  627
2022  1,396
2021  488
2020  595
2019  593