Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Book
31 Jan 1989
TL;DR: In this book, the author proposes a time-frequency energy representation for speech and shows that it can be used for signal detection and ridge identification in both the stationary and the quasi-stationary case.
Abstract:
1 Introduction.
2 The Time-Frequency Energy Representation. 2.1. The stationary case. 2.2. The quasi-stationary case. 2.3. Non-stationarity. 2.4. Joint time-frequency representations. 2.5. Design criteria for time-frequency representations. 2.6. Relations among the design criteria. 2.7. Satisfying the design criteria. 2.8. Directional time-frequency transforms. 2.9. A speech example.
3 Time-Frequency Filtering. 3.1. The stationary case. 3.2. Non-stationary vocal tract. 3.3. Time-frequency filtering. 3.4. The stationary case - re-examined. 3.5. Linearly varying modulation frequency. 3.6. The quasi-stationary case. 3.7. Smoothly varying modulation frequency. 3.8. The vocal tract transfer function. 3.9. The transmission channel. 3.10. The excitation.
4 The Schematic Spectrogram. 4.1. Rationale. 4.2. Spectral Peaks. 4.3. Time-frequency ridges - non-directional kernel. 4.4. Time-frequency ridges - directional kernel. 4.5. Signal detection and ridge identification. 4.6. Continuity and grouping. 4.7. A perspective.
5 A Catalog of Examples. 5.1. Some general examples. 5.2. Liquids and glides. 5.3. Nasalized vowels. 5.4. Consonant-vowel transitions. 5.5. Female speech. 5.6. Transmission channel effects.
References.

44 citations

Journal ArticleDOI
TL;DR: Although the proposed transform has been derived heuristically—namely, to be optimal in the perceptual frequency scale in Gabor-sense and to perform a 1 CB speech analysis—it appears that this is a self-invertible, overcomplete, shiftable transform.

44 citations

Journal ArticleDOI
TL;DR: Systematic evaluations and comparisons on the NIST SRE 2010 retransmitted corpus show that both monaural and multi-channel speech enhancement significantly improve x-vector performance, and that the proposed covariance matrix estimate is effective for the MVDR beamformer.
Abstract: Deep neural network (DNN) embeddings for speaker recognition have recently attracted much attention. Compared to i-vectors, they are more robust to noise and room reverberation as DNNs leverage large-scale training. This article addresses the question of whether speech enhancement approaches are still useful when DNN embeddings are used for speaker recognition. We investigate single- and multi-channel speech enhancement for text-independent speaker verification based on x-vectors in conditions where strong diffuse noise and reverberation are both present. Single-channel (monaural) speech enhancement is based on complex spectral mapping and is applied to individual microphones. We use a masking-based minimum variance distortionless response (MVDR) beamformer and its rank-1 approximation for multi-channel speech enhancement. We propose a novel method of deriving time-frequency masks from the estimated complex spectrogram. In addition, we investigate gammatone frequency cepstral coefficients (GFCCs) as robust speaker features. Systematic evaluations and comparisons on the NIST SRE 2010 retransmitted corpus show that both monaural and multi-channel speech enhancement significantly improve x-vector performance, and that our covariance matrix estimate is effective for the MVDR beamformer.
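As a rough illustration of the masking-based MVDR idea mentioned in this abstract, the following NumPy sketch estimates speech and noise spatial covariance matrices from time-frequency masks and derives beamformer weights with a commonly used reference-microphone formulation, w(f) = (Φn⁻¹ Φs) u / trace(Φn⁻¹ Φs). It does not reproduce the paper's mask derivation or its covariance estimate, which are not described on this page. The function names, the diagonal loading term, and the array layout (mics, freqs, frames) are assumptions made for illustration only.

```python
import numpy as np


def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance for each frequency bin.

    Y:    complex multi-channel STFT, shape (mics, freqs, frames)
    mask: real weights in [0, 1], shape (freqs, frames)
    Returns an array of shape (freqs, mics, mics).
    """
    weighted = Y * mask[None, :, :]
    # cov[f, m, n] = sum_t mask[f, t] * Y[m, f, t] * conj(Y[n, f, t])
    cov = np.einsum("mft,nft->fmn", weighted, Y.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return cov / norm[:, None, None]


def mvdr_weights(phi_s, phi_n, ref_mic=0, diag_load=1e-6):
    """MVDR weights per frequency, shape (freqs, mics).

    phi_s, phi_n: speech and noise covariances, shape (freqs, mics, mics).
    diag_load is a small regularizer (an assumption, not from the paper).
    """
    n_mics = phi_n.shape[-1]
    trace_n = np.trace(phi_n, axis1=-2, axis2=-1).real
    load = (diag_load * trace_n / n_mics)[:, None, None] * np.eye(n_mics)
    # Solve (phi_n + load) X = phi_s instead of forming an explicit inverse.
    numer = np.linalg.solve(phi_n + load, phi_s)
    denom = np.trace(numer, axis1=-2, axis2=-1)
    # Column ref_mic of X corresponds to multiplying by the one-hot vector u.
    return numer[..., ref_mic] / (denom[:, None] + 1e-10)


def apply_beamformer(w, Y):
    """Beamformed single-channel STFT: out[f, t] = w[f]^H Y[:, f, t]."""
    return np.einsum("fm,mft->ft", w.conj(), Y)
```

In a typical pipeline of this kind, a speech mask and a noise mask (often taken as one minus the speech mask, which is again an assumption here) would be passed to `masked_covariance` to obtain `phi_s` and `phi_n` before calling `mvdr_weights`.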

44 citations

Journal ArticleDOI
TL;DR: The outcomes of this research enhance the understanding of knowledge-based classifiers for authentication as well as the Gauss-Newton based optimization for vectorial inputs of spectrogram analysis.
Abstract: This paper deals with a novel frequency-based authentication method and a Gauss-Newton based Neural Network classifier. The purpose of this research is to provide the foundations of frequency authentication to enhance keystroke authentication protocols. We used the short-time Fourier transform to analyze the train signal of keystrokes. We also analyzed the spectrograms to discriminate various signals. The EER of the proposed feature extraction and classification method is 4.1%. Keystroke recognition is a branch of biometrics designed to strengthen regular passwords through inter-key times to protect the password owner from fraud attacks. The signals of keystrokes are usually evaluated only in the time domain since the applied systems collect and analyze only the time values. In addition to these kinds of algorithms, we introduce the extraction of a novel frequency feature and a keystroke authentication system which has a classifier operating in the frequency domain. The frequency extraction is a new approach that will enhance the authentication protocols and shed light on keystroke authentication by providing a hidden security level. Above all, instead of inter-key times, the exact key press times are extracted and binarized in the time domain. Subsequently, the spectrograms are generated by a regular short-time Fourier transform with the optimized window size. Since the spectrograms include both frequency and time data, represented as images, low frequencies under a threshold are erased and the high frequencies are collected in bins after the digitization. Consequently, the average bin values are used as the inputs to train the Gauss-Newton based Neural Network classifier to validate the attempts. The results are highly promising: we obtained a 4.1% Equal Error Rate (EER) after 60 real attempts of the password owner and 60 fraud attacks from 12 different users. The outcomes of this research enhance our understanding of knowledge-based classifiers for authentication as well as the Gauss-Newton based optimization for vectorial inputs of spectrogram analysis.
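The feature-extraction steps described in this abstract (binarize key-press times, compute a short-time Fourier transform spectrogram, discard low frequencies, average the remaining frequencies in bins) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sampling rate, window length, hop size, frequency threshold, and number of bins are all hypothetical values, and the Gauss-Newton based classifier is not reproduced.

```python
import numpy as np


def binarize_key_presses(press_times_s, duration_s, fs):
    """Pulse train: 1.0 at samples where a key press occurs, 0.0 elsewhere."""
    n = int(round(duration_s * fs))
    signal = np.zeros(n)
    idx = np.round(np.asarray(press_times_s) * fs).astype(int)
    idx = idx[(idx >= 0) & (idx < n)]
    signal[idx] = 1.0
    return signal


def spectrogram(x, win_len, hop):
    """Squared-magnitude STFT with a Hann window, shape (freq_bins, frames)."""
    if len(x) < win_len:
        raise ValueError("signal is shorter than the analysis window")
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2


def bin_features(spec, fs, win_len, f_min_hz, n_bins):
    """Drop rows below f_min_hz, then average the rest in n_bins groups.

    Assumes at least n_bins frequency rows remain above the threshold.
    """
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    high = spec[freqs >= f_min_hz]
    groups = np.array_split(high, n_bins, axis=0)
    return np.array([g.mean() for g in groups])


# Hypothetical usage: four key presses within one second, sampled at 1 kHz.
presses = [0.10, 0.35, 0.50, 0.90]
pulse_train = binarize_key_presses(presses, duration_s=1.0, fs=1000)
spec = spectrogram(pulse_train, win_len=256, hop=64)
features = bin_features(spec, fs=1000, win_len=256, f_min_hz=100.0, n_bins=8)
```

The resulting `features` vector plays the role of the averaged bin values that the abstract says are fed to the classifier.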

44 citations

Journal ArticleDOI
TL;DR: This study proposes an approach using a dense CNN model for underwater target recognition that achieves the overall accuracy of 98.85% at 0-dB signal-to-noise ratio (SNR) and outperforms traditional ML techniques, as well as other state-of-the-art CNN models.
Abstract: In oceanic remote sensing operations, underwater acoustic target recognition is always a difficult and extremely important task of sonar systems, especially in the condition of complex sound wave propagation characteristics. The expensively learning recognition model for big data analysis is typically an obstacle for most traditional machine learning (ML) algorithms, whereas the convolutional neural network (CNN), a type of deep neural network, can automatically extract features for accurate classification. In this study, we propose an approach using a dense CNN model for underwater target recognition. The network architecture is designed to cleverly reuse all former feature maps to optimize classification rates under various impaired conditions while satisfying low computational cost. In addition, instead of using time-frequency spectrogram images, the proposed scheme allows directly utilizing the original audio signal in the time domain as the network input data. Based on the experimental results evaluated on the real-world data set of passive sonar, our classification model achieves the overall accuracy of 98.85% at 0-dB signal-to-noise ratio (SNR) and outperforms traditional ML techniques, as well as other state-of-the-art CNN models.
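To make the "0-dB signal-to-noise ratio" condition concrete: at 0 dB the noise has the same average power as the signal. The sketch below scales a noise recording so that a time-domain mixture reaches a chosen SNR, using the standard power-ratio definition SNR = 10 log10(P_signal / P_noise). How the authors actually constructed their noisy test conditions is not stated on this page, so the function name and procedure are assumptions for illustration.

```python
import numpy as np


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal + noise has the requested SNR in dB.

    Returns (mixture, scaled_noise). Both inputs are 1-D arrays; the noise
    must be at least as long as the signal and is truncated to match.
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[: len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the scale so that p_signal / (scale**2 * p_noise) = 10**(snr_db/10).
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = scale * noise
    return signal + scaled_noise, scaled_noise


# Hypothetical usage: a 50 Hz tone mixed with white noise at 0 dB SNR.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
tone = np.sin(2 * np.pi * 50.0 * t)
mixture, scaled = mix_at_snr(tone, rng.standard_normal(8000), snr_db=0.0)
```

A raw-waveform classifier like the one described would then take `mixture` directly as input rather than a spectrogram image.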

44 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 79% related
Convolutional neural network: 74.7K papers, 2M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Wavelet: 78K papers, 1.3M citations, 76% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593