Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.

Papers

Book

Speech Time-Frequency Representations

Michael D. Riley

31 Jan 1989

TL;DR: This book proposes a time-frequency energy representation for speech and shows that it can be used for signal detection and ridge identification in the stationary and quasi-stationary cases.

Abstract (table of contents):
1 Introduction.
2 The Time-Frequency Energy Representation: 2.1 The stationary case; 2.2 The quasi-stationary case; 2.3 Non-stationarity; 2.4 Joint time-frequency representations; 2.5 Design criteria for time-frequency representations; 2.6 Relations among the design criteria; 2.7 Satisfying the design criteria; 2.8 Directional time-frequency transforms; 2.9 A speech example.
3 Time-Frequency Filtering: 3.1 The stationary case; 3.2 Non-stationary vocal tract; 3.3 Time-frequency filtering; 3.4 The stationary case - re-examined; 3.5 Linearly varying modulation frequency; 3.6 The quasi-stationary case; 3.7 Smoothly varying modulation frequency; 3.8 The vocal tract transfer function; 3.9 The transmission channel; 3.10 The excitation.
4 The Schematic Spectrogram: 4.1 Rationale; 4.2 Spectral Peaks; 4.3 Time-frequency ridges - non-directional kernel; 4.4 Time-frequency ridges - directional kernel; 4.5 Signal detection and ridge identification; 4.6 Continuity and grouping; 4.7 A perspective.
5 A Catalog of Examples: 5.1 Some general examples; 5.2 Liquids and glides; 5.3 Nasalized vowels; 5.4 Consonant-vowel transitions; 5.5 Female speech; 5.6 Transmission channel effects.
References.

44 citations

Journal Article

Perceptual wavelet-representation of speech signals and its application to speech enhancement

István Pintér

01 Jan 1996-Computer Speech & Language

TL;DR: Although the proposed transform has been derived heuristically—namely, to be optimal in the perceptual frequency scale in Gabor-sense and to perform a 1 CB speech analysis—it appears that this is a self-invertible, overcomplete, shiftable transform.

44 citations

Journal Article

Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement

Hassan Taherian¹, Zhong-Qiu Wang¹, Jorge Chang¹, DeLiang Wang¹

Ohio State University¹

13 Apr 2020-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: Systematic evaluations and comparisons on the NIST SRE 2010 retransmitted corpus show that both monaural and multi-channel speech enhancement significantly outperform x-vector's performance, and the covariance matrix estimate is effective for the MVDR beamformer.

Abstract: Deep neural network (DNN) embeddings for speaker recognition have recently attracted much attention. Compared to i-vectors, they are more robust to noise and room reverberation as DNNs leverage large-scale training. This article addresses the question of whether speech enhancement approaches are still useful when DNN embeddings are used for speaker recognition. We investigate single- and multi-channel speech enhancement for text-independent speaker verification based on x-vectors in conditions where strong diffuse noise and reverberation are both present. Single-channel (monaural) speech enhancement is based on complex spectral mapping and is applied to individual microphones. We use masking-based minimum variance distortion-less response (MVDR) beamformer and its rank-1 approximation for multi-channel speech enhancement. We propose a novel method of deriving time-frequency masks from the estimated complex spectrogram. In addition, we investigate gammatone frequency cepstral coefficients (GFCCs) as robust speaker features. Systematic evaluations and comparisons on the NIST SRE 2010 retransmitted corpus show that both monaural and multi-channel speech enhancement significantly outperform x-vector's performance, and our covariance matrix estimate is effective for the MVDR beamformer.

44 citations

Journal Article

Frequency spectrograms for biometric keystroke authentication using neural network based classifier

Orcan Alpar¹

University of Hradec Králové¹

15 Jan 2017-Knowledge Based Systems

TL;DR: The outcomes of this research enhance the understanding of knowledge-based classifiers for authentication as well as the Gauss-Newton based optimization for vectorial inputs of spectrogram analysis.

Abstract: This paper deals with a novel frequency based authentication method and a Gauss-Newton based Neural Network classifier. The purpose of this research is to provide the foundations of frequency authentication to enhance keystroke authentication protocols. We presented short time Fourier transform to analyze the train signal of keystrokes. We also analyzed the spectrograms to discriminate various signals. EER of the proposed feature extraction and classification method is found as 4.1%. Keystroke recognition is one of the branches of biometrics that is designed to strengthen regular passwords through inter-key times to protect the password owner from fraud attacks. The signals of keystrokes are usually evaluated only in the time domain since the applied systems collect and analyze only the time values. In addition to these kinds of algorithms, we introduce the extraction of a novel frequency feature and a keystroke authentication system which has a classifier operating in the frequency domain. The frequency extraction is a new approach that will enhance the authentication protocols and shed light on keystroke authentication by providing a hidden security level. Above all, instead of inter-key times, the exact key press times are extracted and binarized in the time domain. Subsequently, the spectrograms are generated by regular short time Fourier transform with the optimized window size. Since the spectrograms include both frequency and time data, represented as images, low frequencies under a threshold are erased and the high frequencies are collected in bins after the digitization. Consequently, the average bin values are used as the inputs to train the Gauss-Newton based Neural Network classifier to validate the attempts. The results are highly promising: we obtained 4.1% Equal Error Rate (EER) after 60 real attempts of the password owner and 60 fraud attacks from 12 different users. The outcomes of this research enhance our understanding of knowledge-based classifiers for authentication as well as the Gauss-Newton based optimization for vectorial inputs of spectrogram analysis.
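
The feature-extraction steps described in this abstract (binarized key-press train, short-time Fourier transform, removal of low frequencies, averaging of the remaining bins) can be illustrated with a minimal Python sketch. It is not the author's implementation: the Hann window, the window and hop sizes, the cutoff bin, the number of bands, and the sample press times are all assumptions chosen for illustration, and the function names `stft_magnitude` and `band_features` are hypothetical.

```python
import cmath
import math


def stft_magnitude(signal, window_size, hop):
    """Return a list of frames; each frame holds |DFT| for bins 0..window_size//2."""
    # Hann window: an assumption, the abstract only mentions an "optimized window size".
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
              for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(window_size)]
        frame = []
        for k in range(window_size // 2 + 1):
            # Naive DFT of the windowed chunk at frequency bin k.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n in range(window_size))
            frame.append(abs(acc))
        frames.append(frame)
    return frames


def band_features(frames, low_cut, n_bands):
    """Drop bins below low_cut, then average the rest into at most n_bands values."""
    n_bins = len(frames[0])
    # Time-average each frequency bin that survives the low-frequency cut.
    mean_per_bin = [sum(f[k] for f in frames) / len(frames)
                    for k in range(low_cut, n_bins)]
    band_size = math.ceil(len(mean_per_bin) / n_bands)
    bands = []
    for i in range(0, len(mean_per_bin), band_size):
        band = mean_per_bin[i:i + band_size]
        bands.append(sum(band) / len(band))
    return bands


# Hypothetical binarized key-press train: 1.0 at a press sample, 0.0 elsewhere.
press_samples = {3, 10, 18, 25, 40, 47, 55}
train = [1.0 if i in press_samples else 0.0 for i in range(64)]

frames = stft_magnitude(train, window_size=16, hop=8)
features = band_features(frames, low_cut=2, n_bands=3)
print(len(frames), len(frames[0]), features)
```

In a setup like the one the abstract describes, a vector such as `features` would be the input to the classifier; the classifier itself is outside the scope of this sketch.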

44 citations

Journal Article

Underwater Acoustic Target Classification Based on Dense Convolutional Neural Network

Van-Sang Doan¹, Thien Huynh-The¹, Dong-Seong Kim¹

Kumoh National Institute of Technology¹

19 Oct 2020-IEEE Geoscience and Remote Sensing Letters

TL;DR: This study proposes an approach using a dense CNN model for underwater target recognition that achieves an overall accuracy of 98.85% at 0-dB signal-to-noise ratio (SNR) and outperforms traditional ML techniques as well as other state-of-the-art CNN models.

Abstract: In oceanic remote sensing operations, underwater acoustic target recognition is always a difficult and extremely important task of sonar systems, especially in the condition of complex sound wave propagation characteristics. The expensively learning recognition model for big data analysis is typically an obstacle for most traditional machine learning (ML) algorithms, whereas the convolutional neural network (CNN), a type of deep neural network, can automatically extract features for accurate classification. In this study, we propose an approach using a dense CNN model for underwater target recognition. The network architecture is designed to cleverly reuse all former feature maps to optimize classification rates under various impaired conditions while satisfying low computational cost. In addition, instead of using time-frequency spectrogram images, the proposed scheme allows directly utilizing the original audio signal in the time domain as the network input data. Based on the experimental results evaluated on the real-world data set of passive sonar, our classification model achieves the overall accuracy of 98.85% at 0-dB signal-to-noise ratio (SNR) and outperforms traditional ML techniques, as well as other state-of-the-art CNN models.

44 citations

Performance Metrics

Papers: 7,848

Citations: 107,060

No. of papers in the topic in previous years
Year	Papers
2024	1
2023	627
2022	1,396
2021	488
2020	595
2019	593
