Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Journal ArticleDOI
TL;DR: Three time-frequency methods aiming at automatic detection and classification of blue and fin whales summering in the St. Lawrence Estuary with passive acoustics are tested and compared at several levels of signal-to-noise ratio.
Abstract: Monitoring blue and fin whales summering in the St. Lawrence Estuary with passive acoustics requires call recognition algorithms that can cope with the heavy shipping noise of the St. Lawrence Seaway and with multipath propagation characteristics that generate overlapping copies of the calls. In this paper, the performance of three time-frequency methods aiming at such automatic detection and classification is tested on more than 2000 calls and compared at several levels of signal-to-noise ratio using typical recordings collected in this area. For all methods, image processing techniques are used to reduce the noise in the spectrogram. The first approach consists of matching the spectrogram with binary time-frequency templates of the calls (coincidence of spectrograms). The second approach is based on the extraction of the frequency contours of the calls and their classification using dynamic time warping (DTW) and vector quantization (VQ) algorithms. The coincidence of spectrograms was the fastest method and performed better for blue whale A and B calls. VQ detected more 20 Hz fin whale calls, but with a higher false alarm rate. DTW and VQ outperformed the other methods on the more variable blue whale D calls.

36 citations
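
The "coincidence of spectrograms" approach described above lends itself to a compact illustration: binarize a noise-reduced spectrogram and slide a binary time-frequency template along the time axis, scoring the overlap at each offset. The sketch below is a minimal Python illustration of that idea; the STFT parameters, the noise threshold, and the template itself are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np
from scipy.signal import spectrogram

def detect_calls(signal, fs, template, threshold=0.6):
    # Magnitude spectrogram of the recording.
    f, t, sxx = spectrogram(signal, fs=fs, nperseg=512, noverlap=384)
    # Crude noise reduction: binarize against a per-frequency median floor.
    binary = sxx > 3.0 * np.median(sxx, axis=1, keepdims=True)
    n_freq, n_tmpl = template.shape          # binary time-frequency template
    scores = []
    for start in range(binary.shape[1] - n_tmpl + 1):
        window = binary[:n_freq, start:start + n_tmpl]
        # Fraction of template cells that coincide with spectrogram cells.
        scores.append(np.logical_and(window, template).sum() / template.sum())
    scores = np.asarray(scores)
    # Time offsets whose coincidence score exceeds the detection threshold.
    return t[np.flatnonzero(scores > threshold)], scores
```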

Proceedings ArticleDOI
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron Weiss, Yonghui Wu
06 Jun 2021
TL;DR: Parallel Tacotron, proposed in this paper, augments a non-autoregressive text-to-speech model with a variational autoencoder-based residual encoder and is highly parallelizable during both training and inference.
Abstract: Although neural end-to-end text-to-speech models can synthesize highly natural speech, there is still room to improve their efficiency and naturalness. This paper proposes a non-autoregressive neural text-to-speech model augmented with a variational autoencoder-based residual encoder. This model, called Parallel Tacotron, is highly parallelizable during both training and inference, allowing efficient synthesis on modern parallel hardware. The use of the variational autoencoder relaxes the one-to-many mapping nature of the text-to-speech problem and improves naturalness. To further improve the naturalness, we use lightweight convolutions, which can efficiently capture local contexts, and introduce an iterative spectrogram loss inspired by iterative refinement. Experimental results show that Parallel Tacotron matches a strong autoregressive baseline in subjective evaluations with significantly decreased inference time.

35 citations
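
The iterative spectrogram loss mentioned in the abstract can be sketched as a sum of per-iteration reconstruction errors against the target spectrogram. The PyTorch snippet below is a minimal sketch under that assumption; the class name, tensor shapes, and use of an L1 criterion are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class IterativeSpecLoss(nn.Module):
    """Sum of L1 spectrogram losses over successive refinement iterations."""
    def __init__(self, n_iters=4):
        super().__init__()
        self.n_iters = n_iters
        self.l1 = nn.L1Loss()

    def forward(self, predictions, target):
        # predictions: list of [batch, frames, mel_bins] tensors, one per
        # refinement iteration; target: [batch, frames, mel_bins].
        assert len(predictions) == self.n_iters
        return sum(self.l1(p, target) for p in predictions)

# Dummy tensors standing in for the decoder's iterative outputs.
target = torch.randn(2, 100, 80)
preds = [target + 0.1 * torch.randn_like(target) for _ in range(4)]
loss = IterativeSpecLoss(n_iters=4)(preds, target)
```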

Proceedings ArticleDOI
17 Sep 2006
TL;DR: Using a modified STFT Analysis-Modification-Synthesis (AMS) framework, noise reduction can be achieved by modifying the window function used to estimate the STFT phase spectra.
Abstract: Typical speech enhancement algorithms that operate in the Fourier domain only modify the magnitude component. It is commonly understood that the phase component is perceptually unimportant, and thus, it is passed directly to the output. In recent intelligibility experiments, it has been reported that the Short-Time Fourier Transform (STFT) phase spectrum can provide significant intelligibility when estimated using a window function lower in dynamic range than the typical Hamming window. Motivated by this, we investigate the role of the window function for STFT phase estimation in relation to speech enhancement. Using a modified STFT Analysis-Modification-Synthesis (AMS) framework, we show that noise reduction can be achieved by modifying the window function used to estimate the STFT phase spectra. We demonstrate this through spectrogram plots and results from two objective speech quality measures.

35 citations
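
A minimal sketch of the modified analysis-modification-synthesis (AMS) idea follows, assuming the magnitude and phase spectra are estimated with two different analysis windows and then recombined before inverse-STFT synthesis. The window choices, frame sizes, and the identity "modification" stage are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np
from scipy.signal import stft, istft, get_window

def ams_dual_window(x, fs, frame=512, hop=128):
    win_mag = get_window("hamming", frame)   # window for the magnitude spectrum
    win_phase = get_window("boxcar", frame)  # lower-dynamic-range window for phase
    _, _, X_mag = stft(x, fs, window=win_mag, nperseg=frame, noverlap=frame - hop)
    _, _, X_phs = stft(x, fs, window=win_phase, nperseg=frame, noverlap=frame - hop)
    # Modification stage: the magnitude is passed through unchanged here;
    # a real enhancer would apply spectral-subtraction or Wiener gains.
    magnitude = np.abs(X_mag)
    phase = np.angle(X_phs)
    # Recombine and resynthesize by inverse STFT (overlap-add).
    _, y = istft(magnitude * np.exp(1j * phase), fs,
                 window=win_mag, nperseg=frame, noverlap=frame - hop)
    return y
```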

Journal ArticleDOI
TL;DR: This research aims to develop a drill fault detection system using state-of-the-art artificial intelligence techniques; it uses a deep learning architecture to extract rich features from image representations of sound signals, combined with machine learning classifiers, to classify the drill fault sounds of drilling machines.
Abstract: In industry, the ability to detect damage or abnormal functioning in machinery is very important. However, manual detection of machine fault sounds is economically inefficient and labor-intensive. Hence, automatic machine fault detection (MFD) plays an important role in reducing operating and personnel costs compared to manual machine fault detection. This research aims to develop a drill fault detection system using state-of-the-art artificial intelligence techniques. Many researchers have applied the traditional design approach for an MFD system, including handcrafted feature extraction from the raw sound signal, feature selection, and conventional classification. However, drill sound fault detection based on conventional machine learning methods using the raw sound signal in the time domain faces a number of challenges. For example, it can be difficult to extract and select good features to input to a classifier, and the accuracy of fault detection may not be sufficient to meet industrial requirements. Hence, we propose a method that uses a deep learning architecture to extract rich features from image representations of sound signals, combined with machine learning classifiers, to classify the drill fault sounds of drilling machines. The proposed methods are trained and evaluated using a real sound dataset provided by the factory. The experimental results show a good classification accuracy of 80.25 percent when using Mel spectrogram and scalogram images. The results show significant potential for use in a fault diagnosis support system based on the sounds of drilling machines.

35 citations
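
A stripped-down sketch of the pipeline the abstract describes: each drill-sound clip is turned into a Mel-spectrogram image and passed to a classifier. For brevity, the paper's deep feature extractor is replaced by simple flattening and an SVM; the file names, labels, and parameters below are placeholders, since the factory dataset is not public.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def mel_image(path, sr=16000, n_mels=64, frames=128):
    # Mel-spectrogram "image" of one sound clip, padded/cropped to a fixed size.
    y, _ = librosa.load(path, sr=sr)
    m = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels), ref=np.max)
    pad = max(0, frames - m.shape[1])
    m = np.pad(m, ((0, 0), (0, pad)), constant_values=m.min())[:, :frames]
    return m.flatten()

# Placeholder file names and labels standing in for the factory dataset.
files = ["drill_normal_01.wav", "drill_broken_01.wav",
         "drill_normal_02.wav", "drill_broken_02.wav"]
labels = [0, 1, 0, 1]
X = np.stack([mel_image(f) for f in files])
clf = SVC().fit(X, labels)   # conventional classifier on the image features
```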

Proceedings ArticleDOI
01 Aug 2016
TL;DR: It is demonstrated that electrocorticography (ECoG) intracranial activity from temporal areas can be used to resynthesize speech in real-time, and that significant correlations between the original and reconstructed spectrograms and temporal waveforms can be achieved.
Abstract: Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real-time. In this pilot study with one participant, we demonstrate that electrocorticography (ECoG) intracranial activity from temporal areas can be used to resynthesize speech in real-time. This is accomplished by reconstructing the audio magnitude spectrogram from neural activity and subsequently creating the audio waveform from these reconstructed spectrograms. We show that significant correlations between the original and reconstructed spectrograms and temporal waveforms can be achieved. While this pilot study uses audibly spoken speech for the models, it represents a first step towards speech synthesis from speech imagery.

35 citations
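
The second half of the pipeline described above, reconstructing a waveform from a predicted magnitude spectrogram, can be sketched with Griffin-Lim phase recovery. In the snippet below, a simple linear mapping stands in for the trained ECoG-to-spectrogram decoder; both that mapping and the Griffin-Lim step are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
import librosa

def decode_and_synthesize(ecog_features, weights, n_fft=512, hop=128):
    # ecog_features: [frames, channels]; weights: [channels, freq_bins].
    # A linear mapping stands in for the trained neural decoder.
    magnitude = np.maximum(ecog_features @ weights, 0.0).T   # [freq_bins, frames]
    # Griffin-Lim iteratively estimates a phase consistent with the magnitude
    # spectrogram and returns the corresponding time-domain waveform.
    return librosa.griffinlim(magnitude, n_iter=60, hop_length=hop, n_fft=n_fft)

# Dummy shapes: 100 frames of 64-channel ECoG features, 257 frequency bins.
rng = np.random.default_rng(0)
audio = decode_and_synthesize(rng.random((100, 64)), rng.random((64, 257)))
```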


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
79% related
Convolutional neural network
74.7K papers, 2M citations
78% related
Feature extraction
111.8K papers, 2.1M citations
77% related
Wavelet
78K papers, 1.3M citations
76% related
Support vector machine
73.6K papers, 1.7M citations
75% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593