Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Journal ArticleDOI
TL;DR: This work develops and compares two algorithms on a common corpus of nearly one hour of data collected in the Southern California Bight and at Palmyra Atoll that use a common signal processing front end to determine time × frequency peaks from a spectrogram.
Abstract: Many odontocetes produce frequency modulated tonal calls known as whistles. The ability to automatically determine time × frequency tracks corresponding to these vocalizations has numerous applications including species description, identification, and density estimation. This work develops and compares two algorithms on a common corpus of nearly one hour of data collected in the Southern California Bight and at Palmyra Atoll. The corpus contains over 3000 whistles from bottlenose dolphins, long- and short-beaked common dolphins, spinner dolphins, and melon-headed whales that have been annotated by a human, and released to the Moby Sound archive. Both algorithms use a common signal processing front end to determine time × frequency peaks from a spectrogram. In the first method, a particle filter performs Bayesian filtering, estimating the contour from the noisy spectral peaks. The second method uses an adaptive polynomial prediction to connect peaks into a graph, merging graphs when they cross. Whistle contours are extracted from graphs using information from both sides of crossings. The particle filter was able to retrieve 71.5% (recall) of the human annotated tonals with 60.8% of the detections being valid (precision). The graph algorithm's recall rate was 80.0% with a precision of 76.9%.
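Both algorithms share a signal-processing front end that extracts time × frequency peaks from a spectrogram. The sketch below, assuming scipy and illustrative parameter values (FFT size, hop, SNR threshold), shows one way such a peak-picking front end could look; it is not the authors' implementation.

```python
# Hypothetical peak-picking front end: compute a spectrogram and keep
# per-frame spectral peaks that rise above a crude local noise-floor
# estimate. Thresholds and window sizes are illustrative only.
import numpy as np
from scipy.signal import spectrogram, find_peaks

def tonal_peaks(x, fs, nfft=1024, hop=256, snr_db=6.0):
    """Return a list of (time_s, freq_hz) candidate tonal peaks."""
    f, t, S = spectrogram(x, fs=fs, nperseg=nfft, noverlap=nfft - hop,
                          window="hann", mode="magnitude")
    S_db = 20.0 * np.log10(S + 1e-12)
    peaks = []
    for j, frame in enumerate(S_db.T):
        floor = np.median(frame)                      # per-frame noise floor
        idx, _ = find_peaks(frame, height=floor + snr_db)
        peaks.extend((t[j], f[i]) for i in idx)
    return peaks
```

The particle filter would then track a contour through these noisy peaks, while the graph method would link them by polynomial prediction.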

72 citations

Journal ArticleDOI
Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lianwu Chen, Yuexian Zou, Dong Yu
TL;DR: A general multi-modal framework for target speech separation is proposed that utilizes all the available information about the target speaker, including his/her spatial location, voice characteristics, and lip movements; a factorized attention-based fusion method is proposed to aggregate the high-level semantic information of multiple modalities at the embedding level.
Abstract: Target speech separation refers to extracting a target speaker's voice from an overlapped audio of simultaneous talkers. Previous work has demonstrated the great potential of the visual modality for target speech separation. This work proposes a general multi-modal framework for target speech separation that utilizes all the available information about the target speaker, including his/her spatial location, voice characteristics, and lip movements. Under this framework, we also investigate fusion methods for multi-modal joint modeling. A factorized attention-based fusion method is proposed to aggregate the high-level semantic information of multiple modalities at the embedding level. This method first factorizes the mixture audio into a set of acoustic subspaces, then leverages the target's information from the other modalities to enhance these subspace acoustic embeddings with a learnable attention scheme. To validate the robustness of the proposed multi-modal separation model in practical scenarios, the system was evaluated under the condition that one of the modalities is temporarily missing, invalid, or corrupted. Experiments are conducted on a large-scale audio-visual dataset collected from YouTube (to be released) and spatialized with simulated room impulse responses (RIRs). Experimental results show that the proposed multi-modal framework significantly outperforms single-modal and bi-modal speech separation approaches while still supporting real-time processing.
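As a rough illustration of the factorized attention idea, the PyTorch sketch below splits a mixture embedding into acoustic subspaces and re-weights them with attention derived from a target-speaker reference embedding. The dimensions, module names, and single-reference setup are assumptions for illustration, not the paper's released code.

```python
# Minimal sketch of factorized attention-based fusion: the mixture embedding
# is factorized into K acoustic subspaces, and a reference embedding from
# another modality (location / voice / lips) yields attention weights over
# those subspaces. All sizes and names are hypothetical.
import torch
import torch.nn as nn

class FactorizedAttentionFusion(nn.Module):
    def __init__(self, dim=256, n_subspaces=8):
        super().__init__()
        assert dim % n_subspaces == 0
        self.k = n_subspaces
        self.sub_dim = dim // n_subspaces
        self.query = nn.Linear(dim, n_subspaces)     # reference -> subspace scores

    def forward(self, mix_emb, ref_emb):
        # mix_emb: (batch, time, dim)  mixture acoustic embedding
        # ref_emb: (batch, dim)        target-speaker reference embedding
        b, t, d = mix_emb.shape
        sub = mix_emb.view(b, t, self.k, self.sub_dim)      # factorize into subspaces
        attn = torch.softmax(self.query(ref_emb), dim=-1)   # (batch, k) attention
        attn = attn.view(b, 1, self.k, 1)
        return (sub * attn).reshape(b, t, d)                 # re-weighted embedding
```

In the paper each available modality contributes its own reference information; the single reference embedding here merely keeps the sketch compact.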

72 citations

Journal ArticleDOI
TL;DR: An automated fall detection system based on smartphone audio features is developed; the best performance is achieved using spectrogram features with the ANN classifier, with sensitivity, specificity, and accuracy all above 98%.
Abstract: An automated fall detection system based on smartphone audio features is developed. The spectrogram, mel-frequency cepstral coefficients (MFCCs), linear predictive coding (LPC), and matching pursuit (MP) features of different fall and no-fall sound events are extracted from experimental data. Based on the extracted audio features, four different machine learning classifiers, k-nearest neighbor (k-NN), support vector machine (SVM), least squares method (LSM), and artificial neural network (ANN), are investigated for distinguishing between fall and no-fall events. For each audio feature, the performance of each classifier in terms of sensitivity, specificity, accuracy, and computational complexity is evaluated. The best performance is achieved using spectrogram features with the ANN classifier, with sensitivity, specificity, and accuracy all above 98%. The classifier also has acceptable computational requirements for training and testing. The system is applicable in home environments where the phone is placed in the vicinity of the user.
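A minimal sketch of the spectrogram-feature + ANN route described above is given below, assuming scipy and scikit-learn and a hypothetical pooled log-spectrum feature; the paper's exact features, network size, and training setup are not reproduced here.

```python
# Illustrative spectrogram + ANN pipeline (not the paper's implementation).
import numpy as np
from scipy.signal import spectrogram
from sklearn.neural_network import MLPClassifier

def spectrogram_feature(x, fs, nfft=512):
    """Average log-spectrum as a fixed-length feature vector (illustrative)."""
    _, _, S = spectrogram(x, fs=fs, nperseg=nfft)
    S_db = 10.0 * np.log10(S + 1e-12)
    return S_db.mean(axis=1)

# Hypothetical usage with clips X (list of 1-D arrays) and labels y (1 = fall):
# feats = np.stack([spectrogram_feature(x, fs=16000) for x in X])
# clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(feats, y)
# predictions = clf.predict(feats)
```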

72 citations

Journal ArticleDOI
Jie Xie, Kai Hu, Mingying Zhu, Jinghu Yu, Qibing Zhu
TL;DR: Experimental results on classifying 43 bird species show that selectively fusing deep learning models with different inputs and architectures can effectively improve bird sound classification performance.
Abstract: Automatic bird sound classification plays an important role in monitoring and further protecting biodiversity. Recent advances in acoustic sensor networks and deep learning techniques provide a novel way for continuously monitoring birds. Previous studies have proposed various deep learning based classification frameworks for recognizing and classifying birds. In this study, we compare different classification models and selectively fuse them to further improve bird sound classification performance. Specifically, we not only use the same deep learning architecture with different inputs but also employ two different deep learning architectures for constructing the fused model. Three types of time-frequency representations (TFRs) of bird sounds are investigated aiming to characterize different acoustic components of birds: Mel-spectrogram, harmonic-component based spectrogram, and percussive-component based spectrogram. In addition to different TFRs, a different deep learning architecture, SubSpectralNet, is employed to classify bird sounds. Experimental results on classifying 43 bird species show that fusing selected deep learning models can effectively increase the classification performance. Our best fused model can achieve a balanced accuracy of 86.31% and a weighted F1-score of 93.31%.
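The three time-frequency representations mentioned above can be computed, for example, with librosa's standard HPSS and mel-spectrogram routines, as in the sketch below; parameter values are illustrative defaults, not those used in the paper.

```python
# Sketch of the three TFRs: mel-spectrogram plus harmonic- and
# percussive-component mel-spectrograms obtained via HPSS.
import numpy as np
import librosa

def bird_tfrs(path, sr=22050, n_mels=128):
    y, sr = librosa.load(path, sr=sr)
    S = np.abs(librosa.stft(y))
    H, P = librosa.decompose.hpss(S)                 # harmonic / percussive magnitude spectrograms
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_h = librosa.feature.melspectrogram(S=H**2, sr=sr, n_mels=n_mels)
    mel_p = librosa.feature.melspectrogram(S=P**2, sr=sr, n_mels=n_mels)
    return mel, mel_h, mel_p
```

Each representation would then feed its own network (or SubSpectralNet), and the model outputs are fused for the final decision.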

72 citations

Book ChapterDOI
12 Mar 2012
TL;DR: An online approach is proposed that adaptively learns a dictionary for the source lacking training data during the separation process and separates the mixture over time, enabling online semi-supervised separation for real-time applications.
Abstract: Non-negative spectrogram factorization algorithms such as probabilistic latent component analysis (PLCA) have been shown to be quite powerful for source separation. When training data for all of the sources are available, it is trivial to learn their dictionaries beforehand and perform supervised source separation in an online fashion. However, in many real-world scenarios (e.g. speech denoising), training data for one of the sources can be hard to obtain beforehand (e.g. speech). In these cases, we need to perform semi-supervised source separation and learn a dictionary for that source during the separation process. Existing semi-supervised separation approaches are generally offline, i.e. they need to access the entire mixture when updating the dictionary. In this paper, we propose an online approach to adaptively learn this dictionary and separate the mixture over time. This enables us to perform online semi-supervised separation for real-time applications. We demonstrate this approach on real-time speech denoising.
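In the spirit of the approach described above, the sketch below shows a frame-wise semi-supervised factorization where one dictionary is pre-trained and the other is meant to be adapted online; for brevity only the activation update and source masking are shown, using a KL-NMF multiplicative update rather than the authors' PLCA formulation, and all names are illustrative.

```python
# Minimal per-frame semi-supervised factorization sketch (assumed setup):
# W_fixed is a dictionary learned beforehand from available training data,
# W_adapt is the dictionary being learned online for the other source.
import numpy as np

def separate_frame(v, W_fixed, W_adapt, n_iter=30, eps=1e-9):
    """v: magnitude spectrum of one frame; returns the two source estimates."""
    W = np.hstack([W_fixed, W_adapt])
    h = np.full(W.shape[1], 1.0 / W.shape[1])        # uniform activation init
    for _ in range(n_iter):
        # KL-NMF multiplicative update for the activations only
        h *= (W.T @ (v / (W @ h + eps))) / (W.sum(axis=0) + eps)
    k = W_fixed.shape[1]
    recon = W @ h + eps
    s_fixed = (W_fixed @ h[:k]) / recon * v          # Wiener-style mask, pre-trained source
    s_adapt = (W_adapt @ h[k:]) / recon * v          # online-learned source
    return s_fixed, s_adapt
```

The online dictionary update for W_adapt, which the paper performs as new frames arrive, is omitted here.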

71 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (79% related)
Convolutional neural network: 74.7K papers, 2M citations (78% related)
Feature extraction: 111.8K papers, 2.1M citations (77% related)
Wavelet: 78K papers, 1.3M citations (76% related)
Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance Metrics
No. of papers in the topic in previous years:

Year   Papers
2024   1
2023   627
2022   1,396
2021   488
2020   595
2019   593