Topic

Speaker recognition

About: Speaker recognition is a research topic. Over the lifetime, 14990 publications have been published within this topic receiving 310061 citations.


Papers
Journal ArticleDOI
TL;DR: Objective and subjective evaluations indicate that the proposed synthesis-by-analysis scheme provides natural-looking head gestures for the speaker with any input test speech, as well as in "prosody transplant" and "gesture transplant" scenarios.
Abstract: We propose a new two-stage framework for joint analysis of head gesture and speech prosody patterns of a speaker towards automatic realistic synthesis of head gestures from speech prosody. In the first stage analysis, we perform hidden Markov model (HMM) based unsupervised temporal segmentation of head gesture and speech prosody features separately to determine elementary head gesture and speech prosody patterns, respectively, for a particular speaker. In the second stage, joint analysis of correlations between these elementary head gesture and prosody patterns is performed using Multi-Stream HMMs to determine an audio-visual mapping model. The resulting audio-visual mapping model is then employed to synthesize natural head gestures from arbitrary input test speech given a head model for the speaker. In the synthesis stage, the audio-visual mapping model is used to predict a sequence of gesture patterns from the prosody pattern sequence computed for the input test speech. The Euler angles associated with each gesture pattern are then applied to animate the speaker head model. Objective and subjective evaluations indicate that the proposed synthesis by analysis scheme provides natural looking head gestures for the speaker with any input test speech, as well as in "prosody transplant" and "gesture transplant" scenarios.

73 citations
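The final step of the synthesis stage above — applying the Euler angles of each predicted gesture pattern to the head model — can be sketched in a few lines. This is not the authors' implementation; it is a minimal numpy illustration that assumes a Z-Y-X (yaw-pitch-roll) rotation convention and a head model given as a vertex array, neither of which is specified in the abstract.

```python
import numpy as np

def euler_to_rotation(yaw, pitch, roll):
    """Compose a 3x3 rotation matrix from Euler angles (radians), Z-Y-X order."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return Rz @ Ry @ Rx

def animate_head(vertices, angle_sequence):
    """Apply each predicted (yaw, pitch, roll) triple to the head-model vertices,
    yielding one posed copy of the model per gesture pattern in the sequence."""
    return [vertices @ euler_to_rotation(*angles).T for angles in angle_sequence]
```

In a full pipeline, `angle_sequence` would come from the Multi-Stream HMM's gesture-pattern predictions; here it is just a list of angle triples.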

Proceedings ArticleDOI
01 Mar 2007
TL;DR: In this article, a Mel-cepstrum-based analysis known from speaker and speech recognition is used to perform a detection of embedded hidden messages in VoIP applications, which can detect information hiding in the field of hidden communication as well as for DRM applications.
Abstract: Steganography and steganalysis in VoIP applications are important research topics as speech data is an appropriate cover to hide messages or comprehensive documents. In our paper we introduce a Mel-cepstrum based analysis known from speaker and speech recognition to perform a detection of embedded hidden messages. In particular we combine known and established audio steganalysis features with the features derived from Mel-cepstrum based analysis for an investigation on the improvement of the detection performance. Our main focus considers the application environment of VoIP-steganography scenarios. The evaluation of the enhanced feature space is performed for classical steganographic as well as for watermarking algorithms. With this strategy we show how general forensic approaches can detect information hiding techniques in the field of hidden communication as well as for DRM applications. For the latter, the detection of the presence of a potential watermark in a specific feature space can lead to new attacks or to a better design of the watermarking pattern. Following that, the usefulness of Mel-cepstrum domain based features for detection is discussed in detail.

73 citations
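The Mel-cepstrum analysis borrowed from speaker recognition can be sketched from first principles: warp the power spectrum onto a mel-spaced filterbank, take logs, then decorrelate with a DCT. The sketch below is a generic textbook MFCC computation in numpy, not the paper's steganalysis feature set; the sample rate, filter count, and coefficient count are arbitrary illustrative choices.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, covering 0..sr/2."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fb

def mel_cepstrum(frame, sr=8000, n_filters=20, n_ceps=12):
    """Mel-cepstral coefficients of one speech frame: |FFT|^2 -> mel -> log -> DCT-II."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2
    log_energies = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-10)
    k = np.arange(n_filters)
    dct_basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1) / (2.0 * n_filters)))
    return dct_basis @ log_energies
```

A steganalyzer in this style would compute such coefficients over many frames and feed their statistics to a classifier alongside the established audio steganalysis features.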

Journal ArticleDOI
TL;DR: The proposed speech enhancement technique for signals corrupted by nonstationary acoustic noises applies the empirical mode decomposition to the noisy speech signal and obtains a set of intrinsic mode functions (IMF) and adopts the Hurst exponent in the selection of IMFs to reconstruct the speech.
Abstract: This paper presents a speech enhancement technique for signals corrupted by nonstationary acoustic noises. The proposed approach applies the empirical mode decomposition (EMD) to the noisy speech signal and obtains a set of intrinsic mode functions (IMF). The main contribution of the proposed procedure is the adoption of the Hurst exponent in the selection of IMFs to reconstruct the speech. This EMD and Hurst-based (EMDH) approach is evaluated in speech enhancement experiments considering environmental acoustic noises with different indices of nonstationarity. The results show that the EMDH improves the segmental signal-to-noise ratio and an overall quality composite measure, encompassing the perceptual evaluation of speech quality (PESQ). Moreover, the short-time objective intelligibility (STOI) measure reinforces the superior performance of EMDH. Finally, the EMDH is also examined in a speaker identification task in noisy conditions. The proposed technique leads to the highest speaker identification rates when compared to the baseline speech enhancement algorithms and also to a multicondition training procedure.

73 citations
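The core of the EMDH idea is scoring each intrinsic mode function with a Hurst exponent and keeping only the modes that look speech-like rather than noise-like. A full EMD is beyond a short sketch, but the Hurst estimate itself can be illustrated with classical rescaled-range (R/S) analysis; this is a generic estimator, not the paper's exact procedure, and the selection threshold mentioned in the usage note is an assumption.

```python
import numpy as np

def hurst_rs(x, min_chunk=8):
    """Estimate the Hurst exponent of a 1-D signal via rescaled-range analysis.

    For each chunk size, average R/S over non-overlapping chunks, then fit
    log(R/S) against log(size); the slope is the Hurst exponent estimate.
    White noise gives roughly H ~ 0.5; persistent signals give H closer to 1.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes, rs_means = [], []
    size = min_chunk
    while size <= n // 2:
        rs = []
        for start in range(0, n - size + 1, size):
            chunk = x[start:start + size]
            z = np.cumsum(chunk - chunk.mean())   # cumulative deviation
            r = z.max() - z.min()                 # range
            s = chunk.std()                       # scale
            if s > 0:
                rs.append(r / s)
        if rs:
            sizes.append(size)
            rs_means.append(np.mean(rs))
        size *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return slope
```

An EMDH-style reconstruction would then be something like `sum(imf for imf in imfs if hurst_rs(imf) > threshold)`, with the threshold tuned on development data (the value is not given in the abstract).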

Book ChapterDOI
01 Feb 2007
TL;DR: This article briefly summarizes approaches to using higher-level features for text-independent speaker verification over the last decade in terms of their type, temporal span, and reliance on automatic speech recognition for both feature extraction and feature conditioning.
Abstract: Higher-level features based on linguistic or long-range information have attracted significant attention in automatic speaker recognition. This article briefly summarizes approaches to using higher-level features for text-independent speaker verification over the last decade. To clarify how each approach uses higher-level information, features are described in terms of their type, temporal span, and reliance on automatic speech recognition for both feature extraction and feature conditioning. A subsequent analysis of higher-level features in a state-of-the-art system illustrates that (1) a higher-level cepstral system outperforms standard systems, (2) a prosodic system shows excellent performance individually and in combination, (3) other higher-level systems provide further gains, and (4) higher-level systems provide increasing relative gains as training data increases. Implications for the general field of speaker classification are discussed.

73 citations

Proceedings ArticleDOI
01 Sep 2016
TL;DR: This paper uses simple spectrograms as input to a CNN and studies the optimal design of those networks for speaker identification and clustering, and demonstrates the approach on the well-known TIMIT dataset, achieving results comparable with the state of the art, without the need for handcrafted features.
Abstract: Deep learning, especially in the form of convolutional neural networks (CNNs), has triggered substantial improvements in computer vision and related fields in recent years. This progress is attributed to the shift from designing features and subsequent individual sub-systems towards learning features and recognition systems end to end from nearly unprocessed data. For speaker clustering, however, it is still common to use handcrafted processing chains such as MFCC features and GMM-based models. In this paper, we use simple spectrograms as input to a CNN and study the optimal design of those networks for speaker identification and clustering. Furthermore, we elaborate on the question how to transfer a network, trained for speaker identification, to speaker clustering. We demonstrate our approach on the well-known TIMIT dataset, achieving results comparable with the state of the art, without the need for handcrafted features.

73 citations
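The "simple spectrogram" input this paper feeds to its CNN is just a log-magnitude short-time Fourier transform, i.e. a 2-D time-frequency image. A minimal numpy sketch of that preprocessing step follows; the frame size, hop, and window choice here are common defaults, not the values used in the paper.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT spectrogram: the 2-D 'image' a CNN would consume.

    Returns an array of shape (n_fft // 2 + 1, n_frames):
    frequency bins along rows, time frames along columns.
    """
    window = np.hanning(n_fft)
    frames = np.array([signal[i:i + n_fft] * window
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitudes + 1e-10).T
```

From here, speaker-ID training would stack such spectrograms into batches and pass them to a 2-D CNN exactly as one would pass images; the abstract leaves the network architecture itself to the paper.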


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
82% related
Feature extraction
111.8K papers, 2.1M citations
81% related
Signal processing
73.4K papers, 983.5K citations
81% related
Decoding methods
65.7K papers, 900K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
Year	Papers
2023	165
2022	468
2021	283
2020	475
2019	484
2018	420