Topic

Speaker recognition

About: Speaker recognition is a research topic. Over the lifetime, 14990 publications have been published within this topic receiving 310061 citations.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Speaker adaptation of context dependent deep neural networks

[...]

Hank Liao¹•Institutions (1)

Google¹

26 May 2013

TL;DR: This work explores how deep neural networks may be adapted to speakers by re-training the input layer, the output layer or the entire network, and looks at how L2 regularization using weight decay to the speaker independent model improves generalization.

...read moreread less

Abstract: There has been little work on examining how deep neural networks may be adapted to speakers for improved speech recognition accuracy. Past work has examined using a discriminatively trained affine transformation of the input features applied at a frame level or the re-training of the entire shallow network for a specific speaker. This work explores how deep neural networks may be adapted to speakers by re-training the input layer, the output layer or the entire network. We look at how L2 regularization using weight decay to the speaker independent model improves generalization. Other training factors are examined including the role momentum plays and stochastic mini-batch versus batch training. While improvements are significant for smaller networks, the largest show little gain from adaptation on a large vocabulary mobile speech recognition task.

...read moreread less

271 citations

Proceedings Article•DOI•

The Speakers in the Wild (SITW) Speaker Recognition Database.

[...]

Mitchell McLaren¹, Luciana Ferrer², Diego Castan³, Aaron Lawson¹•Institutions (3)

SRI International¹, National Scientific and Technical Research Council², University of Zaragoza³

08 Sep 2016

TL;DR: The Speakers in the Wild (SITW) speaker recognition database contains hand-annotated speech samples from open-source media for the purpose of benchmarking text-independent speaker recognition technology on single and multi-speaker audio acquired across unconstrained or “wild” conditions.

...read moreread less

Abstract: The Speakers in the Wild (SITW) speaker recognition database contains hand-annotated speech samples from open-source media for the purpose of benchmarking text-independent speaker recognition technology on single and multi-speaker audio acquired across unconstrained or “wild” conditions. The database consists of recordings of 299 speakers, with an average of eight different sessions per person. Unlike existing databases for speaker recognition, this data was not collected under controlled conditions and thus contains real noise, reverberation, intraspeaker variability and compression artifacts. These factors are often convolved in the real world, as the SITW data shows, and they make SITW a challenging database for singleand multispeaker recognition

...read moreread less

270 citations

Journal Article•DOI•

An Overview of Speaker Identification: Accuracy and Robustness Issues

[...]

Roberto Togneri, Daniel Pullella¹•Institutions (1)

University of Western Australia¹

09 Jun 2011-IEEE Circuits and Systems Magazine

TL;DR: The main paradigms for speaker identification, and recent work on missing data methods to increase robustness are presented, and combined approaches involving bottom-up estimation and top-down processing are reviewed.

...read moreread less

Abstract: This paper presents the main paradigms for speaker identification, and recent work on missing data methods to increase robustness. The feature extraction, speaker modeling and system classification are discussed. Evaluations of speaker identification performance subject to environmental noise are presented. While performance is impressive in clean speech conditions, there is rapid degradation with mismatched additive noise. Missing data methods can compensate against arbitrary disturbances and remove environmental mismatches. An overview of missing data methods is provided and applications to robust speaker identification summarized. Finally combined approaches involving bottom-up estimation and top-down processing are reviewed, and their significance discussed.

...read moreread less

269 citations

Journal Article•DOI•

Spectral Mapping Using Artificial Neural Networks for Voice Conversion

[...]

Srinivas Desai, Alan W. Black¹, B. Yegnanarayana, Kishore Prahallad•Institutions (1)

Carnegie Mellon University¹

01 Jul 2010-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A voice conversion approach using an ANN model to capture speaker-specific characteristics of a target speaker is proposed and it is demonstrated that such a voice Conversion approach can perform monolingual as well as cross-lingual voice conversion of an arbitrary source speaker.

...read moreread less

Abstract: In this paper, we use artificial neural networks (ANNs) for voice conversion and exploit the mapping abilities of an ANN model to perform mapping of spectral features of a source speaker to that of a target speaker. A comparative study of voice conversion using an ANN model and the state-of-the-art Gaussian mixture model (GMM) is conducted. The results of voice conversion, evaluated using subjective and objective measures, confirm that an ANN-based VC system performs as good as that of a GMM-based VC system, and the quality of the transformed speech is intelligible and possesses the characteristics of a target speaker. In this paper, we also address the issue of dependency of voice conversion techniques on parallel data between the source and the target speakers. While there have been efforts to use nonparallel data and speaker adaptation techniques, it is important to investigate techniques which capture speaker-specific characteristics of a target speaker, and avoid any need for source speaker's data either for training or for adaptation. In this paper, we propose a voice conversion approach using an ANN model to capture speaker-specific characteristics of a target speaker and demonstrate that such a voice conversion approach can perform monolingual as well as cross-lingual voice conversion of an arbitrary source speaker.

...read moreread less

269 citations

Proceedings Article•DOI•

Speaker adaptation through vector quantization

[...]

Kiyohiro Shikano¹, Kai-Fu Lee, Raj Reddy•Institutions (1)

Carnegie Mellon University¹

01 Apr 1986

TL;DR: Vector quantization (VQ) is a technique that reduces the computation amount and memory size drastically and is proposed in order to improve speaker-independent recognition.

...read moreread less

Abstract: Vector quantization (VQ) is a technique that reduces the computation amount and memory size drastically. In this paper, speaker adaptation algorithms through VQ are proposed in order to improve speaker-independent recognition. The speaker adaptation algorithms use VQ codebooks of a reference speaker and an input speaker. Speaker adaptation is performed by substituting vectors in the codebook of a reference speaker for vectors of the input speaker's codebook, or vice versa. To confirm the effectiveness of these algorithms, word recognition experiments are carried out using the IBM office correspondence task uttered by 11 speakers. The total number of words is 1174 for each speaker, and the number of different words is 422. The average word recognition rate using different speaker's reference through speaker adaptation is 80.9%, and the rate within the second choice is 92.0%.

...read moreread less

269 citations

Collapse

Network Information

Performance

Metrics

15,632

Papers

337,766

Citations

No. of papers in the topic in previous years
Year	Papers
2023	165
2022	468
2021	283
2020	475
2019	484
2018	420

Speaker recognition

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics