Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published on this topic, receiving 310,061 citations.

Papers

Proceedings Article

Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR

Masatsune Tamura¹, Takashi Masuko, Keiichi Tokuda, Takao Kobayashi

Tokyo Institute of Technology¹

07 May 2001

TL;DR: It is demonstrated that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features, and synthetic speech generated from adapted models using only four sentences is very close to that from speaker dependent models trained using 450 sentences.

Abstract: Describes a technique for synthesizing speech with arbitrary speaker characteristics using speaker independent speech units, which we call "average voice" units. The technique is based on an HMM-based text-to-speech (TTS) system and maximum likelihood linear regression (MLLR) adaptation algorithm. In the HMM-based TTS system, speech synthesis units are modeled by multi-space probability distribution (MSD) HMMs which can model spectrum and pitch simultaneously in a unified framework. We derive an extension of the MLLR algorithm to apply it to MSD-HMMs. We demonstrate that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features. Synthetic speech generated from adapted models using only four sentences is very close to that from speaker dependent models trained using 450 sentences.

158 citations

Patent

Voice personalization of speech synthesizer

Jean-Claude Junqua¹, Florent Perronnin¹, Roland Kuhn¹, Patrick Nguyen¹

Panasonic¹

25 Feb 2002 - Journal of the Acoustical Society of America

TL;DR: In this paper, a speaker provides a quantity of enrollment data (18), which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters (12) to more closely resemble those of the new speaker.

Abstract: The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data (18), which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters (12) to more closely resemble those of the new speaker (36). More specifically, the synthesis parameters (12) may be decomposed into speaker dependent parameters (30), such as context-independent parameters, and speaker independent parameters (32), such as context dependent parameters. The speaker dependent parameters (30) are adapted using enrollment data (18) from the new speaker. After adaptation, the speaker dependent parameters (30) are combined with the speaker independent parameters (32) to provide a set of personalized synthesis parameters (42).

157 citations

Journal Article

Fast adaptation of deep neural network based on discriminant codes for speech recognition

Shaofei Xue¹, Ossama Abdel-Hamid², Hui Jiang², Li-Rong Dai¹, Qingfeng Liu¹

University of Science and Technology of China¹, York University²

01 Dec 2014 - IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A general adaptation scheme for DNNs is proposed, based on discriminant condition codes that are fed directly to various layers of a pre-trained DNN through a new set of connection weights. The resulting methods are quite effective at adapting large DNN models using only a small amount of adaptation data.

Abstract: Fast adaptation of deep neural networks (DNN) is an important research topic in deep learning. In this paper, we have proposed a general adaptation scheme for DNN based on discriminant condition codes, which are directly fed to various layers of a pre-trained DNN through a new set of connection weights. Moreover, we present several training methods to learn connection weights from training data as well as the corresponding adaptation methods to learn new condition code from adaptation data for each new test condition. In this work, the fast adaptation scheme is applied to supervised speaker adaptation in speech recognition based on either frame-level cross-entropy or sequence-level maximum mutual information training criterion. We have proposed three different ways to apply this adaptation scheme based on the so-called speaker codes: i) Nonlinear feature normalization in feature space; ii) Direct model adaptation of DNN based on speaker codes; iii) Joint speaker adaptive training with speaker codes. We have evaluated the proposed adaptation methods in two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition in the Switchboard task. Experimental results have shown that all three methods are quite effective at adapting large DNN models using only a small amount of adaptation data. For example, the Switchboard results have shown that the proposed speaker-code-based adaptation methods may achieve up to 8-10% relative error reduction using only a few dozen adaptation utterances per speaker. Finally, we have achieved very good performance in Switchboard (12.1% in WER) after speaker adaptation using sequence training criterion, which is very close to the best performance reported in this task ("Deep convolutional neural networks for LVCSR," T. N. Sainath et al., Proc. IEEE Acoust., Speech, Signal Process., 2013).

157 citations

Proceedings Article

On the automatic segmentation of speech signals

Torbjørn Svendsen¹, F. Soong

Bell Labs¹

06 Apr 1987

TL;DR: Three different approaches for automatically segmenting speech into phonetic units are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach.

Abstract: For large vocabulary and continuous speech recognition, the sub-word-unit-based approach is a viable alternative to the whole-word-unit-based approach. For preparing a large inventory of subword units, an automatic segmentation is preferable to manual segmentation as it substantially reduces the work associated with the generation of templates and gives more consistent results. In this paper we discuss some methods for automatically segmenting speech into phonetic units. Three different approaches are described, one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach. An evaluation of the performance of the automatic segmentation methods is given.

156 citations

Channel compensation for SVM speaker recognition.

Alex Solomonoff¹, Carl Quillen, William M. Campbell

Massachusetts Institute of Technology¹

01 Jan 2004

TL;DR: This paper explores techniques that are specific to the SVM framework in order to derive fully non-linear channel compensations, resulting in a system that is less sensitive to specific kinds of labeled channel variations observed in training.

Abstract: One of the major remaining challenges to improving accuracy in state-of-the-art speaker recognition algorithms is reducing the impact of channel and handset variations on system performance. For Gaussian Mixture Model based speaker recognition systems, a variety of channel-adaptation techniques are known and available for adapting models between different channel conditions, but for the much more recent Support Vector Machine (SVM) based approaches to this problem, much less is known about the best way to handle this issue. In this paper we explore techniques that are specific to the SVM framework in order to derive fully non-linear channel compensations. The result is a system that is less sensitive to specific kinds of labeled channel variations observed in training.

156 citations

Metrics

Papers: 15,632

Citations: 337,766

No. of papers in the topic in previous years
Year	Papers
2023	165
2022	468
2021	283
2020	475
2019	484
2018	420
