Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published on this topic, receiving 310,061 citations.


Papers
Proceedings ArticleDOI
31 Oct 1994
TL;DR: A continuous optical automatic speech recognizer that uses optical information from the oral-cavity shadow of a speaker is described; it achieves 25.3 percent recognition on sentences having a perplexity of 150 without using any syntactic, semantic, acoustic, or contextual guides.
Abstract: We describe a continuous optical automatic speech recognizer (OASR) that uses optical information from the oral-cavity shadow of a speaker. The system achieves 25.3 percent recognition on sentences having a perplexity of 150 without using any syntactic, semantic, acoustic, or contextual guides. We introduce 13, mostly dynamic, oral-cavity features used for optical recognition, present phones that appear optically similar (visemes) for our speaker, and present the recognition results for our hidden Markov models (HMMs) using visemes, trisemes, and generalized trisemes. We conclude that future research is warranted for optical recognition, especially when combined with other input modalities.

80 citations
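The HMM-over-visemes approach above can be illustrated with a small sketch: one Gaussian HMM per viseme class, trained on sequences of the 13 oral-cavity features and used for maximum-likelihood classification. Everything below (the hmmlearn library, the viseme labels, the synthetic data, and the model sizes) is an illustrative assumption, not the paper's implementation.

```python
# Minimal sketch: per-viseme Gaussian HMMs over 13-dim oral-cavity
# features, classified by maximum log-likelihood. Data is synthetic.
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

rng = np.random.default_rng(0)
means = {"p": 0.0, "f": 2.0, "t": 4.0}  # hypothetical viseme classes

models = {}
for viseme, mu in means.items():
    # Five synthetic training sequences of 13-dim oral-cavity features.
    seqs = [rng.normal(loc=mu, size=(20, 13)) for _ in range(5)]
    X = np.concatenate(seqs)
    lengths = [len(s) for s in seqs]
    m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=25)
    m.fit(X, lengths)
    models[viseme] = m

# Classify a test sequence by the highest-scoring viseme HMM.
test = rng.normal(loc=2.0, size=(20, 13))
print("predicted viseme:", max(models, key=lambda v: models[v].score(test)))
```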

Journal ArticleDOI
TL;DR: Back-end generative models for more generalized countermeasures are explored; the synthesis-channel subspace is modeled to perform speaker verification and antispoofing jointly in the i-vector space, a well-established technique for speaker modeling.
Abstract: Any biometric recognizer is vulnerable to spoofing attacks, and hence voice biometrics, also called automatic speaker verification (ASV), is no exception; replay, synthesis, and conversion attacks all provoke false acceptances unless countermeasures are used. We focus on voice conversion (VC) attacks, considered among the most challenging for modern recognition systems. To detect spoofing, most existing countermeasures assume explicit or implicit knowledge of a particular VC system and focus on designing discriminative features. In this paper, we explore back-end generative models for more generalized countermeasures. In particular, we model the synthesis-channel subspace to perform speaker verification and antispoofing jointly in the i-vector space, which is a well-established technique for speaker modeling. It enables us to integrate the speaker verification and antispoofing tasks into one system without any fusion techniques. To validate the proposed approach, we study vocoder-matched and vocoder-mismatched ASV and VC spoofing detection on the NIST 2006 speaker recognition evaluation data set. Promising results are obtained for standalone countermeasures as well as for their combination with ASV systems using score fusion and the joint approach.

79 citations
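For context, a common baseline for i-vector speaker verification is cosine scoring of enrollment and test i-vectors; the sketch below shows that baseline only, not the paper's joint generative model of the synthesis-channel subspace. The 400-dimensional vectors, noise level, and threshold are placeholder assumptions.

```python
# Minimal sketch: cosine scoring of two i-vectors with a fixed
# decision threshold. Vectors here are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
enroll_ivec = rng.normal(size=400)                      # enrollment i-vector
test_ivec = enroll_ivec + 0.3 * rng.normal(size=400)    # same-speaker trial

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two i-vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

score = cosine_score(enroll_ivec, test_ivec)
threshold = 0.5                                         # assumed operating point
print(f"score={score:.3f} ->", "accept" if score > threshold else "reject")
```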

PatentDOI
TL;DR: Speaker verification is performed by computing principal components of a fixed text statement comprising a speaker identification code and a two-word phrase, and principal spectral components of a random word phrase.
Abstract: Speaker verification is performed by computing principal components of a fixed text statement comprising a speaker identification code and a two-word phrase, and principal spectral components of a random word phrase. A multi-phrase strategy is utilized in access control to allow successive verification attempts in a single session, if the speaker fails initial attempts. Based upon a verification attempt, the system produces a verification score which is compared with a threshold value. On successive attempts, the criterion for acceptance is changed, and one of a number of criteria must be satisfied for acceptance in subsequent attempts. A speaker normalization function can also be invoked to modify the verification score of persons enrolled with the system who inherently produce scores which result in denial of access. Accuracy of the verification system is enhanced by updating the reference template which then more accurately symbolizes the person's speech signature.

79 citations
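The multi-attempt access-control strategy described in the patent can be sketched as simple session logic: successive verification attempts are allowed within a session, and the acceptance criterion changes on retries. The thresholds and scores below are illustrative assumptions, not figures from the patent.

```python
# Minimal sketch: multi-attempt verification with a changed acceptance
# criterion on retries. Thresholds and scores are assumed values.
def verify_session(scores, first_threshold=0.80, retry_threshold=0.70,
                   max_attempts=3):
    """Accept if any attempt satisfies its criterion; retries use a
    different (here: relaxed) threshold than the first attempt."""
    for attempt, score in enumerate(scores[:max_attempts], start=1):
        threshold = first_threshold if attempt == 1 else retry_threshold
        if score >= threshold:
            return True, attempt
    return False, min(len(scores), max_attempts)

# First attempt fails the strict criterion; the retry is accepted.
accepted, used = verify_session([0.76, 0.74])
print("accepted:", accepted, "after attempt", used)
```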

Journal ArticleDOI
TL;DR: In this paper, a modular neural-support vector machine (SVM) classifier is proposed, and its performance in emotion recognition is compared to Gaussian mixture model, multi-layer perceptron neural network, and C5.0-based classifiers.
Abstract: The speech signal carries linguistic information and also paralinguistic information such as emotion. Modern automatic speech recognition systems have achieved high performance on neutral-style speech, but they cannot maintain their high recognition rates on spontaneous speech, so emotion recognition is an important step toward emotional speech recognition. The accuracy of an emotion recognition system depends on several factors, such as the type and number of emotional states, the selected features, and the type of classifier. In this paper, a modular neural-support vector machine (SVM) classifier is proposed, and its performance in emotion recognition is compared to Gaussian mixture model, multi-layer perceptron neural network, and C5.0-based classifiers. The most efficient features are also selected using the analysis of variations method. The proposed modular scheme is arrived at through a comparative study of different features and characteristics of individual emotional states, with the aim of improving recognition performance. Empirical results show that even after discarding 22% of the features, the average emotion recognition accuracy improves by 2.2%. The proposed modular neural-SVM classifier also improves recognition accuracy by at least 8% compared to the simulated monolithic classifiers.

79 citations
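A rough sketch of the feature-selection-plus-SVM idea follows, using scikit-learn's ANOVA F-test selector as an assumed stand-in for the paper's analysis of variations step, and keeping 78% of the features to mirror the 22% discard rate. The data, class count, and dimensionality are synthetic placeholders; the paper's modular neural-SVM architecture itself is not reproduced.

```python
# Minimal sketch: ANOVA-style feature selection followed by an SVM
# classifier on synthetic "emotion" data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 50))      # 200 utterances, 50 features each
y = rng.integers(0, 4, size=200)    # 4 hypothetical emotion classes
X[y == 1, :5] += 1.5                # make a few features informative

clf = make_pipeline(
    SelectKBest(f_classif, k=39),   # keep 78% of features (discard 22%)
    SVC(kernel="rbf"),
)
clf.fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```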

Journal ArticleDOI
TL;DR: A text-independent automatic accent classification system using phone-based models is proposed, and an experimental study is performed comparing the spectral trajectory model framework to a traditional hidden Markov model recognition framework on an accent-sensitive word corpus.
Abstract: It is suggested that algorithms capable of estimating and characterizing accent would provide valuable information for developing more effective speech systems such as speech recognition, speaker identification, audio stream tagging in spoken document retrieval, channel monitoring, or voice conversion. Accent knowledge could be used to select alternative pronunciations in a lexicon, to engage adaptation for acoustic modeling, or to bias a language model in large vocabulary speech recognition. In this paper, we propose a text-independent automatic accent classification system using phone-based models. Algorithm formulation begins with a series of experiments focused on capturing spectral evolution information as potential accent-sensitive cues. Alternative subspace representations using principal component analysis and linear discriminant analysis with projected trajectories are considered. Finally, an experimental study compares the spectral trajectory model framework to a traditional hidden Markov model recognition framework using an accent-sensitive word corpus. System evaluation is performed on a corpus representing five English speaker groups: native American English, and English spoken with Mandarin Chinese, French, Thai, and Turkish accents, for both male and female speakers.

79 citations
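The subspace step of the accent classifier can be sketched as a PCA projection followed by linear discriminant analysis, echoing the abstract's use of both techniques on spectral trajectories. The five accent groups match the corpus description; the trajectory features, dimensions, and data below are synthetic assumptions.

```python
# Minimal sketch: PCA subspace projection of spectral-trajectory
# features, then LDA for accent discrimination. Data is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
accents = ["AmericanEnglish", "Mandarin", "French", "Thai", "Turkish"]
# 100 flattened per-phone trajectory vectors per accent group.
X = np.vstack([rng.normal(loc=i, size=(100, 60)) for i in range(5)])
y = np.repeat(np.arange(5), 100)

Z = PCA(n_components=20).fit_transform(X)      # subspace representation
lda = LinearDiscriminantAnalysis().fit(Z, y)   # discriminative projection
print("training accuracy:", lda.score(Z, y))
print("predicted accent:", accents[int(lda.predict(Z[:1])[0])])
```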


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (82% related)
Feature extraction: 111.8K papers, 2.1M citations (81% related)
Signal processing: 73.4K papers, 983.5K citations (81% related)
Decoding methods: 65.7K papers, 900K citations (79% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420