Author

J.M. Naik

Bio: J.M. Naik is an academic researcher who has contributed to research on speaker recognition and speaker diarisation. The author has an h-index of 1 and has co-authored 1 publication receiving 144 citations.

Papers
Journal Article
TL;DR: The task of speaker verification, a subset of the general problem of speaker recognition, is defined and the feature selection and pattern matching steps of the recognition procedure are examined.
Abstract: The task of speaker verification, a subset of the general problem of speaker recognition, is defined. The feature selection and pattern matching steps of the recognition procedure are examined. Speaker verification system design and performance are discussed, and databases for evaluating them are briefly considered. An example of a speaker verification system is described. An overview of industry research in this area is given.

146 citations


Cited by
Journal Article
01 Sep 1997
TL;DR: A tutorial on the design and development of automatic speaker-recognition systems is presented, and a new automatic speaker-recognition system is given that performs with 98.9% correct identification.
Abstract: A tutorial on the design and development of automatic speaker-recognition systems is presented. Automatic speaker recognition is the use of a machine to recognize a person from a spoken phrase. These systems can operate in two modes: to identify a particular person or to verify a person's claimed identity. Speech processing and the basic components of automatic speaker-recognition systems are shown and design tradeoffs are discussed. Then, a new automatic speaker-recognition system is given. This recognizer performs with 98.9% correct identification. Last, the performances of various systems are compared.

1,686 citations

Journal Article
TL;DR: The literature on speaker recognition by machines and humans is reviewed, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems, and the review concludes with a comparative study of human versus machine speaker recognition.
Abstract: Identifying a person by his or her voice is an important human trait most take for granted in natural human-to-human interaction/communication. Speaking to someone over the telephone usually begins by identifying who is speaking and, at least in cases of familiar speakers, a subjective verification by the listener that the identity is correct and the conversation can proceed. Automatic speaker-recognition systems have emerged as an important means of verifying identity in many e-commerce applications as well as in general business interactions, forensics, and law enforcement. Human experts trained in forensic speaker recognition can perform this task even better by examining a set of acoustic, prosodic, and linguistic characteristics of speech in a general approach referred to as structured listening. Techniques in forensic speaker recognition have been developed for many years by forensic speech scientists and linguists to help reduce any potential bias or preconceived understanding as to the validity of an unknown audio sample and a reference template from a potential suspect. Experienced researchers in signal processing and machine learning continue to develop automatic algorithms to effectively perform speaker recognition, with ever-improving performance, to the point where automatic systems start to perform on par with human listeners. In this article, we review the literature on speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems. We discuss different aspects of automatic systems, including voice-activity detection (VAD), features, speaker models, standard evaluation data sets, and performance metrics. Human speaker recognition is discussed in two parts: the first part involves forensic speaker-recognition methods, and the second illustrates how a naïve listener performs this task from a neuroscience perspective. We conclude this review with a comparative study of human versus machine speaker recognition and attempt to point out strengths and weaknesses of each.

554 citations

Proceedings Article
14 May 2001
TL;DR: The technique is sufficiently robust to enable the user to reliably regenerate the key by uttering her password again, and an empirical evaluation of this technique is described using 250 utterances recorded from 50 users.
Abstract: We propose a technique to reliably generate a cryptographic key from a user's voice while speaking a password. The key resists cryptanalysis even against an attacker who captures all system information related to generating or verifying the cryptographic key. Moreover, the technique is sufficiently robust to enable the user to reliably regenerate the key by uttering her password again. We describe an empirical evaluation of this technique using 250 utterances recorded from 50 users.

374 citations

Proceedings Article
27 Apr 1993
TL;DR: Methods that create models to specify both speaker and phonetic information accurately, using only a small amount of training data for each speaker, are investigated, along with supplementing these methods by adding a phoneme-independent speaker model to make up for the lack of speaker information.
Abstract: Methods that create models to specify both speaker and phonetic information accurately by using only a small amount of training data for each speaker are investigated. For a text-dependent speaker recognition method, in which arbitrary key texts are prompted from the recognizer, speaker-specific phoneme models are necessary to identify the key text and recognize the speaker. Two methods of making speaker-specific phoneme models are discussed: phoneme-adaptation of a phoneme-independent speaker model and speaker-adaptation of universal phoneme models. The authors also investigate supplementing these methods by adding a phoneme-independent speaker model to make up for the lack of speaker information. This combination achieves a rejection rate as high as 98.5% for speech that differs from the key text and a speaker verification rate of 100.0%.

189 citations

Patent
TL;DR: A facility is provided for allowing a caller to place a telephone call by uttering a label identifying a desired called destination and to charge the telephone call to a particular billing account by merely uttering a label identifying that account.
Abstract: A facility is provided for allowing a caller to place a telephone call by merely uttering a label identifying a desired called destination and to charge the telephone call to a particular billing account by merely uttering a label identifying that account. Alternatively, the caller may place the call by dialing or uttering the telephone number of the called destination or by entering a speed dial code associated with that telephone number. The facility includes a speaker verification system which employs cohort normalized scoring. Cohort normalized scoring provides a dynamic threshold for the verification process, making the process more robust to variation in training and verification utterances. Such variation may be caused by, e.g., changes in communication channel characteristics or speaker loudness level.

126 citations
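
Note: The patent abstract above mentions cohort normalized scoring without giving a formula. The following is a minimal Python sketch of one common formulation, assuming the system already produces a log-likelihood for the claimed speaker's model and for each model in a cohort of similar-sounding speakers. The function names, the use of the cohort mean, and the threshold value are illustrative assumptions, not details taken from the patent.

```python
def cohort_normalized_score(claimed_ll, cohort_lls):
    """Claimed-speaker log-likelihood minus the mean cohort log-likelihood.

    Assumption: the cohort statistic is the arithmetic mean; other
    formulations use the maximum or a log-sum over cohort scores.
    """
    if not cohort_lls:
        raise ValueError("cohort must contain at least one score")
    return claimed_ll - sum(cohort_lls) / len(cohort_lls)


def verify(claimed_ll, cohort_lls, threshold=0.0):
    """Accept the identity claim when the normalized score exceeds the threshold.

    The threshold of 0.0 is a placeholder; a real system would tune it
    on held-out data.
    """
    return cohort_normalized_score(claimed_ll, cohort_lls) > threshold


# Hypothetical scores for one utterance.
clean = cohort_normalized_score(-42.0, [-50.0, -48.0, -52.0])

# A channel change that lowers every log-likelihood by the same amount
# leaves the normalized score unchanged, which is the sense in which the
# effective threshold on the raw score becomes dynamic.
shifted = cohort_normalized_score(-47.0, [-55.0, -53.0, -57.0])
```

Because the same utterance is scored against both the claimed model and the cohort models, a distortion that affects all of those scores alike largely cancels in the difference.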