Patent

Method and apparatus for large population speaker identification in telephone interactions

19 Oct 2006
TL;DR: A method and apparatus for determining whether a speaker uttering an utterance belongs to a predetermined set of known speakers, where a training utterance is available for each known speaker.
Abstract: A method and apparatus for determining whether a speaker uttering an utterance belongs to a predetermined set of known speakers, where a training utterance is available for each known speaker. The method and apparatus test whether features extracted from the tested utterance yield a score exceeding a threshold when matched against one or more models built from voice samples of each known speaker. The method and system further provide optional enhancements such as determining, using, and updating model normalization parameters; a fast scoring algorithm; handling of summed calls; or quality evaluation of the tested utterance.
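The core decision in the abstract (score the tested utterance against each known speaker's model, normalize, and accept only if the best score exceeds a threshold) can be sketched as follows. This is a minimal illustration, not the patented method: it assumes raw per-model scores are already computed, and it uses Z-norm style normalization (impostor-score mean and standard deviation per model) as one plausible reading of "model normalization parameters". `SpeakerModel`, `normalize`, `update_normalization`, and `identify` are hypothetical names. The fast scoring algorithm, summed-call handling, and quality evaluation are not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SpeakerModel:
    """Hypothetical stand-in for one known speaker's model.

    mu and sigma are assumed normalization parameters: the mean and
    standard deviation of impostor scores against this model (Z-norm style).
    """
    name: str
    mu: float = 0.0
    sigma: float = 1.0


def normalize(raw_score: float, model: SpeakerModel) -> float:
    # Z-norm: express the raw score in units of this model's impostor spread.
    return (raw_score - model.mu) / model.sigma


def update_normalization(model: SpeakerModel, impostor_scores: List[float]) -> None:
    # Re-estimate mu and sigma from fresh impostor scores (population std).
    n = len(impostor_scores)
    mean = sum(impostor_scores) / n
    var = sum((s - mean) ** 2 for s in impostor_scores) / n
    model.mu = mean
    model.sigma = math.sqrt(var) if var > 0 else 1.0


def identify(raw_scores: Dict[str, float],
             models: Dict[str, SpeakerModel],
             threshold: float) -> Tuple[Optional[str], float]:
    """Open-set identification: return (best_speaker, score) if the best
    normalized score exceeds the threshold, else (None, score)."""
    best_name: Optional[str] = None
    best_score = -math.inf
    for name, raw in raw_scores.items():
        score = normalize(raw, models[name])
        if score > best_score:
            best_name, best_score = name, score
    if best_score > threshold:
        return best_name, best_score
    return None, best_score
```

For example, with `alice` at default parameters and `bob` at `mu=2.0, sigma=0.5`, raw scores of 1.5 and 3.0 normalize to 1.5 and 2.0, so a threshold of 1.8 returns `("bob", 2.0)` while a threshold of 2.5 returns `(None, 2.0)`: the caller is treated as outside the known set.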
Citations
Patent
06 Jan 2015
TL;DR: Performs speech-to-text conversion of phone numbers spoken within a predetermined time of detecting data indicative of a three-way call event while monitoring a phone call from a prison inmate.
Abstract: In one aspect, the present invention facilitates the investigation of networks of criminals by gathering associations between phone numbers, the names of persons reached at those phone numbers, and voice print data. In another aspect, the invention automatically detects phone calls from a prison where the voiceprint of the person called matches the voiceprint of a past inmate. In another aspect, the invention detects identity scams in prisons by monitoring phone calls made by prisoners for known voice characteristics of likely impostors. In another aspect, the invention automatically performs speech-to-text conversion of phone numbers spoken within a predetermined time of detecting data indicative of a three-way call event while monitoring a phone call from a prison inmate. In another aspect, the invention automatically thwarts attempts by prison inmates to use re-dialing services. In another aspect, the invention automatically tags audio data retrieved from a database by steganographically encoding into the audio data the identity of the official retrieving it.

499 citations
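The speech-to-text aspect of the prison-monitoring patent above can be illustrated with a small post-processing step: given timestamped words from some transcription engine and the time a three-way call event was detected, extract the digit runs spoken within a time window. This is a hypothetical sketch. The 30-second window, the 7-digit minimum, the digit-word table, and the function name `numbers_after_event` are assumptions, not details from the patent.

```python
from typing import List, Tuple

# Hypothetical mapping from spoken digit words to characters.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def numbers_after_event(words: List[Tuple[float, str]], event_time: float,
                        window_s: float = 30.0, min_len: int = 7) -> List[str]:
    """Return digit strings of at least min_len characters spoken within
    window_s seconds after event_time.

    words is a list of (timestamp_seconds, token) pairs, assumed to come
    from an unspecified speech-to-text engine.
    """
    in_window = [tok.lower() for t, tok in words
                 if event_time <= t <= event_time + window_s]
    runs: List[str] = []
    current = ""
    for tok in in_window:
        if tok in DIGIT_WORDS:
            current += DIGIT_WORDS[tok]
        elif tok.isascii() and tok.isdigit():
            current += tok
        else:
            # A non-digit token ends the current run of digits.
            if len(current) >= min_len:
                runs.append(current)
            current = ""
    if len(current) >= min_len:
        runs.append(current)
    return runs
```

Words outside the window are dropped before grouping, so a digit spoken long after the event does not extend a run.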

Patent
25 Sep 2015
TL;DR: A secure telephone call management system authenticates users of a telephone system in an institutional facility using a personal identification number, biometric means, and/or radio frequency means.
Abstract: A secure telephone call management system is provided for authenticating users of a telephone system in an institutional facility. Authentication of the users of the telephone call management system is accomplished by using a personal identification number, biometric means, and/or radio frequency means. The secure telephone call management system includes accounting software capable of limiting access to the system based on funds in a user's account, and includes management software capable of implementing widespread or local changes to the system. The system monitors a conversation in the telephone call to detect the presence of a first characteristic in audio of the conversation, and terminates the telephone call if the first characteristic does not match a second characteristic of biometric information of a user or a called party.

204 citations

Patent
John F. Sheets, Kim Wagner
29 Jan 2014
TL;DR: A speaker verification system authenticates a user with a captured voice sample that attempts to reproduce a word string containing a random element, based on both a match score indicating how closely the captured voice sample matches previously stored voice samples of the user and a pass or fail response indicating whether the voice sample accurately reproduces the word string.
Abstract: Embodiments of the invention provide for speaker verification on a communication device without requiring a user to go through a formal registration process with the issuer or network. Certain embodiments allow the use of a captured voice sample attempting to reproduce a word string having a random element to authenticate the user. Authentication of the user is based on both a match score indicating how closely the captured voice samples match previously stored voice samples of the user and a pass or fail response indicating whether the voice sample is an accurate reproduction of the word string. The processing network maintains a history of the authenticated transactions and voice samples.

99 citations
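The two-part decision in the Sheets and Wagner abstract (a voice match score plus a pass or fail check that the prompted word string was reproduced) can be sketched as below. This is a minimal illustration under stated assumptions: the fixed phrase `BASE_PHRASE`, the use of random digits as the "random element", and the availability of a transcript from some recognizer are all hypothetical, as are the function names. Producing the match score itself is out of scope here.

```python
import secrets

# Hypothetical fixed phrase; the patent only says the word string has a random element.
BASE_PHRASE = "my voice is my password"


def make_challenge(n_digits: int = 4) -> str:
    # Append random digits so a replayed recording is unlikely to match.
    digits = " ".join(str(secrets.randbelow(10)) for _ in range(n_digits))
    return f"{BASE_PHRASE} {digits}"


def _norm(text: str) -> str:
    # Case- and whitespace-insensitive comparison form.
    return " ".join(text.lower().split())


def authenticate(challenge: str, transcript: str,
                 match_score: float, threshold: float) -> bool:
    """Accept only if the transcript reproduces the challenge (pass/fail)
    AND the voice match score clears the threshold."""
    phrase_ok = _norm(transcript) == _norm(challenge)
    return phrase_ok and match_score >= threshold
```

Requiring both conditions means a good voice match on the wrong words fails, and so does the right words spoken by a poorly matching voice.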

Patent
20 Mar 2012
TL;DR: An ability enhancement facilitator system (AEFS) performs vehicular threat detection based on information received at a road-based device, such as a sensor or processor deployed at the side of a road.
Abstract: Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to enhance a user's ability to operate or function in a transportation-related context as a pedestrian or a vehicle operator. In one embodiment, the AEFS is configured to perform vehicular threat detection based on information received at a road-based device, such as a sensor or processor that is deployed at the side of a road. An example AEFS receives, at a road-based device, information about a first vehicle that is proximate to the road-based device. The AEFS analyzes the received information to determine threat information, such as that the vehicle may collide with the user. The AEFS then informs the user of the determined threat information, such as by transmitting a warning to a wearable device configured to present the warning to the user.

84 citations

Patent
Mazin Gilbert, Jay G. Wilpon
29 May 2007
TL;DR: A system compares customer voice prints with a database of known fraudulent voice signatures and continually updates the database to decrease the risk of identity theft.
Abstract: Disclosed are systems, methods, and computer readable media for comparing customer voice prints with a database of known fraudulent voice signatures and continually updating the database to decrease the risk of identity theft. The method embodiment comprises comparing a received voice signal against a database of known fraudulent voice signatures, denying the caller's transaction if the voice signal substantially matches the database of known fraudulent voice signatures, and adding the caller's voice signal to the database of known fraudulent voice signatures if the voice signal does not substantially match a separate speaker verification database and received additional information is not verified.

82 citations
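The decision flow in the Gilbert and Wilpon abstract can be sketched as a three-way screen. The abstract does not define "substantially matches", so this sketch assumes voice signatures are fixed-length embedding vectors compared by cosine similarity against a hypothetical threshold of 0.8; the vector representation, threshold, return labels, and function names are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def matches_any(voice: Vector, database: List[Vector], threshold: float) -> bool:
    return any(cosine(voice, ref) >= threshold for ref in database)


def screen_caller(voice: Vector, fraud_db: List[Vector], verified_db: List[Vector],
                  info_verified: bool, threshold: float = 0.8) -> str:
    # 1. Matches a known fraudulent signature: deny the transaction.
    if matches_any(voice, fraud_db, threshold):
        return "deny"
    # 2. Unknown to the verification database and additional info not
    #    verified: add this voice to the fraud database (mutates fraud_db).
    if not matches_any(voice, verified_db, threshold) and not info_verified:
        fraud_db.append(voice)
        return "added_to_fraud_db"
    # 3. Otherwise continue with normal processing.
    return "continue"
```

Because step 2 appends to `fraud_db`, a second call from the same voice is denied at step 1, which is the "continually updating" behavior the abstract describes.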

References
Journal ArticleDOI
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations

Journal ArticleDOI
TL;DR: The individual Gaussian components of a GMM are shown to represent general speaker-dependent spectral shapes that are effective for modeling speaker identity, and the GMM is shown to outperform other speaker modeling techniques on an identical 16-speaker telephone speech task.
Abstract: This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications that require high identification rates using short utterances from unconstrained conversational speech and robustness to degradations produced by transmission over a telephone channel. A complete experimental evaluation of the Gaussian mixture speaker model is conducted on a 49-speaker conversational telephone speech database. The experiments examine algorithmic issues (initialization, variance limiting, model order selection), spectral variability robustness techniques, large population performance, and comparisons to other speaker modeling techniques (uni-modal Gaussian, VQ codebook, tied Gaussian mixture, and radial basis functions). The Gaussian mixture speaker model attains 96.8% identification accuracy using 5-second clean speech utterances and 80.8% accuracy using 15-second telephone speech utterances with a 49-speaker population, and is shown to outperform the other speaker modeling techniques on an identical 16-speaker telephone speech task.

3,134 citations
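The identification rule behind the GMM paper above (score the test utterance's feature frames against each speaker's mixture model and pick the speaker with the highest likelihood) can be sketched in plain Python. Assumptions: the models are already trained (EM training is not shown), feature vectors such as cepstra are given, covariances are diagonal (a common choice, stated here as an assumption), and each model is a hypothetical list of `(weight, mean, variance)` tuples. The per-frame log-likelihood uses log-sum-exp for numerical stability.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model format: list of (weight, mean vector, diagonal variance vector).
GMM = List[Tuple[float, List[float], List[float]]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    # Log density of a Gaussian with diagonal covariance.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_loglik(frame: Sequence[float], gmm: GMM) -> float:
    # log sum_k w_k N(x | mu_k, Sigma_k), computed with log-sum-exp.
    terms = [math.log(w) + log_gauss_diag(frame, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def avg_loglik(frames: List[Sequence[float]], gmm: GMM) -> float:
    # Frame-averaged log-likelihood; same argmax as the summed version.
    return sum(gmm_loglik(f, gmm) for f in frames) / len(frames)


def identify_speaker(frames: List[Sequence[float]], models: Dict[str, GMM]) -> str:
    # Closed-set identification: the speaker whose model best explains the frames.
    return max(models, key=lambda name: avg_loglik(frames, models[name]))
```

This is closed-set identification, matching the paper's evaluation setting; an open-set system would additionally compare the best score to a threshold.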

Journal ArticleDOI
TL;DR: An introduction proposes a modular scheme of the training and test phases of a speaker verification system, and the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed.
Abstract: This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step in dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications relating to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.

874 citations
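The evaluation step mentioned in the overview above can be made concrete: a DET curve plots the miss rate against the false-alarm rate as the decision threshold sweeps over the scores, and the equal error rate (EER) is a common single-number summary where the two rates meet. The sketch below sweeps thresholds over the observed scores and reports the point where the rates are closest; it does not interpolate, so it is an approximation, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def error_rates(targets: Sequence[float], nontargets: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    # Miss: a true-speaker trial scored below the threshold.
    miss = sum(s < threshold for s in targets) / len(targets)
    # False alarm: an impostor trial scored at or above the threshold.
    fa = sum(s >= threshold for s in nontargets) / len(nontargets)
    return miss, fa


def det_points(targets: Sequence[float],
               nontargets: Sequence[float]) -> List[Tuple[float, float, float]]:
    # (threshold, miss rate, false-alarm rate) at every observed score.
    thresholds = sorted(set(targets) | set(nontargets))
    return [(t, *error_rates(targets, nontargets, t)) for t in thresholds]


def approx_eer(targets: Sequence[float], nontargets: Sequence[float]) -> float:
    # Pick the threshold where miss and false-alarm rates are closest.
    _, miss, fa = min(det_points(targets, nontargets),
                      key=lambda p: abs(p[1] - p[2]))
    return (miss + fa) / 2
```

For target scores `[2, 3, 4, 5]` and impostor scores `[0, 1, 2.5, 3.5]`, a threshold of 3 misses one target and accepts one impostor, giving an EER of 0.25. Score normalization, also covered in the overview, matters here because it changes how target and impostor scores from different models line up against a single threshold.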

Journal ArticleDOI
TL;DR: Four approaches for automatic language identification of speech utterances are compared: Gaussian mixture model (GMM) classification; single-language phone recognition followed by language-dependent, interpolated n-gram language modeling (PRLM); parallel PRLM, which uses multiple single-language phone recognizers, each trained in a different language; and language-dependent parallel phone recognition (PPR).
Abstract: We have compared the performance of four approaches for automatic language identification of speech utterances: Gaussian mixture model (GMM) classification; single-language phone recognition followed by language-dependent, interpolated n-gram language modeling (PRLM); parallel PRLM, which uses multiple single-language phone recognizers, each trained in a different language; and language-dependent parallel phone recognition (PPR). These approaches, which span a wide range of training requirements and levels of recognition complexity, were evaluated with the Oregon Graduate Institute Multi-Language Telephone Speech Corpus. Systems containing phone recognizers performed better than the simpler GMM classifier. The top-performing system was parallel PRLM, which exhibited an error rate of 2% for 45-s utterances and 5% for 10-s utterances in two-language, closed-set, forced-choice classification. The error rate for 11-language, closed-set, forced-choice classification was 11% for 45-s utterances and 21% for 10-s utterances.

710 citations

Patent
18 Jul 2002
TL;DR: A method and apparatus for capturing and analyzing customer interactions, where the apparatus comprises a multi-segment interaction capture device (324), an initial set-up and calibration device (326), a pre-processing and context extraction device (328), and a rule-based analysis engine (300).
Abstract: A method and apparatus (100) for capturing and analyzing customer interactions, the apparatus comprising a multi-segment interaction capture device (324), an initial set up and calibration device (326), a pre-processing and context extraction device (328) and a rule-based analysis engine (300).

649 citations