Acoustic signature building for a speaker from multiple sessions

Patent

TLDR

This patent presents methods of diarizing audio data using first-pass blind diarization and second-pass blind diarization that generate speaker statistical models.

Abstract:

Disclosed herein are methods of diarizing audio data using first-pass blind diarization and second-pass blind diarization that generate speaker statistical models, wherein the first-pass blind diarization is performed on a per-frame basis and the second-pass blind diarization on a per-word basis. Also disclosed are methods of creating acoustic signatures for a common speaker based only on the statistical models of the speakers in each audio session.
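The abstract states that a signature for a common speaker is built only from the per-session speaker models, not from the raw audio. The sketch below illustrates that idea under loud assumptions: each diarized speaker is summarized by a single diagonal-covariance Gaussian, models are compared with a symmetric KL divergence, and matching models are merged by moment matching. The names `SpeakerModel`, `symmetric_kl`, `merge`, and `build_signature` are hypothetical, and none of these choices is taken from the patent itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeakerModel:
    """Hypothetical per-speaker model: one diagonal-covariance Gaussian."""
    mean: List[float]
    var: List[float]
    n_frames: int  # amount of audio the model was estimated from


def symmetric_kl(a: SpeakerModel, b: SpeakerModel) -> float:
    """Symmetric KL divergence between two diagonal Gaussians (log terms cancel)."""
    total = 0.0
    for ma, va, mb, vb in zip(a.mean, a.var, b.mean, b.var):
        d2 = (ma - mb) ** 2
        total += 0.5 * ((va + d2) / vb + (vb + d2) / va - 2.0)
    return total


def merge(a: SpeakerModel, b: SpeakerModel) -> SpeakerModel:
    """Moment-matched merge of two models, weighted by frame counts."""
    n = a.n_frames + b.n_frames
    wa, wb = a.n_frames / n, b.n_frames / n
    mean = [wa * ma + wb * mb for ma, mb in zip(a.mean, b.mean)]
    var = [
        wa * (va + (ma - m) ** 2) + wb * (vb + (mb - m) ** 2)
        for ma, va, mb, vb, m in zip(a.mean, a.var, b.mean, b.var, mean)
    ]
    return SpeakerModel(mean, var, n)


def build_signature(sessions: List[List[SpeakerModel]]) -> SpeakerModel:
    """Pick the speaker that recurs across sessions and merge its models."""
    first, rest = sessions[0], sessions[1:]
    # Seed: the first-session speaker whose best matches in the other
    # sessions are closest overall (smallest summed divergence).
    signature = min(
        first,
        key=lambda cand: sum(min(symmetric_kl(cand, m) for m in s) for s in rest),
    )
    # Greedily fold in the nearest model from each remaining session.
    for session in rest:
        nearest = min(session, key=lambda m: symmetric_kl(signature, m))
        signature = merge(signature, nearest)
    return signature


# Made-up data: one recurring speaker near the origin, a different
# second speaker in every session.
sessions = [
    [SpeakerModel([0.0, 0.0], [1.0, 1.0], 100), SpeakerModel([5.0, 5.0], [1.0, 1.0], 80)],
    [SpeakerModel([0.2, -0.1], [1.0, 1.0], 100), SpeakerModel([-6.0, 4.0], [1.0, 1.0], 120)],
    [SpeakerModel([-0.1, 0.1], [1.0, 1.0], 200), SpeakerModel([7.0, -3.0], [1.0, 1.0], 90)],
]
signature = build_signature(sessions)
print(signature)
```

In this toy data the recurring speaker is recovered because its models lie close together in every session while the other speakers differ from session to session. A real system would likely use richer models, such as Gaussian mixtures, and a more careful matching rule.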

Citations

Proceedings Article

Fully Supervised Speaker Diarization

Aonan Zhang, +4 more

TL;DR: Proposes a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN), that operates on extracted speaker-discriminative embeddings and decodes in an online fashion, whereas most state-of-the-art systems rely on offline clustering.

Patent

Language model speech endpointing

Bjorn Hoffmeister, +2 more

TL;DR: An automatic speech recognition (ASR) system detects an endpoint of an utterance using the active hypotheses under consideration by a decoder. The ASR system calculates the amount of non-speech detected by a plurality of hypotheses and weights the non-speech duration by the probability of each hypothesis.

Patent

Dimensionality reduction of Baum-Welch statistics for speaker recognition

Elie Khoury, +1 more

TL;DR: A deep neural network reduces the dimensionality of normalized first-order Gaussian mixture model statistics and outputs a voiceprint corresponding to the recognition speech signal, which is used for speaker recognition.

Patent

Direction-based speech endpointing

Charles Melvin Johnson Jr.

TL;DR: A system for determining an endpoint of an utterance during automatic speech recognition (ASR) processing that accounts for the direction and duration of the incoming speech.

Patent

Channel-Compensated Low-Level Features For Speaker Recognition

Elie Khoury, +1 more

TL;DR: A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed-forward convolutional neural network (CNN), and a loss function that computes the difference between the channel-compensated features and handcrafted features for the same raw speech signal.

References

Journal Article

Error bounds for convolutional codes and an asymptotically optimum decoding algorithm

Andrew J. Viterbi

- 01 Apr 1967 -

IEEE Transactions on Information Theory

TL;DR: The upper bound is obtained for a specific probabilistic nonsequential decoding algorithm which is shown to be asymptotically optimum for rates above R_{0} and whose performance bears certain similarities to that of sequential decoding algorithms.

Journal Article

A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains

Leonard E. Baum, +3 more

- 01 Feb 1970 -

Annals of Mathematical Statistics

Journal Article

Perceptual linear predictive (PLP) analysis of speech

Hynek Hermansky

- 01 Apr 1990 -

Journal of the Acoustical Society of Ame...

TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.

Book

Statistical Digital Signal Processing and Modeling

Monson H. Hayes

TL;DR: The main thrust is to provide students with a solid understanding of a number of important and related advanced topics in digital signal processing such as Wiener filters, power spectrum estimation, signal modeling and adaptive filtering.

Patent

Secure data interchange

Frederick S. M. Herz, +4 more

TL;DR: A secure data interchange system enables information about bilateral and multilateral interactions between multiple persistent parties to be exchanged and leveraged within an environment that uses a combination of techniques to control access to information, release of information, and matching of information back to parties.

Related Papers (5)

Word-level blind diarization of recorded calls with arbitrary number of speakers

Blind Diarization of Recorded Calls With Arbitrary Number of Speakers

Voice Activity Detection Using A Soft Decision Mechanism

Method and system for speaker diarization

Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model