Open Access Proceedings Article

Vector-quantization-based speech recognition and speaker recognition techniques

S. Furui
- pp 954-958
Abstract
The author reviews major methods of applying the vector quantization (VQ) technique to speech and speaker recognition. These include speech recognition based on the combination of VQ and the DTW/HMM (dynamic time warping/hidden Markov model) technique, VQ-distortion-based recognition, learning VQ algorithms, speaker adaptation by VQ-codebook mapping, and VQ-distortion-based speaker recognition. It is concluded that not only has the VQ technique reduced the amount of computation and storage, but it has also created new ideas for solving various problems in speech/speaker recognition.
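
To make the idea of VQ-distortion-based speaker recognition concrete, the following is a minimal sketch, not the method reviewed in the paper. It assumes one codebook per speaker, uses plain k-means (Lloyd iterations) as a stand-in for codebook design, and runs on synthetic 2-D feature vectors rather than real speech features. All function names and the speaker labels are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Return (index, squared distance) of the codeword closest to v."""
    best = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))
    return best, sq_dist(codebook[best], v)


def train_codebook(vectors, size, iters=20, seed=0):
    """Design a codebook with plain k-means (an assumed, simple choice)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        # Assign every training vector to its nearest codeword.
        cells = [[] for _ in range(size)]
        for v in vectors:
            idx, _ = nearest(codebook, v)
            cells[idx].append(v)
        # Move each codeword to the centroid of its cell.
        for i, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell)
                               for d in range(dim)]
    return codebook


def avg_distortion(codebook, vectors):
    """Average quantization distortion of a vector sequence."""
    return sum(nearest(codebook, v)[1] for v in vectors) / len(vectors)


def identify(codebooks, test_vectors):
    """Pick the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks,
               key=lambda name: avg_distortion(codebooks[name], test_vectors))


def synth(center, n, rng):
    """Synthetic 2-D 'features' scattered around a center (illustration only)."""
    return [[center[0] + rng.gauss(0, 0.3), center[1] + rng.gauss(0, 0.3)]
            for _ in range(n)]


rng = random.Random(1)
train = {
    "speaker_a": synth((0.0, 0.0), 50, rng) + synth((1.0, 1.0), 50, rng),
    "speaker_b": synth((5.0, 5.0), 50, rng) + synth((6.0, 4.0), 50, rng),
}
codebooks = {name: train_codebook(vecs, size=4) for name, vecs in train.items()}

test = synth((5.0, 5.0), 20, rng)  # unseen vectors resembling speaker_b
print(identify(codebooks, test))
```

Because each test vector is scored only against a small set of codewords rather than against all training vectors, this kind of scheme illustrates the reduction in computation and storage that the abstract attributes to VQ.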


Citations
Proceedings Article

Real-time speaker identification.

TL;DR: Vector quantization (VQ) based speaker identification is optimized in two ways: the number of test vectors is reduced by pre-quantizing the test sequence prior to matching, and the number of speakers is reduced by pruning out unlikely speakers during the identification process.

Quranic Verse Recitation Feature Extraction using Mel-Frequency Cepstral Coefficient (MFCC)

TL;DR: This paper explores the viability of the Mel-Frequency Cepstral Coefficient (MFCC) technique, one of the most popular feature extraction techniques used in speech recognition, for extracting features from Quranic verse recitation.

Toward Constructing A Multilingual Speech Corpus for Taiwanese (Min-nan), Hakka, and Mandarin

TL;DR: The Formosa speech database (ForSDat) is a multilingual speech corpus collected at Chang Gung University and sponsored by the National Science Council of Taiwan. The first version of this corpus, containing speech of 600 speakers of Taiwanese and Mandarin, is finished and ready to be released.

Multiband Approach to Robust Text-Independent Speaker Identification

TL;DR: Experimental results show that both proposed methods achieve better performance than GMM using full-band LPCCs and mel-frequency cepstral coefficients (MFCCs) when speaker identification is evaluated in both clean and noisy environments.
Book Chapter

Learning Intrinsic Video Content Using Levenshtein Distance in Graph Partitioning

TL;DR: The graph partitioning method is extended; in particular, the Normalised Cut model originally introduced for static image segmentation is extended to unsupervised clustering of temporal trajectories with fully automated model order selection.
References
Book

Self Organization And Associative Memory

Teuvo Kohonen
TL;DR: The purpose and nature of biological memory, as well as some aspects of memory, are explained.
Journal Article

Hidden Markov models for speech recognition

TL;DR: The role of statistical methods in this powerful technology as applied to speech recognition is addressed and a range of theoretical and practical issues that are as yet unsolved in terms of their importance and their effect on performance for different system implementations are discussed.
Book

Hidden Markov Models for Speech Recognition

TL;DR: In this article, the authors unified theory with semi-continuous models using hidden Markov models for speech recognition experimental examples, using vector quantization and mixture densities hidden Markov models.
Proceedings Article

Statistical pattern recognition with neural networks: benchmarking studies

TL;DR: Three basic types of neural-like networks, backpropagation network, Boltzmann machine, and learning vector quantization, were applied to two representative artificial statistical pattern recognition tasks, each with varying dimensionality.