
Showing papers on "Dynamic time warping published in 1993"


Proceedings ArticleDOI
15 Jun 1993
TL;DR: A method for learning, tracking, and recognizing human gestures using a view-based approach to model articulated objects is presented and results showing tracking and recognition of human hand gestures at over 10 Hz are presented.
Abstract: A method for learning, tracking, and recognizing human gestures using a view-based approach to model articulated objects is presented. Objects are represented using sets of view models, rather than single templates. Stereotypical space-time patterns, i.e., gestures, are then matched to stored gesture patterns using dynamic time warping. Real-time performance is achieved by using special purpose correlation hardware and view prediction to prune as much of the search space as possible. Both view models and view predictions are learned from examples. Results showing tracking and recognition of human hand gestures at over 10 Hz are presented.

425 citations
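The template-matching step this gesture system relies on is ordinary dynamic time warping. A minimal sketch of the standard DTW recurrence (illustrative only, not the authors' correlation-hardware implementation; `dtw_distance` is a hypothetical helper name):

```python
import numpy as np

def dtw_distance(a, b):
    """Cumulative-cost dynamic time warping between two 1-D sequences.

    Returns the minimum total alignment cost under the standard
    match / insert / delete step pattern.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# A time-shifted copy of a pattern aligns cheaply under DTW, whereas a
# frame-by-frame Euclidean comparison would penalize the shift.
ref = [0, 0, 1, 2, 1, 0, 0]
test = [0, 1, 2, 1, 0, 0, 0]
print(dtw_distance(ref, test))  # → 0.0: the shapes align exactly
```

Because the warping path may repeat or skip samples, gestures performed at slightly different speeds still match the stored patterns.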


Journal ArticleDOI
TL;DR: A new minimum recognition error formulation and a generalized probabilistic descent (GPD) algorithm are analyzed and used to accomplish discriminative training of a conventional dynamic-programming-based speech recognizer.
Abstract: A new minimum recognition error formulation and a generalized probabilistic descent (GPD) algorithm are analyzed and used to accomplish discriminative training of a conventional dynamic-programming-based speech recognizer. The objective of discriminative training here is to directly minimize the recognition error rate. To achieve this, a formulation that allows controlled approximation of the exact error rate and renders optimization possible is used. The GPD method is implemented in a dynamic-time-warping (DTW)-based system. A linear discriminant function on the DTW distortion sequence is used to replace the conventional average DTW path distance. A series of speaker-independent recognition experiments using the highly confusable English E-set as the vocabulary showed a recognition rate of 84.4% compared to approximately 60% for traditional template training via clustering. The experimental results verified that the algorithm converges to a solution that achieves minimum error rate.

165 citations
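The minimum-error training described here can be illustrated with a toy generalized probabilistic descent step. This sketch replaces the DTW-based discriminant with a plain negative squared distance to a class template; `gpd_step`, the learning rate, and the sigmoid slope `alpha` are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gpd_step(templates, x, label, lr=0.5, alpha=2.0):
    """One GPD update on class templates (toy sketch).

    g_k(x) = -||x - t_k||^2 stands in for the paper's DTW-based
    discriminant; the misclassification measure d = -g_label + g_rival
    is squashed by a sigmoid loss and both templates move along its
    gradient.
    """
    g = {k: -np.sum((x - t) ** 2) for k, t in templates.items()}
    rival = max((k for k in g if k != label), key=g.get)
    d = -g[label] + g[rival]        # > 0 means this token is misclassified
    s = sigmoid(alpha * d)          # smoothed error for this token
    grad = alpha * s * (1.0 - s)    # dLoss/dd
    templates[label] += lr * grad * 2 * (x - templates[label])  # pull correct class in
    templates[rival] -= lr * grad * 2 * (x - templates[rival])  # push best rival away
    return s

# Driving the smoothed error down moves the decision boundary directly,
# rather than fitting each class distribution separately.
templates = {'a': np.array([0.0, 0.0]), 'b': np.array([1.0, 1.0])}
x, label = np.array([0.8, 0.8]), 'a'
losses = [gpd_step(templates, x, label) for _ in range(200)]
```

The key property mirrored from the paper is that the quantity being minimized is a smooth surrogate of the recognition error itself, not a likelihood.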


Proceedings ArticleDOI
19 Oct 1993
TL;DR: A segment-based speech recognition scheme is proposed that explicitly models the correlations between successive frames of an acoustic segment with features representing the contours of spectral parameters, captured by several lower-order coefficients of discrete orthonormal polynomial expansions.
Abstract: A segment-based speech recognition scheme is proposed. The basic idea is to explicitly model the correlations between successive frames of an acoustic segment by using features representing the contours of spectral parameters. These segmental features are several lower-order coefficients of discrete orthonormal polynomial expansions. The performance of the proposed scheme was examined by simulations on multi-speaker speech recognition for all 408 highly confusing first-tone Mandarin syllables. A recognition rate of 77.4% was achieved for this case, using five 6-segment reference templates per syllable. This is 13.0% and 6.6% higher than the rates obtained by a conventional dynamic time warping (DTW) method and a conventional hidden Markov model (CHMM) method, respectively.

159 citations
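The segmental features described above are low-order coefficients of an orthogonal-polynomial fit to a parameter contour. A small sketch using NumPy's Legendre fit as a stand-in for the paper's discrete orthonormal expansion (function name and order are assumptions):

```python
import numpy as np

def segment_features(contour, order=2):
    """Compress a spectral-parameter contour into its low-order
    orthogonal-polynomial coefficients (Legendre on a normalized time
    axis; an approximation of the paper's discrete orthonormal
    expansion)."""
    t = np.linspace(-1.0, 1.0, len(contour))
    return np.polynomial.legendre.legfit(t, contour, order)

# A rising parameter track collapses to mean + slope (+ curvature)
# terms, so correlations between successive frames are captured inside
# a single segment instead of being modeled frame by frame.
coeffs = segment_features(np.linspace(0.0, 1.0, 20))
```

For an exactly linear contour the fit recovers only a mean and a slope term, with the curvature coefficient at zero.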


Proceedings ArticleDOI
27 Apr 1993
TL;DR: The authors show how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speech-reading, demonstrated on an extension of a state-of-the-art speech recognition system, a modular multi-state time-delay neural network architecture (MS-TDNN).
Abstract: The authors show how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speech-reading. They show this on an extension of a state-of-the-art speech recognition system, a modular multi-state time-delay neural network architecture (MS-TDNN). The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. This is shown on a connected word recognition problem, the notoriously difficult letter-spelling task. With speech-reading, the error rate could be reduced by up to half of the error rate of pure acoustic recognition.

145 citations


Book ChapterDOI
01 Jan 1993
TL;DR: While this study focuses on the feasibility, validity, and segregated contribution of exclusively continuous OASR, future highly robust recognition systems should combine optical and acoustic information with syntactic, semantic and pragmatic aids.
Abstract: This study describes the design and implementation of a novel continuous speech recognizer that uses optical information from the oral-cavity shadow of a speaker. The system uses hidden Markov models (HMMs) trained to discriminate optical information and achieves a recognition rate of 25.3 percent on 150 test sentences. This is the first system to accomplish continuous optical automatic speech recognition (OASR). This level of performance--without the use of syntactical, semantic, or any other contextual guide to the recognition process--indicates that OASR may be used as a major supplement for robust multi-modal recognition in noisy environments. Additionally, new features important for OASR were discovered, and novel approaches to vector quantization, training, and clustering were utilized. This study contains three major components. First, it hypothesizes 35 static and dynamic optical features to characterize the speaker's oral-cavity shadow. Using the corresponding correlation matrix and a principal component analysis, the study discarded 22 oral-cavity features. The remaining 13 oral-cavity features are mostly dynamic features, unlike the static features used by previous researchers. Second, the study merged phonemes that appear optically similar on the speaker's oral-cavity region into visemes. The visemes were objectively analyzed and discriminated using HMM and clustering algorithms. Most significantly, the visemes for the speaker, obtained through computation, are consistent with the phoneme-to-viseme mapping discussed by most lipreading experts. This similarity, in a sense, verifies the selection of oral-cavity features. Third, the study trained the HMMs to recognize, without a grammar, a set of sentences having a perplexity of 150, using visemes, trisemes (triplets of visemes), and generalized trisemes (clustered trisemes).
The system achieved recognition rates of 2 percent, 12.7 percent, and 25.3 percent using, respectively, viseme HMMs, triseme HMMs, and generalized triseme HMMs. The study concludes that the methodologies used in this investigation demonstrate the need for further research on continuous OASR and on the integration of optical information with other recognition methods. While this study focuses on the feasibility, validity, and segregated contribution of exclusively continuous OASR, future highly robust recognition systems should combine optical and acoustic information with syntactic, semantic, and pragmatic aids.

94 citations


Patent
Delbert D. Bailey1, Carole Dulong1
12 May 1993
TL;DR: A pattern recognition engine is provided that contains five pipelines which operate in parallel and are specially optimized for Dynamic Time Warping and Hidden Markov Models procedures for pattern recognition, especially handwriting recognition.
Abstract: A computer implemented apparatus and method of pattern recognition utilizing a pattern recognition engine coupled with a general purpose computer system. The present invention system provides increased accuracy and performance in handwriting and voice recognition systems and may interface with general purpose computer systems. A pattern recognition engine is provided within the present invention that contains five pipelines which operate in parallel and are specially optimized for Dynamic Time Warping and Hidden Markov Models procedures for pattern recognition, especially handwriting recognition. These pipelines comprise two arithmetic pipelines, one control pipeline and two pointer pipelines. Further, a private memory is associated with each pattern recognition engine for library storage of reference or prototype patterns. Recognition procedures are partitioned across a CPU and the pattern recognition engine. Use of a private memory allows quick access of the library patterns without impeding the performance of programs operating on the main CPU or the host bus. Communication between the CPU and the pattern recognition engine is accomplished over the host bus.

62 citations


Journal ArticleDOI
TL;DR: It is experimentally shown that one can optimize the system and further improve recognition accuracy for speaker-independent recognition by controlling the distance measure's sensitivity to spectral peaks and the spectral tilt and by utilizing the speech dynamic features.
Abstract: Several recently proposed automatic speech recognition (ASR) front-ends are experimentally compared in speaker-dependent and speaker-independent (cross-speaker) recognition. The perceptually based linear predictive (PLP) front-end, with the root-power sums (RPS) distance measure, yields generally the highest accuracies, especially in cross-speaker recognition. It is experimentally shown that one can optimize the system and further improve speaker-independent recognition accuracy by controlling the distance measure's sensitivity to spectral peaks and to the spectral tilt, and by utilizing dynamic speech features. For a digit vocabulary and five reference templates obtained with a clustering algorithm, the optimization improves recognition accuracy from 97% to 98.1% for the PLP-RPS front-end.

31 citations


Proceedings ArticleDOI
28 Mar 1993
TL;DR: It is shown how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speech-reading, demonstrated on an extension of an existing state-of-the-art speech recognition system, a modular multi-state time-delay neural network (MS-TDNN).
Abstract: It is shown how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speech-reading. This is shown on an extension of an existing state-of-the-art speech recognition system, a modular multi-state time-delay neural network (MS-TDNN). The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. This is shown on a connected word recognition problem, the letter-spelling task. With speech-reading, the error rate can be reduced by up to half of the error rate of pure acoustic recognition.

26 citations


Proceedings Article
29 Nov 1993
TL;DR: A view-based representation is used to model aspects of the hand relevant to the trained gestures, and is found with an unsupervised clustering technique that uses normalized correlation networks, with dynamic time warping in the temporal domain, as its distance function.
Abstract: We present a method for learning, tracking, and recognizing human hand gestures recorded by a conventional CCD camera without any special gloves or other sensors. A view-based representation is used to model aspects of the hand relevant to the trained gestures, and is found using an unsupervised clustering technique. We use normalized correlation networks, with dynamic time warping in the temporal domain, as a distance function for unsupervised clustering. Views are computed separably for space and time dimensions; the distributed response of the combination of these units characterizes the input data with a low dimensional representation. A supervised classification stage uses labeled outputs of the spatio-temporal units as training data. Our system can correctly classify gestures in real time with a low-cost image processing accelerator.

24 citations
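The distance function described above combines normalized correlation in space with DTW in time. The spatial part can be sketched as plain zero-mean normalized correlation (an illustrative stand-in for the paper's correlation units; the function name is assumed):

```python
import numpy as np

def normalized_correlation(patch, template):
    """Zero-mean normalized correlation in [-1, 1] between an image
    patch and a stored view template. Invariant to brightness offset
    and contrast scaling of either input."""
    p = np.asarray(patch, float) - np.mean(patch)
    t = np.asarray(template, float) - np.mean(template)
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    return float(p.ravel() @ t.ravel() / denom) if denom else 0.0
```

The offset/gain invariance is what makes such a score usable as a clustering distance across lighting changes; the temporal DTW stage then absorbs differences in gesture speed.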


Proceedings ArticleDOI
27 Apr 1993
TL;DR: The authors propose a continuous speaker-independent speech recognition system based on predictive neural networks for modeling phonemes and dynamic time warping for temporal alignment, which compares well with current systems.
Abstract: The authors propose a continuous speaker-independent speech recognition system based on predictive neural networks for modeling phonemes, and dynamic time warping for temporal alignment. In this system several modules cooperate, which allows incorporation of a grammar model and simple correction rules. The neural networks are trained by using a frame-discriminative criterion. Tests on the TIMIT database show 74.5% correct classification and 68.6% accuracy, which compares well with current systems (the CMU SPHINX system and the Cambridge Recurrent Error Propagation network).

22 citations


Proceedings ArticleDOI
27 Apr 1993
TL;DR: The authors applied an automatic structure optimization (ASO) algorithm to the optimization of multistate time-delay neural networks (MSTDNNs), an extension of the TDNN, which was applied successfully to speech recognition and handwritten character recognition tasks with varying amounts of training data.
Abstract: The authors applied an automatic structure optimization (ASO) algorithm to the optimization of multistate time-delay neural networks (MSTDNNs), an extension of the TDNN. These networks allow the recognition of sequences of ordered events that have to be observed jointly. For example, in many speech recognition systems the recognition of words is decomposed into the recognition of sequences of phonemes or phonemelike units. In handwritten character recognition the recognition of characters can be decomposed into the joint recognition of characteristic strokes, etc. The combination of the proposed ASO algorithm with the MSTDNN was applied successfully to speech recognition and handwritten character recognition tasks with varying amounts of training data.

Journal ArticleDOI
TL;DR: The performance of continuous HMMs using one type of transitional features in speaker-dependent recognition of the highly confusing Mandarin syllables is first evaluated and discussed in detail under the constraint of very limited training data.

Journal ArticleDOI
TL;DR: Results are presented which show that the additional parameters extracted encode further speaker specific information, and can be used to improve upon the speaker verification performance of the baseline systems.

Journal ArticleDOI
Chin-Hui Lee1, Chih-Heng Lin1
TL;DR: Testing on a 39-word English alpha-digit vocabulary, in a speaker trained mode, indicates that the recognition performance of a template-based, dynamic time-warping (DTW) recognizer can be significantly improved in noisy conditions when the robust signal limiter is used as a pre-processor to reduce the variability of the features in strong mismatch conditions.

Proceedings ArticleDOI
27 Apr 1993
TL;DR: It is shown that a dynamic time warping (DTW) comb filter corrects for variations in the vocal tract as well as for the variation in pitch.
Abstract: An attempt is made to enhance speech degraded by added noise by exploiting the periodic nature of voiced speech. A modification of the adaptive comb filter is employed for this purpose. Problems which may arise when using the periodicity of the speech for enhancement include significant distortion caused by comb filtering a time-varying waveform (called temporal smearing) as well as the variation in pitch from period to period (called overload). It is shown that a comb filter based on dynamic time warping (DTW) corrects for variations in the vocal tract as well as for the variation in pitch. A computationally straightforward but suboptimal implementation of the time warping algorithm is used to improve the performance of the comb filter algorithm. Performance is assessed in terms of computational complexity, informal listening tests, and segmental SNR.

Proceedings ArticleDOI
27 Apr 1993
TL;DR: A novel MCE/GPD (minimum classification error/generalized probabilistic descent) loss function that can incorporate word spotting errors and other measures of symbolic distance between correct and incorrect categories is defined.
Abstract: A straightforward application of PBMEC (prototype-based minimum error classifier) training to existing techniques for handling continuous speech is described. A novel MCE/GPD (minimum classification error/generalized probabilistic descent) loss function that can incorporate word spotting errors and other measures of symbolic distance between correct and incorrect categories is defined. Classification consists in a time-synchronous DTW (dynamic time warping) pass through a finite state machine; adaptation makes use of an A* based N-best algorithm and consists in propagating the derivative of the loss over the N best paths through the finite state machine. The key feature is that the loss function being optimized closely reflects the actual recognition performance of the system.

Book ChapterDOI
13 Sep 1993
TL;DR: It is shown that MSTDNNs are a very powerful approach to on-line handwritten character and word recognition and that the ASO algorithm can automatically structure this type of architecture efficiently in a single training run.
Abstract: Highly structured neural networks like the Time-Delay Neural Network (TDNN) can achieve very high recognition accuracies in real world applications like on-line handwritten character and speech recognition systems. Achieving the best possible performance greatly depends on the optimization of all structural parameters for the given task and amount of training data. We propose an Automatic Structure Optimization (ASO) algorithm that avoids time-consuming manual optimization and apply it to Multi State Time-Delay Neural Networks (MSTDNNs), a recent extension of the TDNN. We show that MSTDNNs are a very powerful approach to on-line handwritten character and word recognition and that the ASO algorithm can automatically structure this type of architecture efficiently in a single training run.

Proceedings ArticleDOI
24 Nov 1993
TL;DR: The hybrid system developed by the authors combines self-organizing feature maps with dynamic time warping and the combination has better performance than either of the two methods applied individually.
Abstract: Describes a series of experiments on using Kohonen self-organizing maps and hybrid systems for continuous speech recognition. Experiments with different nonlinear transformations applied to the signal before it is presented to a neural network were carried out and the results compared. The hybrid system developed by the authors combines self-organizing feature maps with dynamic time warping. The experiments suggest that the combination performs better than either of the two methods applied individually.

Journal ArticleDOI
TL;DR: The time-warping network (TWN) is introduced as a generalization of both an HMM-based recognizer and a backpropagation net; results indicate that not only does recognition performance improve, but the separation between classes is enhanced, allowing a rejection criterion to be set up to improve the confidence of the system.
Abstract: Recently, much interest has been generated regarding speech recognition systems based on Hidden Markov Models (HMMs) and neural network (NN) hybrids. Such systems attempt to combine the best features of both models: the temporal structure of HMMs and the discriminative power of neural networks. In this work we establish one more relation between the HMM and the NN paradigms by introducing the time-warping network (TWN), a generalization of both an HMM-based recognizer and a backpropagation net. The basic element of such a network, a time-warping neuron, extends the operation of the formal neuron of a backpropagation network by warping the input pattern to match it optimally to its weights. We show that a single-layer network of TW neurons is equivalent to a Gaussian density HMM-based recognition system. This equivalent neural representation suggests ways to improve the discriminative power of the system by using backpropagation discriminative training, and/or by generalizing the structure of the recognizer to a multi-layer net. The performance of the proposed network was evaluated on a highly confusable, isolated word, multi-speaker recognition task. The results indicate that not only does the recognition performance improve, but the separation between classes is enhanced, allowing us to set up a rejection criterion to improve the confidence of the system.
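The time-warping neuron described above can be sketched directly: instead of a fixed inner product, the input sequence is DTW-aligned to the neuron's weight sequence and the activation reflects the optimal alignment cost (an illustrative reading of the idea, not the paper's exact formulation; `tw_neuron` is a hypothetical name):

```python
import numpy as np

def tw_neuron(x, w):
    """A 'time-warping neuron' sketch: the input sequence is DTW-aligned
    to the weight sequence and the activation is the negated optimal
    alignment cost, so a perfectly warpable input activates maximally
    (at 0)."""
    n, m = len(x), len(w)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (x[i - 1] - w[j - 1]) ** 2
            D[i, j] = local + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return -D[n, m]

# A slowed-down copy of the weight pattern still activates fully,
# which a fixed-length inner product cannot do.
```

The squared local distance makes a layer of such units behave like negative log-likelihoods of Gaussian-density HMM states, which is the equivalence the paper exploits.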

Proceedings ArticleDOI
TL;DR: The nonlinear behavior of ASTER provides more robust performance than the related dynamic time warping algorithm and is compared with a more common approach wherein a self-organizing feature map is first used to map a sequence of extracted feature vectors onto a lower dimensional trajectory.
Abstract: Two types of artificial neural networks are introduced for the robust classification of spatio-temporal sequences. The first network is the Adaptive Spatio-Temporal Recognizer (ASTER), which adaptively estimates the confidence that a (variable length) signal of a known class is present by continuously monitoring a sequence of feature vectors. If the confidence for any class exceeds a threshold value at some moment, the signal is considered to be detected and classified. The nonlinear behavior of ASTER provides more robust performance than the related dynamic time warping algorithm. ASTER is compared with a more common approach wherein a self-organizing feature map is first used to map a sequence of extracted feature vectors onto a lower dimensional trajectory, which is then identified using a variant of the feedforward time delay neural network. The performance of these two networks is compared using artificial sonograms as well as feature vector strings obtained from short-duration oceanic signals.

Book ChapterDOI
13 Sep 1993
TL;DR: A discriminative neural prediction system for continuous speaker-independent speech recognition that reaches 74.9% accuracy on TIMIT, which compares well with other state-of-the-art systems while being less complex and easier to implement.
Abstract: This paper presents a discriminative neural prediction system for continuous speaker-independent speech recognition. We first compare different neural predictors for modeling speech production. We then propose new criteria for discriminative training. These networks are incorporated into a complete speech recognition system where they cooperate with other modules (grammar model, correction rules, and dynamic time warping). Our best systems reach 74.9% accuracy on TIMIT, which compares well with other state-of-the-art systems, while being less complex and easier to implement.

Journal ArticleDOI
TL;DR: A chain vector-quantization clustering (CVQC) algorithm for real-time speech recognition that delivers faster training and recognition speeds and requires less memory.

Proceedings ArticleDOI
25 Oct 1993
TL;DR: A continuous speech recognition system with a finite set of Chinese words is devised; the precedence relations among the spectral patterns within a token period can be preserved through topology preservation, and the serious nonlinear time warping can be overcome.
Abstract: A continuous speech recognition system with a finite set of Chinese words is devised for selected applications. With proper design of the self-organizing map for the speech signals, the precedence relations among the spectral patterns within a token period can be preserved through topology preservation, and the serious nonlinear time warping can thus be overcome. The 1D hierarchical relations among the sequential spectral patterns can be represented by the topology map developed on the linear array of neurons. We then devise two kinds of perception energies based on the trained map. One of the energies is derived from properly fitting a precedence curve to the sequential excitation patterns of the map during a whole word period. The other energy is obtained from the accumulation of total excitations on the map during a word period. Thresholds for the perception energies are then designed experimentally. A set of 1309 linear array maps is used for representing the total of 1309 standard Chinese word pronunciations. Each linear array contains 100 equally spaced and linearly ordered neurons.

Journal ArticleDOI
TL;DR: A VLSI architecture, which exhibits both SIMD and systolic behaviour for computing the dynamic time-warping (DTW) algorithm is presented, and a 20000-word real-time DTW-based speech recognition system is achievable.
Abstract: A VLSI architecture which exhibits both SIMD and systolic behaviour for computing the dynamic time-warping (DTW) algorithm is presented. Such an architecture is well suited for VLSI implementation because of its regular structure and small number of input/output connections. Currently, based on a 1.2 µm CMOS technology, a SIMD-systolic data-path chip has been designed and fabricated for computing the DTW algorithm. It is functionally correct and packaged as a 68-pin PGA chip. With such a chip, a 20000-word real-time DTW-based speech recognition system is achievable.

Proceedings ArticleDOI
20 Oct 1993
TL;DR: The authors present the implementation of a generic dynamic programming algorithm on array processors, adopting a torus interconnection network, an internal/external dual buffer structure, and a multilevel pipelining design, for a performance of several GOPS per DP chip.
Abstract: The authors present the implementation of a generic dynamic programming algorithm on array processors. A dynamic programming (DP) chip is proposed to speed up the processing of dynamic programming tasks in many applications, including the Viterbi algorithm, the boundary following algorithm, the dynamic time warping algorithm, etc. By adopting a torus interconnection network, an internal/external dual buffer structure, and a multilevel pipelining design, a performance of several GOPS per DP chip is expected. Both the dedicated hardware design and the data flow control of the DP chip are discussed.

Proceedings Article
01 Jan 1993

Proceedings ArticleDOI
14 Sep 1993
TL;DR: The authors' experience to date leads them to recommend the use of a combination of a shift-tolerant, correlation-based measure, such as DTW, and a robust normalized mean squared error measure.
Abstract: A critical problem encountered in evaluating methods that extract event-related potentials (ERPs) from single-trial electroencephalograph (EEG) signals is the inadequacy of available performance measures. Here the authors analyzed two standard performance measures, normalized mean squared error and correlation, and a lesser used measure, dynamic time warping (DTW), and explored the conditions under which they provide misleading results. The authors' experience to date leads them to recommend the use of a combination of a shift-tolerant, correlation-based measure, such as DTW, and a robust normalized mean squared error measure.
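The shift-tolerance argument is easy to demonstrate: a latency-jittered copy of an ERP-like peak scores poorly under normalized mean squared error but aligns almost perfectly under DTW. A sketch with synthetic Gaussian peaks (signal shapes and thresholds are illustrative assumptions, not the authors' data):

```python
import numpy as np

def nmse(x, y):
    """Normalized mean squared error of x against reference y."""
    return float(np.mean((x - y) ** 2) / np.mean(y ** 2))

def dtw_cost(x, y):
    """Plain DTW alignment cost with squared local distances."""
    D = np.full((len(x) + 1, len(y) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = local + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[-1, -1])

# A 3-sample latency jitter on an ERP-like Gaussian peak: large NMSE,
# near-zero DTW cost, because the warping path absorbs the shift.
t = np.arange(40)
ref = np.exp(-0.5 * ((t - 20) / 3.0) ** 2)
shifted = np.exp(-0.5 * ((t - 23) / 3.0) ** 2)
```

This is exactly the failure mode the paper warns about: a pointwise error measure reports a large mismatch for a waveform that is merely delayed.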

Proceedings ArticleDOI
27 Apr 1993
TL;DR: Experimental results show that a neural network can be used as a new speaker-independent feature extractor and is compared with a conventional training algorithm in terms of recognition performance.
Abstract: The authors propose an algorithm using a neural network to normalize features that differ between speakers in speaker-independent speech recognition. The algorithm has three procedures: (1) initially training a neural network, (2) calculating the alignment function between the target signal and the network's output by dynamic time warping, and (3) incrementally training the network for extracting speaker-independent features. The neural network is a fuzzy partition model (FPM) with multiple input-output units to give a probabilistic formulation. The algorithm was evaluated in phrase recognition experiments by FPM-LR recognizers, in which the FPM was directly combined with an LR parser. The algorithm is compared with a conventional training algorithm in terms of recognition performance. The experimental results show that a neural network can be used as a new speaker-independent feature extractor.

01 Jan 1993
TL;DR: This thesis investigates a dynamic programming approach to word hypothesis in the context of a speaker-independent, large-vocabulary, continuous speech recognition system, and attempts to extend the DTW technique to strings of phonetic symbols.
Abstract: This thesis investigates a dynamic programming approach to word hypothesis in the context of a speaker-independent, large-vocabulary, continuous speech recognition system. Using a method known as Dynamic Time Warping, an undifferentiated phonetic string (one without word boundaries) is parsed to produce all possible words contained in a domain-specific lexicon. Dynamic Time Warping is a common method of sequence comparison used in matching the acoustic feature vectors representing an unknown input utterance and some reference utterance. The cumulative least-cost path, when compared with some threshold, can be used as a decision criterion for recognition. This thesis attempts to extend the DTW technique to strings of phonetic symbols instead. Three variables were found to affect the parsing process: (1) the minimum distance threshold, (2) the number of word candidates accepted at any given phonetic index, and (3) the lexical search space used for reference pattern comparisons. The performance of this parser as a function of these variables is discussed, as is the performance of the parser at a variety of input error rates.
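The parsing idea above, hypothesizing every lexicon word whose phone string matches a stretch of the undifferentiated input within a distance threshold, can be sketched as a semi-global DP over symbols. Unit Levenshtein costs stand in for the thesis's symbolic distances; `spot_words` and the tiny lexicon are illustrative, not from the thesis:

```python
def spot_words(phones, lexicon, threshold=1):
    """Hypothesize every lexicon word whose phone string matches a
    substring of the undifferentiated input within `threshold` edit
    operations (substitution / insertion / deletion, unit cost)."""
    hits = []
    for word, ref in lexicon.items():
        # Semi-global DP: the match may start at any input position
        # (row 0 is all zeros), and every end position whose cumulative
        # cost clears the threshold yields a word hypothesis.
        prev = [0] * (len(phones) + 1)
        for i in range(1, len(ref) + 1):
            cur = [i] + [0] * len(phones)
            for j in range(1, len(phones) + 1):
                sub = prev[j - 1] + (ref[i - 1] != phones[j - 1])
                cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)
            prev = cur
        for j in range(1, len(phones) + 1):
            if prev[j] <= threshold:
                hits.append((word, j))   # word hypothesized ending at phone index j
    return hits

# Exact matching (threshold 0) spots both words in a boundary-free
# phone string.
phones = ['k', 'ae', 't', 's', 'ih', 't']
lexicon = {'cat': ['k', 'ae', 't'], 'sit': ['s', 'ih', 't']}
```

Raising the threshold trades more word hypotheses (robustness to phone recognition errors) for a larger downstream search, which is the trade-off the three variables in the abstract control.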

Journal ArticleDOI
TL;DR: In this paper, an optical processor consisting of a Helium-Neon laser, optical lenses, photographic film plates and diffusers was used for the analysis and recognition of speech signals.