scispace - formally typeset

Showing papers on "Dynamic time warping" published in 1994


Proceedings Article
31 Jul 1994
TL;DR: Preliminary experiments with a dynamic programming approach to pattern detection in databases, based on the dynamic time warping technique used in the speech recognition field, are described.
Abstract: Knowledge discovery in databases presents many interesting challenges within the context of providing computer tools for exploring large data archives. Electronic data repositories are growing quickly and contain data from commercial, scientific, and other domains. Much of this data is inherently temporal, such as stock prices or NASA telemetry data. Detecting patterns in such data streams or time series is an important knowledge discovery task. This paper describes some preliminary experiments with a dynamic programming approach to the problem. The pattern detection algorithm is based on the dynamic time warping technique used in the speech recognition field.

3,229 citations
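
For readers unfamiliar with the technique, the dynamic programming recurrence behind DTW can be sketched as follows. This is a minimal textbook version, assuming an absolute-difference local cost and the symmetric three-step pattern; the abstract does not state the recurrence, step pattern, or distance used in the experiments, and the function names `dtw_distance` and `best_match` are hypothetical.

```python
import math
from typing import List, Sequence


def dtw_distance(query: Sequence[float], series: Sequence[float]) -> float:
    """Textbook O(n*m) DTW distance between two 1-D sequences."""
    n, m = len(query), len(series)
    # cost[i][j]: minimal cumulative cost of aligning query[:i] with series[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - series[j - 1])
            # symmetric step pattern: diagonal match, or repeat an element of either sequence
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def best_match(template: Sequence[float], candidates: List[Sequence[float]]) -> int:
    """Index of the candidate segment closest to the template under DTW."""
    return min(range(len(candidates)), key=lambda k: dtw_distance(template, candidates[k]))


# A stretched copy of the template warps onto it at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Ranking candidate segments by their DTW distance to a template is one simple way to frame pattern detection; since each comparison is quadratic in length, a warping-window constraint is a common way to reduce the cost.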


01 Jan 1994
TL;DR: In this paper, the dynamic time warping technique from the speech recognition field is applied to detecting patterns in data streams or time series, such as stock prices or NASA telemetry data.
Abstract: Knowledge discovery in databases presents many interesting challenges within the context of providing computer tools for exploring large data archives. Electronic data repositories are growing quickly and contain data from commercial, scientific, and other domains. Much of this data is inherently temporal, such as stock prices or NASA telemetry data. Detecting patterns in such data streams or time series is an important knowledge discovery task. This paper describes some preliminary experiments with a dynamic programming approach to the problem. The pattern detection algorithm is based on the dynamic time warping technique used in the speech recognition field.

161 citations


Posted Content
TL;DR: The authors proposed a new algorithm called DK-vec for aligning pairs of Asian/Indo-European noisy parallel texts without sentence boundaries, which uses frequency, position and recency information as features for pattern matching.
Abstract: We propose a new algorithm called DK-vec for aligning pairs of Asian/Indo-European noisy parallel texts without sentence boundaries. DK-vec improves on previous alignment algorithms in that it better handles the non-linear nature of noisy corpora. The algorithm uses frequency, position and recency information as features for pattern matching. Dynamic Time Warping is used as the matching technique between word pairs. This algorithm produces a small bilingual lexicon which provides anchor points for alignment.

81 citations
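
The abstract names recency information and DTW matching but gives no formulas. The sketch below assumes one plausible reading: a word's recency vector is the list of gaps between its successive positions in a text, and a source word is paired with the target word whose recency vector has the smallest DTW distance. The helper names and the whitespace tokenization are hypothetical, and the frequency and position filtering that DK-vec also uses is omitted.

```python
import math
from typing import Dict, List, Sequence


def recency_vectors(tokens: Sequence[str]) -> Dict[str, List[int]]:
    """Map each word to the gaps between its successive positions (hypothetical reading)."""
    positions: Dict[str, List[int]] = {}
    for pos, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(pos)
    return {w: [b - a for a, b in zip(p, p[1:])] for w, p in positions.items()}


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    return cost[n][m]


def best_translation(src_vec: List[int], tgt_vecs: Dict[str, List[int]]) -> str:
    """Pick the target word whose recency vector is closest under DTW."""
    # words seen only once have empty recency vectors and are skipped
    scored = {w: dtw(src_vec, v) for w, v in tgt_vecs.items() if v}
    return min(scored, key=scored.get)
```

Using gaps rather than absolute positions makes the feature insensitive to where a word first appears, while DTW tolerates the insertions and deletions typical of noisy parallel texts.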


Journal Article
TL;DR: This work extends learning vector quantization (LVQ) into a prototype-based minimum error classifier appropriate for classifying various speech units that the original LVQ could not handle, and discusses smoothing the loss function from the perspective of increasing classifier robustness.

52 citations


Proceedings Article
05 Oct 1994
TL;DR: A new algorithm called DK-vec is proposed for aligning pairs of Asian/Indo-European noisy parallel texts without sentence boundaries; it better handles the non-linear nature of noisy corpora.
Abstract: We propose a new algorithm, DK-vec, for aligning pairs of Asian/Indo-European noisy parallel texts without sentence boundaries. The algorithm uses frequency, position and recency information as features for pattern matching. Dynamic Time Warping is used as the matching technique between word pairs. This algorithm produces a small bilingual lexicon which provides anchor points for alignment.

42 citations


Book Chapter
01 Jan 1994
TL;DR: The need to ensure that only authorized people are granted high-security access has led to the development of systems for automatic personal verification.
Abstract: The need to ensure that only authorized people are granted high-security access has led to the development of systems for automatic personal verification.

39 citations


Proceedings Article
19 Apr 1994
TL;DR: The MS-TDNN integrates the high accuracy single character recognition capabilities of a TDNN with a non-linear time alignment procedure (dynamic time warping algorithm) for finding stroke and character boundaries in isolated, handwritten characters and words.
Abstract: This paper shows how the multi-state time delay neural network (MS-TDNN), which is already used successfully in continuous speech recognition tasks, can be applied to both online single-character and cursive (continuous) handwriting recognition. The MS-TDNN integrates the high-accuracy single-character recognition capabilities of a TDNN with a non-linear time alignment procedure (the dynamic time warping algorithm) for finding stroke and character boundaries in isolated handwritten characters and words. In this approach, each character is modelled by up to 3 different states, and words are represented as sequences of these characters. The authors describe the basic MS-TDNN architecture and the input features used, and present results (up to 97.7% word recognition rate) on writer-dependent/independent single-character recognition tasks and on writer-dependent cursive handwriting tasks with vocabulary sizes of up to 20,000 words.

37 citations


Patent
TL;DR: In this paper, a Dynamic Time/Frequency Warping (DTFW) technique is proposed for speaker verification, speech recognition and channel normalization, among other uses. The DTFW technique uses best-path dynamic programming methods on a 3-dimensional time-frequency array representing the spectral differences between a test utterance and a reference utterance (template).
Abstract: A Dynamic Time/Frequency Warping (DTFW) technique is disclosed for speaker verification, speech recognition and channel normalization, among other uses. The DTFW technique uses best-path dynamic programming methods on a 3-dimensional time-frequency array representing the spectral differences between a test utterance (the utterance being analyzed) and a reference utterance (template). The array is created by summing the squares of the differences between each feature in each frame of the template and each feature in each frame of the utterance in question. Dynamic programming techniques are then used to find the minimal-distance path matching the test utterance and the template so as to optimize the time and frequency warping paths.

32 citations
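
The abstract does not specify how the frequency axis enters the 3-dimensional array. The sketch below assumes the third dimension is an integer shift of the feature (frequency-bin) index, lets that shift change by at most one bin per time step, and averages each cell over the overlapping bins (our choice, so that large shifts are not favored merely because they compare fewer bins). The names `local_distance`, `dtfw_distance`, and `max_shift` are hypothetical, not the patent's.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one frame of spectral features (e.g. filterbank energies)


def local_distance(t: Frame, u: Frame, shift: int) -> float:
    """Mean squared difference between template frame t and utterance frame u,
    comparing t[f] with u[f + shift] over the overlapping bins."""
    diffs = [(t[f] - u[f + shift]) ** 2 for f in range(len(t)) if 0 <= f + shift < len(u)]
    return sum(diffs) / len(diffs) if diffs else math.inf


def dtfw_distance(template: Sequence[Frame], utterance: Sequence[Frame], max_shift: int = 1) -> float:
    """Minimal cumulative distance over joint time and frequency-shift paths."""
    shifts = list(range(-max_shift, max_shift + 1))
    n, m, k_count = len(template), len(utterance), len(shifts)
    # cost[i][j][k]: best path cost ending at template frame i, utterance frame j, shift shifts[k]
    cost: List[List[List[float]]] = [[[math.inf] * k_count for _ in range(m)] for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k, s in enumerate(shifts):
                d = local_distance(template[i], utterance[j], s)
                if i == 0 and j == 0:
                    cost[i][j][k] = d
                    continue
                best = math.inf
                for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                    if pi < 0 or pj < 0:
                        continue
                    # the frequency shift may change by at most one bin per time step
                    for pk in (k - 1, k, k + 1):
                        if 0 <= pk < k_count:
                            best = min(best, cost[pi][pj][pk])
                cost[i][j][k] = d + best
    return min(cost[n - 1][m - 1])
```

Setting `max_shift=0` removes the frequency dimension, and the recurrence reduces to plain DTW on frame distances.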


Book
01 Jan 1994
TL;DR: Contributions include a connectionist approach to speech recognition (Y. Bengio), signature verification with a Siamese TDNN, and an integrated architecture for recognition of totally unconstrained handwritten numerals.
Abstract: A connectionist approach to speech recognition, Y. Bengio; signature verification with a Siamese TDNN, J. Bromley et al.; boosting performance in neural networks, H. Drucker et al.; an integrated architecture for recognition of totally unconstrained hand-written numerals, A. Gupta et al.; time warping network: a neural approach to hidden Markov model-based speech recognition, E. Levin et al.; computing optical flow with a recurrent neural network, H. Li and J. Wang; integrated segmentation and recognition through exhaustive scans or learned saccadic jumps, G. Martin et al.; experimental comparison of the effect of order in recurrent neural networks, C.B. Miller and C.L. Giles; adaptive classification by neural net based prototype populations, K. Peleg and U. Ben Hanan; a neural system for the recognition of partially occluded objects in cluttered scenes: a pilot study, L. Wiskott and C. von der Malsburg. (Part contents).

24 citations


Proceedings Article
09 Oct 1994
TL;DR: This work proposes a multidimensional dynamic-programming technique which can efficiently solve time-warping optimization problems involving colored noise, and allows control over the warping function curvature.
Abstract: Dynamic time warping (DTW) is a dynamic programming technique widely used for solving time-alignment problems. The classical DTW constrains only the first derivative of the warping function, hence allowing no direct control over the warping function curvature. Moreover, it implicitly assumes, inappropriately for some applications, that the noise is white. We propose a multidimensional dynamic-programming technique which can efficiently solve time-warping optimization problems involving colored noise, and allows control over the warping function curvature. The technique is demonstrated on the co-channel speech separation problem. Applications employing DTW can benefit from the new technique, which offers improved accuracy and robustness in the presence of colored noise and competing speech.

12 citations
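
One way to give dynamic programming control over warping curvature, sketched here under our own assumptions, is to enlarge the DP state with the last step taken and to charge a penalty whenever the step direction changes, a discrete stand-in for a nonzero second derivative of the warping function. This illustrates only the curvature idea; it does not model colored noise, and `curvature_dtw`, `STEPS`, and `bend_penalty` are hypothetical names rather than the paper's formulation.

```python
import math
from typing import Sequence

# Allowed steps (di, dj): diagonal, advance x only, advance y only.
STEPS = ((1, 1), (1, 0), (0, 1))


def curvature_dtw(x: Sequence[float], y: Sequence[float], bend_penalty: float = 1.0) -> float:
    """DTW whose state remembers the last step, penalizing changes of direction."""
    n, m, n_steps = len(x), len(y), len(STEPS)
    # cost[i][j][s]: best cost of a path reaching (i, j) whose last step was STEPS[s]
    cost = [[[math.inf] * n_steps for _ in range(m)] for _ in range(n)]
    # the start cell is treated as compatible with any first step
    cost[0][0] = [float(abs(x[0] - y[0]))] * n_steps
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            local = abs(x[i] - y[j])
            for s, (di, dj) in enumerate(STEPS):
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                # a change of step direction stands in for a nonzero second derivative
                best = min(
                    cost[pi][pj][p] + (0.0 if p == s else bend_penalty) for p in range(n_steps)
                )
                cost[i][j][s] = local + best
    return min(cost[n - 1][m - 1])
```

With `bend_penalty=0.0` the recurrence reduces to ordinary DTW; larger values trade local match quality for straighter warping paths.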


Journal Article
TL;DR: An algorithm for estimating state-dependent polynomial coefficients in the nonstationary-state hidden Markov model (or the trended HMM) which allows for the flexibility of linear time warping or scaling in individual model states is presented.

Journal Article
TL;DR: In this article, an automatic time-warping method is proposed, based on a criterion of maximal local similarity between original and time-warped waveforms; tests on time-domain and subband-domain representations of audio are discussed and show the practicality of the approach.
Abstract: The paper addresses the problem of modifying the playback speed of audio recordings while maintaining high signal quality and naturalness (i.e., time scaling while preserving frequency domain characteristics). An automatic time-warping method is proposed which is based on a criterion of maximal local similarity between original and time-warped waveforms. Tests on time domain and subband domain representations of audio are discussed and show the practicality of the approach taken.
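
The abstract does not give the similarity measure or the synthesis procedure. As a stand-in, the sketch below follows the well-known waveform-similarity overlap-add (WSOLA) idea: each output frame is copied from near its nominal input position, with the exact offset chosen to maximize cross-correlation with the natural continuation of the previously copied segment. The Hann window, 50% overlap, frame size, search tolerance, and the name `wsola` are all assumptions, not the paper's method.

```python
import math
from typing import List, Sequence


def wsola(signal: Sequence[float], rate: float, frame: int = 256, tolerance: int = 64) -> List[float]:
    """WSOLA-style time scaling: rate > 1 shortens, rate < 1 lengthens the signal."""
    hop_out = frame // 2
    # periodic Hann window: copies overlapped by 50% sum to one
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    out: List[float] = []
    prev = 0  # start of the previously copied input segment
    k = 0
    while True:
        nominal = int(round(k * hop_out * rate))  # where frame k "should" come from
        if nominal + tolerance + frame > len(signal):
            break
        if k == 0:
            best = 0
        else:
            natural = prev + hop_out  # natural continuation of the last copied segment
            if natural + frame > len(signal):
                break
            target = signal[natural:natural + frame]
            candidates = range(max(0, nominal - tolerance), nominal + tolerance + 1)
            # choose the offset whose segment is most similar (cross-correlation) to the target
            best = max(
                candidates,
                key=lambda s: sum(a * b for a, b in zip(signal[s:s + frame], target)),
            )
        start = k * hop_out
        if len(out) < start + frame:
            out.extend([0.0] * (start + frame - len(out)))
        for n in range(frame):
            out[start + n] += window[n] * signal[best + n]
        prev = best
        k += 1
    return out
```

Because a periodic Hann window at 50% overlap sums to one, a constant input yields a constant output away from the edges; the search over `candidates` is what keeps waveform periods aligned across frame boundaries for non-constant input.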

Proceedings Article
06 Sep 1994
TL;DR: The authors have developed a speaker-independent, isolated-word recognition system using a neural network to recognize the underlying sequence of phonemes and a dynamic time warping technique to time-align the recognized sequence of phonemes with corresponding lexical sequences of phonemes.
Abstract: The authors have developed a speaker-independent, isolated-word recognition system using a neural network to recognize the underlying sequence of phonemes and a dynamic time warping (DTW) technique to time-align the recognized sequence of phonemes with corresponding lexical sequences of phonemes. A significant feature of this system is the ability to easily change the vocabulary, since the lexical entries are simply derived from their phoneme sequences.
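
The time-alignment step can be sketched as symbolic DTW: the local cost is 0 when two phoneme labels agree and 1 otherwise, so repeated phonemes in the recognizer output are absorbed without penalty, and the word whose lexical phoneme sequence has the lowest DTW cost wins. The 0/1 cost, the toy lexicon, and the names `symbol_dtw` and `recognize` are hypothetical; the abstract does not state the system's actual costs.

```python
import math
from typing import Dict, List, Sequence


def symbol_dtw(a: Sequence[str], b: Sequence[str]) -> float:
    """DTW over phoneme labels with a 0/1 local cost (hypothetical cost choice)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else 1.0
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def recognize(phonemes: Sequence[str], lexicon: Dict[str, List[str]]) -> str:
    """Return the lexicon word whose phoneme sequence aligns best with the recognizer output."""
    return min(lexicon, key=lambda word: symbol_dtw(phonemes, lexicon[word]))


# Toy lexicon; changing the vocabulary means editing this dict.
LEXICON = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "bat": ["b", "ae", "t"]}
print(recognize(["k", "k", "ae", "ae", "t"], LEXICON))  # cat
```

Because words are stored only as phoneme sequences, adding or removing vocabulary needs no retraining of the aligner, which mirrors the property highlighted in the abstract.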

Proceedings Article
25 Oct 1994
TL;DR: A novel representation for speech signals is proposed, in which the time-varying frequency content of a speech segment is represented as a weighted sum of two-dimensional basis vectors which incorporate both frequency warping and frequency-dependent time warping.
Abstract: A novel representation for speech signals is proposed. The time-varying frequency content of a speech segment is represented as a weighted sum of two-dimensional basis vectors; these incorporate both frequency warping and frequency-dependent time warping. This is quite flexible; for example, any arbitrary time or frequency warping function can easily be implemented, and any time-frequency representation can be used as the starting point. Examples are presented which demonstrate desirable characteristics of the representation: (1) explicit quantification of parameter trajectories, (2) time resolution which varies with respect to time and frequency, and (3) the ability to reconstruct a time-frequency plot which reflects the resolution characteristics of the representation.


Proceedings Article
01 Sep 1994
TL;DR: A prosodic method for segmenting continuous speech into accent phrases by using dynamic time warping between F0 contours of input speech and reference accent patterns called pitch pattern templates is described.
Abstract: This paper describes a prosodic method for segmenting continuous speech into accent phrases. Optimum sequences are obtained on the basis of a least-squared-error criterion by using dynamic time warping between F0 contours of input speech and reference accent patterns called ‘pitch pattern templates’. However, the optimum sequence does not always agree well with hand-labeled phrase boundaries, whereas the second or third best candidate sequence does. Therefore, we extend our system to find multiple candidates using an N-best algorithm. Evaluation tests were carried out using the ATR continuous speech database of 10 speakers. The results showed that about 97% of phrase boundaries were correctly detected when the 30 best candidates were taken, an accuracy 7.5% higher than that of the conventional method without N-best search.

Proceedings Article
25 Oct 1994
TL;DR: A new system is presented for text-dependent speaker verification that uses data fusion concepts to combine the results of distortion-based and discriminant-based classifiers and is found to perform exceptionally well.
Abstract: A new system is presented for text-dependent speaker verification. The system uses data fusion concepts to combine the results of distortion-based and discriminant-based classifiers, so that both intraspeaker and interspeaker information are used in the final decision. The distortion-based and discriminant-based classifiers are dynamic time warping and the neural tree network, respectively. The system is evaluated on several hundred one-word utterances collected over a telephone channel. All handsets considered in this experiment use electret microphones. The new system is found to perform exceptionally well for this task. A second experiment uses handsets with both electret and carbon-button microphones; here, a channel detection scheme is proposed that improves performance under these conditions. © (1994) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

01 Jan 1994
TL;DR: This paper deals with the improvement of stochastic techniques, especially for a better representation of time varying phenomena.
Abstract: Stochastic modeling is a flexible method for handling the large variability in speech for recognition applications. In contrast to dynamic time warping, where heuristic training methods are used to estimate word templates, stochastic modeling allows probabilistic and automatic training of the models. This paper deals with improving stochastic techniques, especially for a better representation of time-varying phenomena.


Proceedings Article
13 Apr 1994
TL;DR: This speech recognition approach offers great adaptivity and fault tolerance in recognition, and can perform the recognition task while also restoring correct information from incomplete or even partially incorrect input.
Abstract: This paper presents an extended loop neural network approach to speech recognition. Owing to its associative memory neural network, the approach has the following important properties: (1) it offers great adaptivity and fault tolerance in recognition; (2) it allows the construction of a recognition system that forms arbitrary nonlinear decision surfaces; and (3) the recognition system can perform the recognition task and, at the same time, restore correct information from incomplete or even partially incorrect input. Experiments were also conducted, and the results show that this approach has great application potential.

17 Jan 1994
TL;DR: This contribution presents one minor correction to the recommended value for the fraction above threshold in contribution T1A1.5/93-152, a method for estimating the video delay uncertainty of the automated time alignment algorithm, and an improved motion spike detector that could be used for computing parameters p10 and p11 in T1
Abstract: Contribution T1A1.5/93-152 summarized the methods of measurement for objective video quality parameters based on the Sobel-filtered image and the motion difference image that were submitted prior to conducting the T1A1 subjective experiment (this experiment collected 625 mean opinion scores, i.e., 25 test scenes passed through 25 different video transmission systems that ranged in bit rate from 64 kb/sec to 45 Mb/sec). This contribution presents (1) one minor correction to the recommended value for the fraction above threshold in contribution T1A1.5/93-152; (2) a method for estimating the video delay uncertainty of the automated time alignment algorithm presented in section 3 of contribution T1A1.5/93-152 (non-zero video delay uncertainty may result when dynamic time warping, or variable video delay, is present in the video transmission system, or when there is a substantial number of dropped video frames); (3) a method for using this video delay uncertainty in the computation of the parameters presented in T1A1.5/93-152; and (4) an improved motion spike detector that could be used for computing parameters p10 and p11 in T1A1.5/93-152.

Book Chapter
01 Jan 1994
TL;DR: It has also become clear that the use of higher-level knowledge during the recognition process (or, more generally, the efficient interaction between multiple knowledge sources) is required to overcome the limitations of current ASR systems.
Abstract: Given all the difficulties presented in Chapter 1, Automatic Speech Recognition (ASR) remains a challenging problem in pattern recognition. After half a century of research, the performance achieved by state-of-the-art systems is not yet at the level of a mature technology. Over the years, many technological innovations have boosted performance on increasingly difficult tasks. Some of the most significant of these innovations include: (1) pattern matching approaches (e.g., DTW), (2) statistical pattern recognition (e.g., HMMs), (3) better use of a priori phonological knowledge, and (4) integration of syntactic constraints in Continuous Speech Recognition (CSR) algorithms. However, despite impressive improvements, performance on realistic (i.e., fairly unconstrained) tasks is still far too low for effective use. It seems likely that new technological breakthroughs will be needed to achieve major performance improvements. Even assuming infinite computational power, infinite storage and memory bandwidth, and an infinite amount of training data, it is still not certain that the ASR problem could be solved satisfactorily. It has also become clear that the use of higher-level knowledge during the recognition process (or, more generally, the efficient interaction between multiple knowledge sources) is required to overcome the limitations of current ASR systems.