
Showing papers on "Dynamic time warping published in 1999"


01 Jan 1999
TL;DR: In this paper, a modification of DTW that operates on a Piecewise Aggregate Approximation (PAA) of the data is proposed; it outperforms DTW by one to two orders of magnitude with no loss of accuracy.
Abstract: There has been much recent interest in adapting data mining algorithms to time series databases. Most of these algorithms need to compare time series. Typically some variation of Euclidean distance is used. However, as we demonstrate in this paper, Euclidean distance can be an extremely brittle distance measure. Dynamic time warping (DTW) has been suggested as a technique to allow more robust distance calculations, however it is computationally expensive. In this paper we introduce a modification of DTW which operates on a higher level abstraction of the data, in particular, a Piecewise Aggregate Approximation (PAA). Our approach allows us to outperform DTW by one to two orders of magnitude, with no loss of accuracy.
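
For readers unfamiliar with the two ingredients this abstract combines, the following minimal Python sketch (not the authors' code; the frame counts, squared-error local cost and toy series are illustrative assumptions) shows a classic DTW distance, a Piecewise Aggregate Approximation, and how running DTW on the PAA sketches shrinks the dynamic-programming table.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW: O(len(a) * len(b)) dynamic program with a squared-error local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each of n_segments equal-width frames."""
    return np.array([frame.mean() for frame in np.array_split(np.asarray(series, float), n_segments)])

# Toy comparison: a noisy sine against a locally stretched version of itself.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 256)
query = np.sin(t) + 0.05 * rng.standard_normal(t.size)
candidate = np.sin(t ** 1.05 / t.max() ** 0.05) + 0.05 * rng.standard_normal(t.size)

# DTW on 16-segment PAA sketches fills a 16 x 16 table instead of 256 x 256,
# which is the source of the one-to-two-orders-of-magnitude speed-up claimed above.
print(f"DTW on raw series:     {dtw_distance(query, candidate):.3f}")
print(f"DTW on 16-segment PAA: {dtw_distance(paa(query, 16), paa(candidate, 16)):.3f}")
```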

670 citations


Book ChapterDOI
15 Sep 1999
TL;DR: This paper introduces a modification of DTW which operates on a higher level abstraction of the data, in particular, a piecewise linear representation and demonstrates that this approach allows us to outperform DTW by one to three orders of magnitude.
Abstract: There has been much recent interest in adapting data mining algorithms to time series databases. Many of these algorithms need to compare time series. Typically some variation or extension of Euclidean distance is used. However, as we demonstrate in this paper, Euclidean distance can be an extremely brittle distance measure. Dynamic time warping (DTW) has been suggested as a technique to allow more robust distance calculations, however it is computationally expensive. In this paper we introduce a modification of DTW which operates on a higher level abstraction of the data, in particular, a piecewise linear representation. We demonstrate that our approach allows us to outperform DTW by one to three orders of magnitude. We experimentally evaluate our approach on medical, astronomical and sign language data.

248 citations


Proceedings ArticleDOI
01 Sep 1999
TL;DR: A new method for comparing planar curves and for performing matching at sub-sampling resolution is presented, and its performance on signature verification is compared with that of the well-known Dynamic Time Warping algorithm.
Abstract: The problem of establishing correspondence and measuring the similarity of a pair of planar curves arises in many applications in computer vision and pattern recognition. This paper presents a new method for comparing planar curves and for performing matching at sub-sampling resolution. The analysis of the algorithm as well as its structural properties are described. The performance of the new technique applied to the problem of signature verification is shown and compared with the performance of the well-known Dynamic Time Warping algorithm.

179 citations


Journal ArticleDOI
TL;DR: The Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed in this work for variable-length and warped feature sequences, and good results have been obtained in speaker-independent speech recognition.
Abstract: The Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed in this work for variable-length and warped feature sequences. The novelty is to associate an entire feature vector sequence, instead of a single feature vector, as a model with each SOM node. Dynamic time warping is used to obtain time-normalized distances between sequences with different lengths. Starting with random initialization, ordered feature sequence maps then ensue, and Learning Vector Quantization can be used to fine tune the prototype sequences for optimal class separation. The resulting SOM models, the prototype sequences, can then be used for the recognition as well as synthesis of patterns. Good results have been obtained in speaker-independent speech recognition.
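
A hedged sketch of the central idea, assuming a plain LVQ1-style update and a length-normalized DTW distance (the learning rate, path backtracking and toy data below are illustrative, not the authors' implementation): each map node stores an entire prototype sequence, DTW gives a time-normalized distance to variable-length inputs, and the winning prototype's frames are nudged along the warping path.

```python
import numpy as np

def dtw_path(a, b):
    """DTW between two feature-vector sequences; returns (length-normalized distance, path)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return D[n, m] / len(path), path

def lvq_update(prototype, sample, same_class, lr=0.05):
    """LVQ1-style update: pull each prototype frame toward its DTW-aligned sample frame
    if the classes match, push it away otherwise."""
    _, path = dtw_path(prototype, sample)
    proto = prototype.copy()
    sign = 1.0 if same_class else -1.0
    for i, j in path:
        proto[i] += sign * lr * (sample[j] - proto[i])
    return proto

# Toy usage with 2-D "feature vectors" and different sequence lengths.
rng = np.random.default_rng(1)
prototype = rng.standard_normal((12, 2))
sample = rng.standard_normal((17, 2))
dist, _ = dtw_path(prototype, sample)
prototype = lvq_update(prototype, sample, same_class=True)
print(f"time-normalized DTW distance before update: {dist:.3f}")
```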

170 citations


Journal ArticleDOI
TL;DR: Most of the useful linguistic information is in modulation frequency components in the range between 1 and 16 Hz, with the dominant component at around 4 Hz; in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.

135 citations


Proceedings ArticleDOI
07 Nov 1999
TL;DR: A novel aligned subsequence matching scheme is proposed, in which the number of subsequences to be compared with a query sequence is reduced to linear in the average length of the data sequences, and an indexing technique is presented to speed up the aligned subsequence matching using the modified time warping distance as the similarity measure.
Abstract: Although the Euclidean distance has been the most popular similarity measure in sequence databases, recent techniques prefer to use high-cost distance functions such as the time warping distance and the editing distance for wider applicability. However, if these distance functions are applied to the retrieval of similar subsequences, the number of subsequences to be inspected during the search is quadratic in the average length of the data sequences. We propose a novel subsequence matching scheme, called aligned subsequence matching, in which the number of subsequences to be compared with a query sequence is reduced to linear in that average length. We also present an indexing technique to speed up the aligned subsequence matching using the similarity measure of the modified time warping distance. Experiments on synthetic data sequences demonstrate the effectiveness of our proposed approach; it consistently outperformed sequential scanning and achieved up to a 65-fold speed-up.
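
As a rough illustration of the candidate-count argument only (a schematic sketch under my own assumptions about segment size and window shape, not the paper's aligned matching scheme or its index): enumerating every possible subsequence is quadratic in the data length, while restricting window boundaries to segment multiples keeps the candidate set linear; each surviving candidate would then be compared to the query with the modified time warping distance.

```python
import numpy as np

def all_subsequences(length, min_len):
    """Naive candidate set: every (start, end) window -- O(L^2) candidates."""
    return [(s, e) for s in range(length) for e in range(s + min_len, length + 1)]

def aligned_windows(length, segment, n_segments_per_window):
    """Aligned candidate set: window boundaries restricted to multiples of `segment`,
    so the number of candidates grows only linearly with the data length."""
    width = segment * n_segments_per_window
    return [(s, s + width) for s in range(0, length - width + 1, segment)]

rng = np.random.default_rng(2)
data = rng.standard_normal(200)
query = data[40:56] + 0.1 * rng.standard_normal(16)   # a noisy copy of data[40:56]

naive = all_subsequences(len(data), min_len=8)
aligned = aligned_windows(len(data), segment=8, n_segments_per_window=2)
print(f"naive candidates: {len(naive)}, aligned candidates: {len(aligned)}")

# With a plain Euclidean stand-in for the paper's modified time warping distance,
# the best aligned window still lands on the embedded pattern.
best = min(aligned, key=lambda se: np.linalg.norm(query - data[se[0]:se[1]]))
print(f"best aligned window: {best}")
```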

91 citations


Proceedings ArticleDOI
20 Sep 1999
TL;DR: An adaptive online recognizer for isolated alphanumeric characters, based on the k nearest neighbor rule, is developed; adaptation is carried out during normal use in a self-supervised fashion and thus remains otherwise unnoticed by the user.
Abstract: We have developed an adaptive online recognizer that is suitable for recognizing isolated alphanumeric characters. It is based on the k nearest neighbor rule. Various dissimilarity measures, all based on dynamic time warping (DTW), have been studied. The main focus of this work is on online adaptation. The adaptation is performed by modifying the prototype set of the classifier according to its recognition performance and the user's writing style. These adaptations include: (1) adding new prototypes, (2) inactivating confusing prototypes, and (3) reshaping existing prototypes. The reshaping algorithm is based on learning vector quantization (LVQ). The writers are allowed to use their own natural style of writing, and the adaptation is carried out during normal use in a self-supervised fashion and thus remains otherwise unnoticed by the user.
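
A minimal sketch of the adaptation loop described above, assuming a plain DTW distance over pen-trajectory points and made-up thresholds (the prototype bookkeeping, error counter and k are my assumptions, not the published recognizer): misrecognized characters are added as new prototypes of the correct class, and prototypes that repeatedly cause confusion are inactivated; prototype reshaping via LVQ is omitted here.

```python
import numpy as np
from collections import Counter

def dtw(a, b):
    """DTW distance between two sequences of 2-D points (e.g. resampled pen trajectories)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

class AdaptiveDtwKnn:
    """k-NN character classifier over DTW distances with a self-adapting prototype set."""

    def __init__(self, k=1):
        self.k = k
        self.prototypes = []   # entries: [sequence, label, active, error_count]

    def add_prototype(self, seq, label):
        self.prototypes.append([np.asarray(seq, float), label, True, 0])

    def classify(self, seq):
        seq = np.asarray(seq, float)
        active = [p for p in self.prototypes if p[2]]
        nearest = sorted(((dtw(seq, p[0]), p) for p in active), key=lambda x: x[0])[: self.k]
        label = Counter(p[1] for _, p in nearest).most_common(1)[0][0]
        return label, nearest

    def adapt(self, seq, true_label, max_errors=3):
        """Self-supervised adaptation once the true label becomes known."""
        predicted, nearest = self.classify(seq)
        if predicted != true_label:
            self.add_prototype(seq, true_label)        # (1) add a new prototype
            for _, p in nearest:
                if p[1] != true_label:
                    p[3] += 1
                    if p[3] >= max_errors:
                        p[2] = False                    # (2) inactivate a confusing prototype
        return predicted

# Toy usage with three-point "characters".
clf = AdaptiveDtwKnn(k=1)
clf.add_prototype([[0, 0], [1, 1], [2, 2]], "a")
clf.add_prototype([[0, 0], [1, 0], [2, 0]], "b")
print(clf.adapt([[0, 0], [1, 1], [2, 1.7]], true_label="a"))
```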

37 citations


Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental comparisons with rigid matching and local perturbation show the performance superiority of the monotonic and continuous warping in character recognition.
Abstract: In this paper, a handwritten character recognition experiment using a monotonic and continuous two-dimensional warping algorithm is reported. This warping algorithm is based on dynamic programming and searches for the optimal pixel-to-pixel mapping between two given images subject to two-dimensional monotonicity and continuity constraints. Experimental comparisons with rigid matching and local perturbation show the performance superiority of the monotonic and continuous warping in character recognition.

29 citations


Proceedings ArticleDOI
20 Jun 1999
TL;DR: It is shown how, for a harmonic signal segment, the parabolic time warping function can remove the part of the frequency variation which progresses linearly with time, without changing the time duration of that segment.
Abstract: A parabolic time warper designed to enhance the stationarity of voiced speech segments is presented. It is shown how, for a harmonic signal segment, the parabolic time warping function can remove the part of the frequency variation which progresses linearly with time, without changing the time duration of that segment. In the actual implementation of the time warping system, the linear part of the pitch frequency variation in a segment is removed on the basis of maximization of the pitch-related autocorrelation peak of the warped signal. As a by-product, the time warper yields a very reliable pitch estimate. An example on real speech is discussed.
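
The sketch below illustrates the idea on a synthetic linear chirp, under my own assumptions (a simple grid search over the curvature, linear interpolation for resampling, and a normalized autocorrelation peak as the pitch-periodicity score); it is not the paper's implementation. A parabolic warp tau = t + a*t^2 turns a linearly drifting frequency into a nearly constant one, and the curvature a is chosen to maximize the pitch-related autocorrelation peak of the warped segment.

```python
import numpy as np

def parabolic_warp(x, a, fs):
    """Resample x on the warped time axis tau = t + a*t^2 (a in 1/s).
    tau values beyond the segment end are clamped by np.interp."""
    t = np.arange(len(x)) / fs
    tau = t + a * t ** 2
    return np.interp(tau, t, x)

def pitch_autocorr_peak(x, fs, fmin=60.0, fmax=400.0):
    """Largest normalized autocorrelation value in the plausible pitch-lag range."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    r = r / (r[0] + 1e-12)
    lo, hi = int(fs / fmax), int(fs / fmin)
    return float(r[lo:hi].max())

def best_parabolic_warp(x, fs, curvatures=np.linspace(-2.0, 2.0, 81)):
    """Grid search: keep the curvature whose warped segment is most pitch-periodic."""
    scores = [pitch_autocorr_peak(parabolic_warp(x, a, fs), fs) for a in curvatures]
    return float(curvatures[int(np.argmax(scores))])

# Toy "voiced" segment: 50 ms whose frequency rises linearly from 100 Hz to 110 Hz.
fs = 8000
t = np.arange(int(0.05 * fs)) / fs
x = np.sin(2 * np.pi * (100.0 * t + 0.5 * 200.0 * t ** 2))   # chirp rate 200 Hz/s

a_hat = best_parabolic_warp(x, fs)
print(f"chosen curvature: {a_hat:+.2f} 1/s "
      f"(autocorrelation peak {pitch_autocorr_peak(x, fs):.3f} -> "
      f"{pitch_autocorr_peak(parabolic_warp(x, a_hat, fs), fs):.3f})")
```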

26 citations


Proceedings ArticleDOI
07 Jun 1999
TL;DR: This technique tries to eliminate distortions by the replication of the original signal frequencies, and Malvar wavelets are used to avoid clicking between segment transitions.
Abstract: Describes a technique to obtain a time dilation or contraction of an audio signal. Different computer graphics applications can take advantage of this technique. In real-time networked virtual reality applications, such as teleconferences or games, the audio might be transmitted independently from the rest of the data. These different signals arrive asynchronously and need to be somehow resynchronized on-the-fly. In animation, it can help to automatically fit and merge pre-recorded sound samples to special timed events. It also makes it easier to accomplish special effects, like lip-sync for dubbing or changing the voice of an animated character. Our technique tries to eliminate distortions by the replication of the original signal frequencies. Malvar wavelets are used to avoid clicking between segment transitions.

22 citations


Patent
21 Apr 1999
TL;DR: In this article, the authors proposed an automated dialing method for mobile telephones, where a user enters a telephone number via the keypad of the mobile phone, followed by speaking a corresponding codeword into the handset.
Abstract: The present invention relates to an automated dialing method for mobile telephones. According to the method, a user enters a telephone number via the keypad of the mobile phone, followed by speaking a corresponding codeword into the handset. The voice signal is encoded using the CODEC and vocoder already on board the mobile phone. The speech is divided into frames and each frame analyzed to ascertain its primary spectral features. These features are stored in memory as associated with the numeric keypad sequence. In recognition mode, the user speaks the codeword into the handset, which is analyzed in a like fashion as in training mode. The primary spectral features are compared with those stored in memory. When a match is declared according to preset criteria, the telephone number is automatically dialed by the mobile phone. Time warping techniques may be applied in the analysis to reduce timing variations.

DOI
01 Jan 1999
TL;DR: This PhD thesis tries to understand how to analyse, decompose, model and transform the vocal identity of a human when seen through an automatic speaker recognition application, with a study of the impostors phenomenon.
Abstract: This PhD thesis tries to understand how to analyse, decompose, model and transform the vocal identity of a human when seen through an automatic speaker recognition application. It starts with an introduction explaining the properties of the speech signal and the basis of automatic speaker recognition. Then, the errors of an operating speaker recognition application are analysed. From the deficiencies and mistakes noticed in the running application, some observations can be made which imply a re-evaluation of the characteristic parameters of a speaker and a reconsideration of some parts of the automatic speaker recognition chain. In order to determine the characterising parameters of a speaker, these are extracted from the speech signal with an analysis and synthesis harmonic plus noise model (H+N). The analysis and re-synthesis of the harmonic and noise parts indicate which of them are speech or speaker dependent. It is then shown that the speaker-discriminating information can be found in the residual obtained by subtracting the H+N modeled signal from the original signal. Then, a study of the impostor phenomenon, essential in the tuning of a speaker recognition system, is carried out. The impostors are simulated in two ways: first by a transformation of the speech of a source speaker (the impostor) to the speech of a target speaker (the client) using the parameters extracted from the H+N model. This way of transforming the parameters is efficient, as the false acceptance rate grows from 4% to 23%. Second, an automatic imposture by speech segment concatenation is carried out. In this case the false acceptance rate grows to 30%. A way to become less sensitive to spectral modification impostures is to remove the harmonic part, or even the noise part modeled by the H+N, from the original signal. Using such a subtraction decreases the false acceptance rate to 8% even if transformed impostors are used. To overcome the lack of training data, one of the main causes of modeling errors in speaker recognition, a decomposition of the recognition task into a set of binary classifiers is proposed. A classifier matrix is built and each of its elements has to classify, word by word, the data coming from the client and another speaker (named here an anti-speaker, randomly chosen from an external database). With such an approach it is possible to weight the results according to the vocabulary or the neighbours of the client in the parameter (acoustic) space. The outputs of the matrix classifiers are then weighted and mixed in order to produce a single output score. The weights are estimated on validation data, and if the weighting is done properly, the binary-pair speaker recognition system gives better results than a state-of-the-art HMM-based system. In order to set a point of operation (i.e. a point on the COR curve) for the speaker recognition application, an a priori threshold has to be determined. Theoretically the threshold should be speaker independent when stochastic models are used. However, practical experiments show that this is not the case: due to modeling mismatch the threshold becomes speaker and utterance-length dependent. A theoretical framework showing how to adjust the threshold using the local likelihood ratio is then developed. Finally, a last modeling-error correction method using decision fusion is proposed. Practical experiments show the advantages and drawbacks of the fusion approach in speaker recognition applications.

Proceedings ArticleDOI
20 Dec 1999
TL;DR: A novel vision-based speech analysis system, STODE, is presented for the spoken Chinese training of oral deaf children; it integrates capabilities such as real-time lip tracking and feature extraction, multi-state lip modeling, and a Time-Delay Neural Network (TDNN) for visual speech analysis.
Abstract: This paper presents a novel vision-based speech analysis system, STODE, used in the spoken Chinese training of oral deaf children. Its design goal is to help oral deaf children overcome two major difficulties in speech learning: the confusion of intonations for spoken Chinese characters and timing errors within different words and characters. It integrates capabilities such as real-time lip tracking and feature extraction, multi-state lip modeling, and a Time-Delay Neural Network (TDNN) for visual speech analysis. A desk-mounted camera tracks users in real time. At each frame, the region of interest is identified and key information is extracted. The preprocessed acoustic and visual information is then fed into a modular TDNN and combined for visual speech analysis. Confusion of intonations for spoken Chinese characters can be easily identified, and timing errors within words and characters can also be detected using a DTW (Dynamic Time Warping) algorithm. For visual feedback we have created an artificial talking head directly cloned from the user's own images to generate outputs showing both correct and wrong ways of pronunciation. This system has been successfully used for the spoken Chinese training of oral deaf children in cooperation with the Nanjing Oral School, under grants from the National Natural Science Foundation of China.

Proceedings ArticleDOI
15 Sep 1999
TL;DR: This paper proposes a text-dependent speaker identification system for the Thai language, using isolated digits 0-9 and their concatenations as the spoken text and dynamic time warping to measure distances between reference and evaluated vectors.
Abstract: This paper proposes a text-dependent speaker identification system applied to the Thai language. Isolated digits 0-9 and their concatenations are used as the spoken text. Linear prediction coefficients (LPC) are extracted and formed into feature vectors representing each speech signal. Dynamic time warping (DTW) is used to measure distances between reference and evaluated vectors. These distances, indicating the nearness of unknown vectors to the references, are combined with the K-nearest neighbor (KNN) decision technique to decide which speaker produced those unknown vectors. The experimental results show that the best identification rate for a single digit is 95.83%, and the highest rates for top-3, top-5, and top-7 concatenated digits are 98.75%, 100%, and 99.20%, respectively.
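
A compact sketch of the pipeline just described, assuming autocorrelation-method LPC, a Euclidean local cost inside DTW, and synthetic "utterances" (the frame size, model order and toy speakers are my assumptions, not the paper's setup): LPC vectors are extracted per frame, DTW measures the distance between an evaluated sequence and each reference, and a K-nearest-neighbor vote picks the speaker.

```python
import numpy as np
from collections import Counter

def lpc(frame, order=12):
    """LPC coefficients of one frame via the autocorrelation (Yule-Walker) equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1: len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R + 1e-6 * np.eye(order), r[1: order + 1])

def lpc_sequence(signal, frame_len=240, hop=120, order=12):
    """Feature-vector sequence: one LPC vector per Hamming-windowed frame."""
    win = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([lpc(signal[s:s + frame_len] * win, order) for s in starts])

def dtw(a, b):
    """DTW distance between two LPC-vector sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def knn_speaker(test_seq, references, k=3):
    """references: list of (lpc_sequence, speaker_id); majority vote over the k nearest."""
    dists = sorted((dtw(test_seq, ref), spk) for ref, spk in references)
    return Counter(spk for _, spk in dists[:k]).most_common(1)[0][0]

# Toy usage: two "speakers" simulated as noise shaped by different one-pole filters.
rng = np.random.default_rng(5)
def toy_utterance(pole, n=4000):
    x = rng.standard_normal(n)
    y = np.zeros(n)
    for i in range(1, n):
        y[i] = pole * y[i - 1] + x[i]
    return y

refs = [(lpc_sequence(toy_utterance(0.9)), "spk_A"), (lpc_sequence(toy_utterance(0.3)), "spk_B")]
print(knn_speaker(lpc_sequence(toy_utterance(0.88)), refs, k=1))   # expected: spk_A
```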

Proceedings Article
01 Jan 1999
TL;DR: Experimental results on a natural language call routing task indicate that the proposed techniques speeded up the search process by a factor of 4 without loss in the recognition accuracy.
Abstract: In this paper, we describe approaches for improving the search efficiency of a dynamic programming based one-pass decoder for dialogue applications. In order to allow the use of long-term language models (LM) and cross-word acoustic models, efficient pruning techniques and fast methods for the calculation of emission probability density functions (pdfs) are required. This is particularly important for real-time and memory-constrained applications such as dialogue systems involving automatic speech recognition (ASR) and natural-language understanding. We propose an effective pruning technique exploiting the LM and cross-word context. We also present a fast distance calculation method to reduce the cost of state likelihood calculations in HMM-based systems. Experimental results on a natural language call routing task indicate that the proposed techniques speeded up the search process by a factor of 4 without loss of recognition accuracy. In addition, we present a technique for generating word graphs incorporating cross-word context.

Proceedings ArticleDOI
10 Jul 1999
TL;DR: A recognition system that enhances its accuracy by continuously adapting to the user's writing style is developed; it uses dynamic time warping (DTW) to match the input characters with the stored prototypes.
Abstract: Subsystems for the online recognition of handwriting are needed in personal digital assistants (PDAs) and other portable handheld devices. We have developed a recognition system which enhances its accuracy by applying continuous adaptation to the user's writing style. The forms of adaptation we have experimented with take place simultaneously with the normal operation of the system, and therefore there is no need for a separate training period. The present implementation uses dynamic time warping (DTW) in matching the input characters with the stored prototypes. The DTW algorithm implemented with dynamic programming (DP) is, however, both time and memory consuming. In our current research we have experimented with methods that transform the elastic templates into pixel images, which can then be recognized using statistical or neural classification. The particular neural classifier we have used is the local subspace classifier (LSC), of which we have developed an adaptive version.

Proceedings ArticleDOI
31 Oct 1999
TL;DR: An algorithm for comparing speech waveforms is presented that decides whether a spoken utterance is part of a given vocabulary of word waveforms and, if it is, chooses the matching word; preliminary results show that the algorithm provides a high probability of correct classification.
Abstract: An algorithm for comparing speech waveforms is presented that decides whether a spoken utterance is part of a given vocabulary of word waveforms and, if it is, chooses the matching word. The algorithm has been implemented in connection with our own vector interpolation alignment algorithm, which matches dynamic time warping in classification rate while requiring significantly less computation, making it much faster than DTW-based algorithms. Both algorithms are presented and compared. An alternative algorithm has also been investigated, in which the two utterances are divided into the same number of intervals, with the interval lengths differing between the two utterances. When appropriate adjustments are made so that the beginnings and ends of the two utterances match, this algorithm has a classification rate comparable to that of dynamic time warping. Furthermore, an alternative to LPC analysis for utterance recognition is presented. Unlike LPC, which is an extrapolation algorithm, ours is an interpolation algorithm. Theoretically it has smaller variance and smaller mean squared error than the LPC algorithm. Preliminary results show that it provides a high probability of correct classification.
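
The vector interpolation alignment above is the authors' own algorithm; the sketch below is only a generic reading of the interval idea (resample both feature sequences onto the same number of positions by linear interpolation and compare them position by position), with the point count and Euclidean frame distance as my assumptions. Unlike DTW, it needs no dynamic-programming table, which is where a speed advantage would come from.

```python
import numpy as np

def resample_sequence(seq, n_points):
    """Linearly interpolate a (frames x dims) feature sequence onto n_points positions."""
    seq = np.asarray(seq, dtype=float)
    if seq.ndim == 1:
        seq = seq[:, None]
    old = np.linspace(0.0, 1.0, len(seq))
    new = np.linspace(0.0, 1.0, n_points)
    return np.column_stack([np.interp(new, old, seq[:, d]) for d in range(seq.shape[1])])

def interpolation_alignment_distance(a, b, n_points=50):
    """Linear-alignment distance: both utterances are mapped to the same number of
    positions and compared position by position -- one pass, no DP table."""
    ra, rb = resample_sequence(a, n_points), resample_sequence(b, n_points)
    return float(np.linalg.norm(ra - rb, axis=1).sum())

# Toy feature sequences: the same "word" spoken at two speeds, plus a different word.
t_fast, t_slow = np.linspace(0, 1, 30), np.linspace(0, 1, 45)
word_a_fast = np.column_stack([np.sin(2 * np.pi * t_fast), np.cos(2 * np.pi * t_fast)])
word_a_slow = np.column_stack([np.sin(2 * np.pi * t_slow), np.cos(2 * np.pi * t_slow)])
word_b = np.column_stack([np.sin(4 * np.pi * t_slow), np.cos(4 * np.pi * t_slow)])

print(f"same word, different speed: {interpolation_alignment_distance(word_a_fast, word_a_slow):.2f}")
print(f"different words:            {interpolation_alignment_distance(word_a_fast, word_b):.2f}")
```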

Proceedings ArticleDOI
10 Jul 1999
TL;DR: The proposed algorithm, which uses a fuzzy logic recognition approach based on the power distribution pattern of a speech segment, allows the implementation of real-time speech recognition.
Abstract: Speech recognition is a major topic in speech signal processing. Many algorithms based on the results of speech analysis have been advanced, among which dynamic time warping and hidden Markov models are the most important. However, these algorithms generally turn out to be too complicated to implement in real-time systems. The algorithm proposed in this paper, which uses a fuzzy logic recognition approach based on the power distribution pattern of a speech segment, allows the implementation of real-time speech recognition.

Proceedings ArticleDOI
23 Aug 1999
TL;DR: Both weight selection methods provide performance close to the optimal point, and it is shown that the optimal combination of three models provides lower error rates than that achievable with two models.
Abstract: We focus on the score combination for three separate modeling approaches as applied to text-dependent speaker verification. The modeling methods that are evaluated consist of the neural tree network (NTN), hidden Markov model (HMM), and dynamic time warping (DTW). One of the main challenges in combining scores of several models is how to select the weight for each model. One method is to use equal weights for all models used in the combination. Another method is to use the Fisher linear discriminant to select the weights that maximize the ratio of the separation of the inter-class means to the sum of the variances. Both methods are evaluated for three separate databases and the results are compared to the optimal performance as obtained by an exhaustive search over the weight space. Overall, both weight selection methods provide performance close to the optimal point. It is also shown that the optimal combination of three models provides lower error rates than that achievable with two models.
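
A small sketch of the Fisher-discriminant weighting under stated assumptions (synthetic score distributions, a pooled within-class covariance, and a crude separation measure; the NTN/HMM/DTW column order is illustrative only): the weight vector is Sw^-1 (mu_true - mu_impostor), so noisier model scores receive smaller weights than under equal weighting.

```python
import numpy as np

def fisher_weights(scores_true, scores_impostor):
    """Fisher linear discriminant direction for combining per-model scores:
    w = Sw^-1 (mu_true - mu_impostor), rescaled to sum to 1 for readability."""
    scores_true = np.asarray(scores_true, float)
    scores_impostor = np.asarray(scores_impostor, float)
    mu_t, mu_i = scores_true.mean(axis=0), scores_impostor.mean(axis=0)
    sw = np.cov(scores_true, rowvar=False) + np.cov(scores_impostor, rowvar=False)
    w = np.linalg.solve(sw + 1e-9 * np.eye(sw.shape[0]), mu_t - mu_i)
    return w / w.sum()

# Toy example: columns are (NTN, HMM, DTW) scores, higher = more target-like.
rng = np.random.default_rng(4)
true_trials = rng.normal([0.8, 0.7, 0.6], [0.10, 0.15, 0.25], size=(200, 3))
impostor_trials = rng.normal([0.4, 0.4, 0.4], [0.10, 0.15, 0.25], size=(200, 3))

w_fisher = fisher_weights(true_trials, impostor_trials)
w_equal = np.full(3, 1 / 3)
print("Fisher weights:", np.round(w_fisher, 3))   # the noisier DTW scores get a smaller weight

def separation(w):
    """Distance between the fused-score class means in units of the pooled fused-score std."""
    ft, fi = true_trials @ w, impostor_trials @ w
    return (ft.mean() - fi.mean()) / np.sqrt(0.5 * (ft.var() + fi.var()))

print(f"separation, equal weights: {separation(w_equal):.2f}, Fisher weights: {separation(w_fisher):.2f}")
```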

Patent
Adoram Erell1
06 Jan 1999
TL;DR: In this article, a speech recognition system includes a token builder, a noise estimator, a template padder, a gain and noise adapter and a dynamic time warping (DTW) unit.
Abstract: A speech recognition system includes a token builder, a noise estimator, a template padder, a gain and noise adapter, and a dynamic time warping (DTW) unit. The token builder produces a widened test token representing an input test utterance and at least one frame before and after the input test utterance. The noise estimator estimates the noise qualities of the widened test token. The template padder pads each of a plurality of reference templates with at least one blank frame at either the beginning or end of the reference template. The gain and noise adapter adapts each padded reference template with the noise and gain qualities, thereby producing adapted reference templates having noise frames wherever a blank frame was originally placed and noise-adapted speech where speech exists. The DTW unit performs a noise-adapted DTW operation comparing the widened token with one of the noise-adapted reference templates.

Proceedings ArticleDOI
31 Oct 1999
TL;DR: An individual verification system for a multimedia environment such as Windows 95 is implemented using DTW (dynamic time warping), and the weighted cepstrum is found to intensify the difference between the customer and the impostor.
Abstract: We implement an individual verification system for a multimedia environment such as Windows 95 by using DTW (dynamic time warping). The conventional method uses a password entered through the keyboard; this paper uses speech instead. The major features of this study are summarized as follows. (1) We make a complete reference pattern by updating it with the new speech pattern using the F1/F0 ratio. This method has a high recognition rate compared with other systems, whose performance degrades rapidly as time goes on. (2) We use the F-ratio values as weights for the cepstral coefficients. We find that the weighted cepstrum intensifies the difference between the customer and the impostor, and the speaker recognition rate is improved by more than 5% over conventional DTW pattern matching with the cepstrum. This shows the possibility that the speech signal can be used as a means of individual verification in a Windows environment.
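
A brief sketch of the F-ratio weighting under my own assumptions (toy cepstral frames, a simple between/within variance ratio per coefficient, and a weighted Euclidean distance that would serve as the local cost inside DTW); it is not the paper's exact procedure.

```python
import numpy as np

def f_ratio_weights(features, labels):
    """Per-dimension F-ratio: variance of the per-speaker means divided by the average
    within-speaker variance. Larger ratio => more speaker-discriminative coefficient."""
    features, labels = np.asarray(features, float), np.asarray(labels)
    classes = np.unique(labels)
    means = np.array([features[labels == c].mean(axis=0) for c in classes])
    within = np.array([features[labels == c].var(axis=0) for c in classes]).mean(axis=0)
    between = means.var(axis=0)
    return between / (within + 1e-12)

def weighted_cepstral_distance(c1, c2, w):
    """Frame-level local distance for DTW: F-ratio-weighted Euclidean distance."""
    return float(np.sqrt(np.sum(w * (np.asarray(c1) - np.asarray(c2)) ** 2)))

# Toy data: 3 "speakers", 20 frames each, 12 cepstral coefficients,
# where only the first 4 coefficients actually differ between speakers.
rng = np.random.default_rng(3)
frames, labels = [], []
for spk in range(3):
    mean = np.zeros(12)
    mean[:4] = rng.standard_normal(4)          # speaker-dependent part
    frames.append(mean + 0.3 * rng.standard_normal((20, 12)))
    labels += [spk] * 20
frames = np.vstack(frames)

w = f_ratio_weights(frames, labels)
print("F-ratio weights (first 6 dims):", np.round(w[:6], 2))
print("weighted distance between two frames:",
      round(weighted_cepstral_distance(frames[0], frames[25], w), 3))
```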

Journal ArticleDOI
TL;DR: In this paper, a memoryless, finite state recognition system with LPC, FFT, and BF-FFT parametrization was applied to evaluate speech transmission quality in analog telephone channels.
Abstract: The preliminary results of applying automatic recognition of isolated words to the objective evaluation of speech transmission quality in analog telephone channels are presented. A memoryless, finite-state recognition system with LPC, FFT, and BF-FFT (where the speech signal was filtered in Bark bands) parametrization was applied. In the classification stage, dynamic time warping and a nearest-neighbor algorithm were utilized. Nonsense word lists consisting of 100 logotoms were recorded in a studio by a professional male speaker and then utilized as the test material. Speech transmission quality was examined in laboratory models of telephone channels with frequency bands of 300–3400, 400–2500, and 100–6000 Hz for speech-to-white-noise ratios in the range of +15 to −15 dB. The results of the objective measurements, expressed as the percentage of logotoms correctly recognized by the recognition system, were compared under the same transmission conditions with subjectively measured logotom intelligibility. The best agreement between the subjective and objective evaluations of speech transmission quality was obtained for automatic speech recognition utilizing BF-FFT parametrization. The results of the objective evaluation of speech transmission quality by means of the presented method are encouraging, and the experiments will be continued for other communication channels (e.g., digital) and different distortions and disturbances.

Book ChapterDOI
01 Jan 1999
TL;DR: This chapter discusses, in an informal manner, some of the successes and a few of the outstanding problems of automatic speech recognition (ASR) and speaker identification — for forensic, business and banking purposes.
Abstract: In this chapter we discuss, in an informal manner, some of the successes and a few of the outstanding problems of automatic speech recognition (ASR) and speaker identification — for forensic, business and banking purposes. ASR can also help the hard-of-hearing by giving them printed text to read, and the wheelchair-bound by allowing them to control their vehicles by voice. Together with speech synthesis from text, human-machine dialogue systems offer attractive possibilities for all manner of information services.