
Showing papers on "Dynamic time warping published in 2020"


Journal ArticleDOI
TL;DR: The proposed MTL-TCNN with the ST-DTW algorithm is a promising method for short-term passenger demand prediction at the multi-zone level.
Abstract: Accurate short-term passenger demand prediction contributes to the coordination of traffic supply and demand. This paper proposes an end-to-end multi-task learning temporal convolutional neural network (MTL-TCNN) to predict short-term passenger demand at the multi-zone level. Along with a feature selector named the spatiotemporal dynamic time warping (ST-DTW) algorithm, the proposed MTL-TCNN is well suited to the multi-task prediction problem with consideration of spatiotemporal correlations. Based on ride-hailing demand data from Didi Chuxing in Chengdu, China, and taxi demand data from New York City, the numerical results show that the MTL-TCNN outperforms both classic methods (i.e., historical average (HA), ν-support vector machine (ν-SVM), and XGBoost) and state-of-the-art deep learning approaches (e.g., long short-term memory (LSTM) and convolutional LSTM (ConvLSTM)) in both the single-task learning (STL) and multi-task learning (MTL) scenarios. In summary, the proposed MTL-TCNN with the ST-DTW algorithm is a promising method for short-term passenger demand prediction at the multi-zone level.
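The paper's ST-DTW feature selector is not detailed here, but it builds on the classic DTW recurrence. As background, a minimal NumPy sketch of that core recurrence (illustrative only, not the authors' implementation):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-programming DTW distance between two 1-D series."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path may stretch or compress time, `dtw_distance([0, 0, 1], [0, 1])` is 0 even though the series have different lengths.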

114 citations


Posted Content
Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan
TL;DR: This work takes on the challenging task of learning to synthesise speech from normalised text or phonemes in an end-to-end manner, resulting in models which operate directly on character or phoneme input sequences and produce raw speech audio outputs.
Abstract: Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest. In this work, we take on the challenging task of learning to synthesise speech from normalised text or phonemes in an end-to-end manner, resulting in models which operate directly on character or phoneme input sequences and produce raw speech audio outputs. Our proposed generator is feed-forward and thus efficient for both training and inference, using a differentiable alignment scheme based on token length prediction. It learns to produce high fidelity audio through a combination of adversarial feedback and prediction losses constraining the generated audio to roughly match the ground truth in terms of its total duration and mel-spectrogram. To allow the model to capture temporal variation in the generated audio, we employ soft dynamic time warping in the spectrogram-based prediction loss. The resulting model achieves a mean opinion score exceeding 4 on a 5 point scale, which is comparable to the state-of-the-art models relying on multi-stage training and additional supervision.
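The soft dynamic time warping mentioned above replaces the hard minimum in the DTW recurrence with a differentiable soft-minimum (Cuturi and Blondel's formulation), so the alignment cost can serve as a training loss. A scalar-series sketch under that assumption (not the paper's spectrogram-based implementation):

```python
import numpy as np

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW: DTW with min replaced by a smooth, differentiable soft-min."""
    def softmin(a, b, c):
        z = -np.array([a, b, c]) / gamma
        zmax = z.max()                      # log-sum-exp, numerically stable
        return -gamma * (zmax + np.log(np.exp(z - zmax).sum()))

    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i, j] = cost + softmin(R[i - 1, j], R[i, j - 1], R[i - 1, j - 1])
    return R[n, m]
```

As gamma shrinks toward zero the soft-min approaches the hard min and the value approaches ordinary DTW; for larger gamma the loss smooths over many alignments and can even dip below zero for identical inputs.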

111 citations


Journal ArticleDOI
TL;DR: An adaptive constrained DTW (ACDTW) algorithm is developed to calculate distances between trajectories more accurately by introducing new adaptive penalty functions, and it achieves the best performance when compared with three existing algorithms for modeling maritime vessel trajectories.

93 citations


Journal ArticleDOI
TL;DR: An ensemble empirical mode decomposition (EEMD)-based long short-term memory (LSTM) learning paradigm is proposed for oil production forecasting and is shown to be capable of giving almost perfect production forecasts.

87 citations


Journal ArticleDOI
TL;DR: The proposed speech recognition system is flexible with scalability and availability in adapting to existing smart IoT devices, and it provides privacy in managing patient devices.
Abstract: This paper presents an effective solution based on speech recognition to provide elderly people, patients and disabled people with an easy control system. The goal is to build a low-cost system based on speech recognition to easily access Internet of Things (IoT) devices installed in smart homes and hospitals without relying on a centralized supervisory system. The proposed system uses a Raspberry Pi board to control home appliances wirelessly via smartphones. The main purpose of this system is to facilitate interactions between the user and home appliances through IoT communications based on speech commands. The proposed framework uses a hybrid Support Vector Machine (SVM) with a Dynamic Time Warping (DTW) algorithm to enhance the speech recognition process. The proposed solution is a machine learning-based system for controlling smart devices through speech commands with an accuracy of 97%. The resulting system helps patients and elderly people access and control compatible IoT devices using speech recognition. The proposed speech recognition system is flexible, with scalability and availability in adapting to existing smart IoT devices, and it provides privacy in managing patient devices. The research provides an effective method for integrating the system across medical institutions to help elderly people and patients.

65 citations


Journal ArticleDOI
TL;DR: The results demonstrate that the proposed fall detection method outperforms the other methods in terms of higher accuracy, precision, sensitivity, and specificity values.
Abstract: Automatic fall detection using radar aids in better assisted living and smarter health care. In this brief, a novel time series-based method for detecting fall incidents in human daily activities is proposed. A time series in the slow-time is obtained by summing all the range bins corresponding to fast-time of the ultra wideband radar return signals. This time series is used as input to the proposed deep convolutional neural network for automatic feature extraction. In contrast to other existing methods, the proposed fall detection method relies on multi-level feature learning directly from the radar time series signals. In particular, the proposed method utilizes a deep convolutional neural network for automating feature extraction as well as global maximum pooling technique for enhancing model discriminability. The performance of the proposed method is compared with that of the state-of-the-art, such as recurrent neural network, multi-layer perceptron, and dynamic time warping techniques. The results demonstrate that the proposed fall detection method outperforms the other methods in terms of higher accuracy, precision, sensitivity, and specificity values.

58 citations


Journal ArticleDOI
TL;DR: A novel hybrid approach is presented that combines a physics-based non-local modeling framework with data-driven clustering techniques to provide fast and accurate multiscale modeling of compartmentalized reservoirs.

55 citations


Journal ArticleDOI
TL;DR: A novel method for continuous hand gesture detection and recognition is proposed based on a frequency modulated continuous wave (FMCW) radar and the Fusion Dynamic Time Warping (FDTW) algorithm is presented to recognize the hand gestures.
Abstract: In this article, a novel method for continuous hand gesture detection and recognition is proposed based on a frequency modulated continuous wave (FMCW) radar. Firstly, we adopt the 2-Dimensional Fast Fourier Transform (2D-FFT) to estimate the range and Doppler parameters of the hand gesture raw data, and construct the range-time map (RTM) and Doppler-time map (DTM). Meanwhile, we apply the Multiple Signal Classification (MUSIC) algorithm to calculate the angle and construct the angle-time map (ATM). Secondly, a hand gesture detection method is proposed to segment the continuous hand gestures using a decision threshold. Thirdly, the central time-frequency trajectory of each hand gesture spectrogram is clustered using the k-means algorithm, and then the Fusion Dynamic Time Warping (FDTW) algorithm is presented to recognize the hand gestures. Finally, experiments show that the accuracy of the proposed hand gesture detection method can reach 96.17%. The hand gesture average recognition accuracy of the proposed FDTW algorithm is 95.83%, while its time complexity is reduced by more than 50%.

51 citations


Journal ArticleDOI
TL;DR: The Dynamic Time Warping algorithm is applied to compare sub-sequences and is combined with a shape-feature extraction algorithm to reduce insignificant solutions; the approach proved the most robust for time series with high auto-correlation and cross-correlation, strong seasonality, large gaps, and complex distributions.

51 citations


Journal ArticleDOI
Yingbiao Yao, Pan Lei, Wei Fen, Xiaorong Xu, Xuesong Liang, Xin Xu
TL;DR: A dynamic time warping–based peak prediction with zero-crossing detection to improve the SD accuracy and an improved SLE model is proposed for the different walking patterns to achieve a higher SLE accuracy.
Abstract: As an infrastructure-free positioning and navigation method, pedestrian dead reckoning (PDR) is still a research hotspot in the field of indoor localization. Step detection (SD) and stride length estimation (SLE) are two key components of PDR, and it is a challenging problem to apply SD and SLE to different walking patterns. Focusing on this problem, this paper proposes a robust SD and SLE method based on recognizing three walking patterns (i.e., Normal Walk, March in Place, and Quick Walk) using a smartphone. First, we propose a dynamic time warping–based peak prediction with zero-crossing detection to improve the SD accuracy. In particular, the proposed SD can accurately identify the starting and ending points of each step in the three walking patterns. Second, according to the extracted features of each step, a random forest algorithm with classification proofreading is used to recognize the three walking patterns. Finally, an improved SLE model is proposed for the different walking patterns to achieve a higher SLE accuracy. The experimental results show that, on average, the SD accuracy is about 97.9%, the recognition accuracy is about 98.4%, and the relative error of the estimated walking distance is about 3.0%, which outperforms those of the existing commonly used SD and SLE methods.

46 citations


Journal ArticleDOI
Weiwei Jiang
01 Apr 2020
TL;DR: This study gives a comprehensive comparison between nearest neighbor and deep learning models and indicates that deep learning models are not significantly better than 1-NN classifiers with edit distance with real penalty and dynamic time warping.
Abstract: Time series classification has been an important and challenging research task. In different domains, time series show different patterns, which makes it difficult to design a global optimal solution and requires a comprehensive evaluation of different classifiers across multiple datasets. With the rise of big data and cloud computing, deep learning models, especially deep neural networks, arise as a new paradigm for many problems, including image classification, object detection and natural language processing. In recent years, deep learning models are also applied for time series classification and show superiority over traditional models. However, the previous evaluation is usually limited to a small number of datasets and lack of significance analysis. In this study, we give a comprehensive comparison between nearest neighbor and deep learning models. Specifically, we compare 1-NN classifiers with eight different distance measures and three state-of-the-art deep learning models on 128 time series datasets. Our results indicate that deep learning models are not significantly better than 1-NN classifiers with edit distance with real penalty and dynamic time warping.
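The 1-NN-with-DTW baseline that the study evaluates can be sketched in a few lines: classify a query series with the label of the training series at smallest DTW distance. The helper names below are illustrative:

```python
import numpy as np

def dtw(x, y):
    """Plain dynamic-programming DTW distance."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(x[i - 1] - y[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn1_predict(train_X, train_y, query):
    """1-NN: return the label of the training series nearest under DTW."""
    dists = [dtw(t, query) for t in train_X]
    return train_y[int(np.argmin(dists))]
```

Swapping `dtw` for another elastic measure (e.g. edit distance with real penalty) changes only the distance function, which is what makes this family of classifiers a convenient benchmark.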

Journal ArticleDOI
TL;DR: A novel DMA framework based on transfer learning (TL) is proposed to handle the adaptation of driver models in lane-changing scenarios at the data level; the framework combines dynamic time warping (DTW) and local Procrustes analysis (LPA) and shows better model accuracy.
Abstract: Driver model adaptation (DMA) provides a way to model the target driver when sufficient data are not available. Traditional DMA methods running at the model level are restricted by the specific model structures and cannot make full use of the historical data. In this paper, a novel DMA framework based on transfer learning (TL) is proposed to deal with the adaptation of driver models in lane-changing scenarios at the data level. Under the proposed DMA framework, a new TL approach named DTW-LPA that combines dynamic time warping (DTW) and local Procrustes analysis (LPA) is developed. Using the DTW, the relationship between the datasets for different drivers can be found automatically. Based on this relationship, the LPA can transfer the data in the historical dataset to the dataset of a newly-involved driver (target driver). In this way, sufficient data can be obtained for the target driver. After the data transferring process, a proper modeling method, such as the Gaussian mixture regression (GMR), can be applied to train the model for the target driver. Data collected from a driving simulator and realistic driving scenes are used to validate the proposed method in various experiments. Compared with the GMR-only and GMR-MAP methods, the DTW-LPA shows better performance on the model accuracy with much lower predicting errors in most cases.

Journal ArticleDOI
TL;DR: By performing experiments on the entire UCR Time Series Classification Archive, it is shown that weighted kNN is able to consistently outperform 1NN and provides recommendations for the choices of the constraint width parameter r, neighborhood size k, and weighting scheme, for each mentioned elastic distance measure.
Abstract: Time-series classification has been addressed by a plethora of machine-learning techniques, including neural networks, support vector machines, Bayesian approaches, and others. It is an accepted fact, however, that the plain vanilla 1-nearest neighbor (1NN) classifier, combined with an elastic distance measure such as Dynamic Time Warping (DTW), is competitive and often superior to more complex classification methods, including the majority-voting k-nearest neighbor (kNN) classifier. With this paper we continue our investigation of the kNN classifier on time-series data and the impact of various classic distance-based vote weighting schemes by considering constrained versions of four common elastic distance measures: DTW, Longest Common Subsequence (LCS), Edit Distance with Real Penalty (ERP), and Edit Distance on Real sequence (EDR). By performing experiments on the entire UCR Time Series Classification Archive we show that weighted kNN is able to consistently outperform 1NN. Furthermore, we provide recommendations for the choices of the constraint width parameter r, neighborhood size k, and weighting scheme, for each mentioned elastic distance measure.
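The constraint width parameter r refers to a Sakoe-Chiba band that limits how far the warping path may stray from the diagonal, and vote weighting down-weights more distant neighbors. A hedged sketch of both ideas; the inverse-distance scheme shown is one classic choice, not necessarily the paper's recommended one:

```python
import numpy as np

def dtw_band(x, y, r):
    """DTW restricted to a Sakoe-Chiba band: only cells with |i - j| <= r."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            D[i, j] = abs(x[i - 1] - y[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def weighted_vote(neighbor_labels, neighbor_dists):
    """Inverse-distance weighted kNN vote: closer neighbors count more."""
    scores = {}
    for lab, d in zip(neighbor_labels, neighbor_dists):
        scores[lab] = scores.get(lab, 0.0) + 1.0 / (d + 1e-12)
    return max(scores, key=scores.get)
```

With r = 0 the band collapses to the diagonal and (for equal-length series) the distance reduces to a point-wise sum of absolute differences; widening r gradually recovers unconstrained DTW.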

Posted Content
TL;DR: This study proposes a novel, data augmentation based forecasting framework that is capable of improving the baseline accuracy of the GFM models in less data-abundant settings and can outperform state-of-the-art univariate forecasting methods.
Abstract: Forecasting models that are trained across sets of many time series, known as Global Forecasting Models (GFM), have recently shown promising results in forecasting competitions and real-world applications, outperforming many state-of-the-art univariate forecasting techniques. In most cases, GFMs are implemented using deep neural networks, and in particular Recurrent Neural Networks (RNN), which require a sufficient amount of time series to estimate their numerous model parameters. However, many time series databases have only a limited number of time series. In this study, we propose a novel, data augmentation based forecasting framework that is capable of improving the baseline accuracy of GFM models in less data-abundant settings. We use three time series augmentation techniques: GRATIS, moving block bootstrap (MBB), and dynamic time warping barycentric averaging (DBA) to synthetically generate a collection of time series. The knowledge acquired from these augmented time series is then transferred to the original dataset using two different approaches: the pooled approach and the transfer learning approach. When building GFMs, in the pooled approach, we train a model on the augmented time series alongside the original time series dataset, whereas in the transfer learning approach, we adapt a pre-trained model to the new dataset. In our evaluation on competition and real-world time series datasets, our proposed variants can significantly improve the baseline accuracy of GFM models and outperform state-of-the-art univariate forecasting methods.
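Of the three augmentation techniques, the moving block bootstrap (MBB) is the simplest to illustrate: a replicate series is assembled from randomly drawn overlapping blocks of the original. The sketch below applies it to a raw series for simplicity; in the forecasting literature the bootstrap is usually applied to a decomposed remainder component rather than the raw data:

```python
import numpy as np

def moving_block_bootstrap(series, block_len, rng=None):
    """One bootstrap replicate built from random overlapping blocks."""
    rng = np.random.default_rng(rng)
    series = np.asarray(series)
    n = len(series)
    n_blocks = int(np.ceil(n / block_len))
    # draw block start positions uniformly; blocks may overlap
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    out = np.concatenate([series[s:s + block_len] for s in starts])
    return out[:n]  # trim to the original length
```

Each replicate preserves short-range dependence (within blocks) while shuffling longer-range structure, which is what makes it a cheap source of extra training series.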

Journal ArticleDOI
TL;DR: The proposed DTW-NN is a feedforward neural network that exploits the elastic matching ability of DTW to dynamically align the inputs of a layer to the weights and replaces the standard dot product within a neuron with DTW.
Abstract: This paper describes a novel model for time series recognition called a Dynamic Time Warping Neural Network (DTW-NN). DTW-NN is a feedforward neural network that exploits the elastic matching ability of DTW to dynamically align the inputs of a layer to the weights. This weight alignment replaces the standard dot product within a neuron with DTW. In this way, the DTW-NN is able to tackle difficulties with time series recognition such as temporal distortions and variable pattern length within a feedforward architecture. We demonstrate the effectiveness of DTW-NNs on four distinct datasets: online handwritten characters, accelerometer-based active daily life activities, spoken Arabic numeral Mel-Frequency Cepstrum Coefficients (MFCC), and one-dimensional centroid-radii sequences from leaf shapes. We show that the proposed method is an effective general approach to temporal pattern learning by achieving state-of-the-art results on these datasets.
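One way to read the core idea above: each neuron scores its input sequence by DTW distance to a learned weight sequence instead of by a dot product, so inputs that elastically match the weights fire more strongly. The sign convention and tanh activation below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dtw(x, y):
    """Plain dynamic-programming DTW distance."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(x[i - 1] - y[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_neuron(x, w, b=0.0):
    """Neuron in the spirit of DTW-NN: the inner product is replaced by an
    elastic DTW alignment between the input and the weight sequence, so a
    closer match yields a larger activation."""
    return np.tanh(-dtw(x, w) + b)
```

Because the alignment is elastic, the same neuron responds to temporally stretched or compressed versions of its weight pattern, which is the property the paper exploits for variable-length time series.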

Journal ArticleDOI
TL;DR: A novel, single-template strategy using a mean template set and weighted multiple dynamic time warping (DTW) distances for a function-based approach to online signature verification is proposed.

Journal ArticleDOI
TL;DR: It is shown that local distance features can be used as supplementary input information in temporal CNNs by using both the raw data and the features extracted from DTW in multi-modal fusion CNNs.

Posted Content
TL;DR: This article proposes constructing a latent multi-view graph to capture various possible relationships among tokens, then refining this graph to select important words for relation prediction; finally, the representation of the refined graph and the BERT-based sequence representation are concatenated for relation extraction.
Abstract: Relation Extraction (RE) aims to predict the relation type of two entities that are mentioned in a piece of text, e.g., a sentence or a dialogue. When the given text is long, it is challenging to identify indicative words for the relation prediction. Recent advances on the RE task come from BERT-based sequence modeling and graph-based modeling of relationships among the tokens in the sequence. In this paper, we propose to construct a latent multi-view graph to capture various possible relationships among tokens. We then refine this graph to select important words for relation prediction. Finally, the representation of the refined graph and the BERT-based sequence representation are concatenated for relation extraction. Specifically, in our proposed GDPNet (Gaussian Dynamic Time Warping Pooling Net), we utilize a Gaussian Graph Generator (GGG) to generate edges of the multi-view graph. The graph is then refined by Dynamic Time Warping Pooling (DTWPool). On DialogRE and TACRED, we show that GDPNet achieves the best performance on dialogue-level RE, and comparable performance with the state of the art on sentence-level RE.

Journal ArticleDOI
TL;DR: The proposed TA-RNN system outperforms the state of the art, achieving a final 2.38% Equal Error Rate, using just a 4-digit password and one training sample per character, in comparison with traditional typed-based password systems.
Abstract: Passwords are still used on a daily basis for all kinds of applications. However, they are not secure enough by themselves in many cases. This work enhances password scenarios through two-factor authentication, asking the users to draw each character of the password instead of typing it as usual. The main contributions of this study are as follows: i) We present the novel MobileTouchDB public database, acquired in an unsupervised mobile scenario with no restrictions in terms of position, posture, and devices. This database contains more than 64K on-line character samples performed by 217 users, with 94 different smartphone models, and up to 6 acquisition sessions. ii) We perform a complete analysis of the proposed approach considering both traditional authentication systems such as Dynamic Time Warping (DTW) and novel approaches based on Recurrent Neural Networks (RNNs). In addition, we present a novel approach named Time-Aligned Recurrent Neural Networks (TA-RNNs). This approach combines the potential of DTW and RNNs to train more robust systems against attacks. A complete analysis of the proposed approach is carried out using both the MobileTouchDB and e-BioDigitDB databases. Our proposed TA-RNN system outperforms the state of the art, achieving a final 2.38% Equal Error Rate, using just a 4-digit password and one training sample per character. These results encourage the deployment of our proposed approach in comparison with traditional typed-based password systems, where an attack would have a 100% success rate under the same impostor scenario.

Journal ArticleDOI
TL;DR: A novel real-time static and dynamic human gesture capturing and recognition method is designed using a flexible wearable data band and data glove, built on three and ten stretchable strain sensors, respectively.
Abstract: Human arm/hand gesture capturing and recognition provides an intelligent and convenient means of interaction in applications ranging from human-machine interface (HMI) and human-computer interaction (HCI) to human-robot interaction (HRI). As human gestures constitute a powerful inter-human communication modality, they can also be considered an intuitive and convenient means of communication between humans and machines. This paper presents a novel real-time static and dynamic human gesture capturing and recognition method designed using a flexible wearable data band and data glove, which are based on three and ten stretchable strain sensors, respectively. The wearable data bands and data gloves are worn on the human arm/hand to ensure that the sensors are accurately attached to the human joints for accurate measurement of the movements of the shoulder, elbow, and wrist joints and the metacarpal and proximal joints of the fingers. In this work, a new approach to real-time static and dynamic human gesture capturing and recognition is introduced and developed based on the radial basis function neural network (RBFNN). Dynamic time warping (DTW) is used to select dynamic behavior candidates and also to recognize gestures by comparing observed records with a series of pre-recorded reference data patterns. The solution deals simultaneously with static and dynamic gestures as well as with multiple joints within the space of interest. The experimental results of human arm/hand static and dynamic gesture capturing and recognition verify the effectiveness of the proposed methods.

Journal ArticleDOI
TL;DR: The benchmark study presented can be a useful reference for the research community on its own; and the dataset-level assessment metrics reported may be used for designing evaluation frameworks to answer different research questions.
Abstract: This paper presents the first time series clustering benchmark utilizing all time series datasets currently available in the University of California Riverside (UCR) archive -- the state of the art repository of time series data. Specifically, the benchmark examines eight popular clustering methods representing three categories of clustering algorithms (partitional, hierarchical and density-based) and three types of distance measures (Euclidean, dynamic time warping, and shape-based). We lay out six restrictions with special attention to making the benchmark as unbiased as possible. A phased evaluation approach was then designed for summarizing dataset-level assessment metrics and discussing the results. The benchmark study presented can be a useful reference for the research community on its own; and the dataset-level assessment metrics reported may be used for designing evaluation frameworks to answer different research questions.

Journal ArticleDOI
TL;DR: This work clusters raw LIDAR data and classifies the clusters into human and nonhuman classes in order to recognize humans in a scene, and builds two neural networks, a long short-term memory network and a temporal convolutional network (TCN), to classify trajectory samples into 15 activity classes collected from a kitchen.
Abstract: Motion trajectories contain rich information about human activities. We propose to use a 2-D LIDAR to perform multiple people activity recognition simultaneously by classifying their trajectories. We clustered raw LIDAR data and classified the clusters into human and nonhuman classes in order to recognize humans in a scenario. For the clusters of humans, we implemented the Kalman filter to track their trajectories which are further segmented and labeled with corresponding activities. We introduced spatial transformation and Gaussian noise for trajectory augmentation in order to overcome the problem of unbalanced classes and boost the performance of human activity recognition (HAR). Finally, we built two neural networks, including a long short-term memory (LSTM) network and a temporal convolutional network (TCN) to classify trajectory samples into 15 activity classes collected from a kitchen. The proposed TCN achieved the best result of 99.49% in overall accuracy. In comparison, the TCN is slightly superior to the LSTM network. Both the TCN and the LSTM network outperform the hidden Markov model (HMM), dynamic time warping (DTW), and support vector machine (SVM) with a wide margin. Our approach achieves a higher activity recognition accuracy than the related work.

Journal ArticleDOI
TL;DR: Basic warping theory is covered, simulation examples and practical experimental strategies are presented, and MATLAB code and simulated and experimental datasets are provided for easy implementation of warping on both impulsive and frequency-modulated signals from both biotic and man-made sources.
Abstract: Classical ocean acoustic experiments involve the use of synchronized arrays of sensors. However, the need to cover large areas and/or the use of small robotic platforms has evoked interest in single-hydrophone processing methods for localizing a source or characterizing the propagation environment. One such processing method is "warping," a non-linear, physics-based signal processing tool dedicated to decomposing multipath features of low-frequency transient signals received at ranges greater than 1 km. Since its introduction to the underwater acoustics community in 2010, warping has been adopted in the ocean acoustics literature, mostly as a pre-processing method for single-receiver geoacoustic inversion. Warping also has potential applications in other specialties, including bioacoustics; however, the technique can be daunting to many potential users unfamiliar with its intricacies. Consequently, this tutorial article covers basic warping theory, presents simulation examples, and provides practical experimental strategies. Accompanying supplementary material provides MATLAB code and simulated and experimental datasets for easy implementation of warping on both impulsive and frequency-modulated signals from both biotic and man-made sources. This combined material should provide interested readers with user-friendly resources for implementing warping methods into their own research.

Journal ArticleDOI
TL;DR: An efficient road surface monitoring using an ultrasonic sensor and image processing technique and a new algorithm, HANUMAN, was proposed for automatic recognition and calculation of pothole and speed bumps.
Abstract: Road surface monitoring is an essential problem in providing smooth road infrastructure to commuters. This paper proposes efficient road surface monitoring using an ultrasonic sensor and an image processing technique. A novel cost-effective system, which includes ultrasonic sensing with GPS for the detection of road surface conditions, was designed and proposed. The dynamic time warping (DTW) technique was incorporated with the ultrasonic sensors to improve the classification and accuracy of road surface condition detection. A new algorithm, HANUMAN, was proposed for automatic recognition and calculation of potholes and speed bumps. Manual inspection was performed and a comparison was undertaken to validate the results. The proposed system showed better efficiency than previous systems, with a 95.50% detection rate for various road surface irregularities. The novel framework will not only identify road irregularities but also help decrease the number of accidents by alerting drivers.

Journal ArticleDOI
TL;DR: A feature-weighted clustering method based on two distance measures, dynamic time warping (DTW) and shape-based distance (SBD), can improve clustering accuracy for multivariate time series datasets.
Abstract: As an important set of techniques for data mining, time series clustering methods have been studied by many researchers. Although most existing solutions largely focus on univariate time series clustering, there has been a surge of interest in the clustering of multivariate time series data. In this paper, a feature-weighted clustering method is proposed based on two distance measures: dynamic time warping (DTW) and shape-based distance (SBD). There are four stages in the proposed clustering algorithm. First, we pick cluster centers with the popular method of clustering by fast search and find of density peaks (DPC). Next, by considering the overall matching of multivariate time series, a fuzzy membership matrix is generated by performing DTW on all variables. We then reconsider the contribution of each independent dimension by utilizing SBD to measure distances within each dimension and construct multiple fuzzy membership matrices. Finally, we utilize a traditional fuzzy clustering algorithm, fuzzy c-means, to cluster the fuzzy membership matrices and generate clustering results. Simultaneously, a feature weight calculation method and a novel equation for constructing fuzzy membership matrices are applied during the clustering process. We compare the proposed method with other clustering methods, and the results indicate that the proposed method can improve clustering accuracy for multivariate time series datasets.
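Fuzzy membership matrices of the kind described above can be derived from a matrix of DTW (or SBD) distances to the chosen centers using the standard fuzzy c-means membership formula. The sketch below shows that standard formula only, not the paper's novel membership equation or feature weighting:

```python
import numpy as np

def fuzzy_memberships(dist, m=2.0):
    """Fuzzy c-means memberships from an (n_series, n_centers) distance
    matrix, e.g. DTW distances to DPC-selected centers.
    u[i, k] = 1 / sum_j (d[i, k] / d[i, j]) ** (2 / (m - 1))."""
    d = np.asarray(dist, dtype=float) + 1e-12   # avoid division by zero
    ratio = d[:, :, None] / d[:, None, :]       # ratio[i, k, j] = d_ik / d_ij
    u = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=2)
    return u
```

Each row of the result sums to 1, so a series twice as far from one center as from another receives a correspondingly smaller membership in the farther cluster.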

Journal ArticleDOI
TL;DR: A numerical analysis of the suitability of different non-parametric and parametric measures for sparsity characterization shows that kurtosis, the Gini index, and the parametric sparsity measures are advantageous, whereas the $l_1$-norm and entropy measures fail to robustly characterize the temporal sparsity of signals with a different number of time frames.
Abstract: To assist the clinical diagnosis and treatment of neurological diseases that cause speech dysarthria such as Parkinson's disease (PD), it is of paramount importance to craft robust features which can be used to automatically discriminate between healthy and dysarthric speech. Since dysarthric speech of patients suffering from PD is breathy, semi-whispery, and is characterized by abnormal pauses and imprecise articulation, it can be expected that its spectro-temporal sparsity differs from the spectro-temporal sparsity of healthy speech. While we have recently successfully used temporal sparsity characterization for dysarthric speech detection, characterizing spectral sparsity poses the challenge of constructing a valid feature vector from signals with a different number of unaligned time frames. Further, although several non-parametric and parametric measures of sparsity exist, it is unknown which sparsity measure yields the best performance in the context of dysarthric speech detection. The objective of this paper is to demonstrate the advantages of spectro-temporal sparsity characterization for automatic dysarthric speech detection. To this end, we first provide a numerical analysis of the suitability of different non-parametric and parametric measures (i.e., $l_1$-norm, kurtosis, Shannon entropy, Gini index, shape parameter of a Chi distribution, and shape parameter of a Weibull distribution) for sparsity characterization. It is shown that kurtosis, the Gini index, and the parametric sparsity measures are advantageous sparsity measures, whereas the $l_1$-norm and entropy measures fail to robustly characterize the temporal sparsity of signals with a different number of time frames. Second, we propose to characterize the spectral sparsity of an utterance by initially time-aligning it to the same utterance uttered by an (arbitrarily selected) reference speaker using dynamic time warping.
Experimental results on a Spanish database of healthy and dysarthric speech show that estimating the spectro-temporal sparsity using the Gini index or the parametric sparsity measures and using it as a feature in a support vector machine results in a high classification accuracy of 83.3%.
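The sparsity measures compared above are easy to reproduce numerically. The sketch below is a minimal illustration, not the authors' implementation: it computes the Gini index in the form given by Hurley and Rickard along with the sample excess kurtosis, and shows that both assign higher scores to a vector whose energy is concentrated in few coefficients.

```python
import numpy as np

def gini_index(x):
    """Gini index of a vector's magnitudes (Hurley & Rickard form):
    0 for a maximally non-sparse (uniform) vector, approaching 1 as
    all energy concentrates in a single coefficient."""
    c = np.sort(np.abs(np.asarray(x, dtype=float)))  # ascending magnitudes
    n = c.size
    total = c.sum()
    if total == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((c / total) * ((n - k + 0.5) / n))

def excess_kurtosis(x):
    """Sample excess kurtosis; higher values indicate a heavier-tailed,
    more 'peaky' (i.e., sparser) distribution of coefficients."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

uniform = np.ones(100)   # energy spread evenly over all coefficients
sparse = np.zeros(100)
sparse[0] = 1.0          # all energy in a single coefficient
```

On these two vectors, `gini_index(uniform)` is 0 while `gini_index(sparse)` is close to 1, matching the intuition that the Gini index robustly separates sparse from non-sparse signals.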

Journal ArticleDOI
TL;DR: This paper proposes a filter method for selecting a subset of time series, adapting existing nonparametric mutual information estimators based on k-nearest neighbors and relying on dynamic time warping dissimilarity to bring these estimators to the time series setting.
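k-nearest-neighbor mutual information estimators need only a notion of distance between samples, so the key step is replacing the usual Euclidean metric with a DTW dissimilarity. A minimal sketch of that DTW-based neighbor search (function names are illustrative, not taken from the paper):

```python
import numpy as np

def dtw(a, b):
    """Plain O(len(a)*len(b)) dynamic time warping distance
    between two 1-D series, with absolute difference as local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_dtw(query, series, k=1):
    """Indices of the k series closest to `query` under DTW dissimilarity;
    kNN-based mutual information estimators build on exactly this kind of
    neighbor query."""
    d = np.array([dtw(query, s) for s in series])
    return np.argsort(d)[:k]
```

Because DTW aligns series of different lengths, this neighbor search works even when the candidate series are not sampled on a common time grid.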

Journal ArticleDOI
TL;DR: The results show that the spatial structure of Chengdu is complex and its urban functions are interlaced, yet regular patterns can still be identified, and that traffic volume and inflow data reflect residents' travel patterns better than simple taxi pick-up/drop-off data.
Abstract: Overall scientific planning of urbanization layout is an important component of the new period of land spatial planning policies. Defining the main functions of different spaces and dividing urban functional areas are of great significance for optimizing the land development pattern. This article identifies and analyses urban functional areas from the perspective of data mining, and the results of this method are consistent with the actual situation. In this paper, representative taxi trajectory data are selected as the research basis for urban functional areas. First, based on trajectory data from Didi Chuxing within the high-speed road surrounding Chengdu, we generated trajectory time sequence data and used the dynamic time warping (DTW) algorithm to generate a time series similarity matrix. Second, we utilized the K-medoid clustering algorithm to generate preliminary results of land clustering and selected the results with high classification accuracy as training samples. Then, the k-nearest neighbour (KNN) classification algorithm based on DTW was performed to classify and identify the urban functional areas. Finally, with the help of point-of-interest (POI) auxiliary analysis, the final functional layout of Chengdu was obtained. The results show that the spatial structure of Chengdu is complex and that the urban functions are interlaced, but regular patterns can still be identified. Moreover, traffic volume and inflow data reflect the travel patterns of residents better than simple taxi pick-up/drop-off data. The original DTW calculation method has high time complexity, which can be reduced by normalization and dimensionality reduction of the time series. The semi-supervised learning classification method is also applicable to trajectory data, and it is best to select training samples from the unsupervised learning results.
This method can provide a theoretical basis for urban land planning and offers guidance for urbanization layout under the land spatial planning policies of the new era.
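The first two steps of the pipeline above, building a DTW similarity matrix and clustering it with K-medoids, can be sketched as follows. This is a simplified illustration with a basic k-medoids loop, not the authors' optimized implementation:

```python
import numpy as np

def dtw(a, b):
    """O(nm) dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_matrix(series):
    """Symmetric pairwise DTW dissimilarity matrix for a list of series."""
    n = len(series)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = dtw(series[i], series[j])
    return M

def k_medoids(M, k, n_iter=50, seed=0):
    """Basic k-medoids clustering on a precomputed dissimilarity matrix:
    alternate between assigning points to the nearest medoid and moving
    each medoid to the cluster member with minimal within-cluster cost."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(M.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(M[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            new_medoids[c] = members[np.argmin(M[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids
```

Note that each cluster always contains its own medoid (distance zero to itself), so the assignment step cannot produce empty clusters.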

Journal ArticleDOI
TL;DR: This paper covers the most recent approaches in Chinese Sign Language Recognition (CSLR), thoroughly reviewing leading methods from 2000 to 2019 and comparing them in terms of classification and feature extraction methods, accuracy/performance evaluation, and sample sizes/datasets.
Abstract: Chinese Sign Language (CSL) offers the main means of communication for the hearing impaired in China. Sign Language Recognition (SLR) can shorten the distance between hearing-impaired and hearing people and help the former integrate into society. Therefore, SLR has become the focus of sign language application research. Over the years, the continuous development of new technologies has provided a source and motivation for SLR. This paper aims to cover the most recent approaches in Chinese Sign Language Recognition (CSLR). With a thorough review of leading methods from 2000 to 2019 in CSLR research, various techniques and algorithms such as scale-invariant feature transform, histogram of oriented gradients, wavelet entropy, Hu moment invariants, Fourier descriptors, the gray-level co-occurrence matrix, dynamic time warping, principal component analysis, autoencoders, hidden Markov models (HMM), support vector machines (SVM), random forests, skin color modeling, k-NN, artificial neural networks, convolutional neural networks (CNN), and transfer learning are discussed in detail, organized around the major stages of data acquisition, preprocessing, feature extraction, and classification. CSLR is summarized along the following aspects: methods of classification and feature extraction, accuracy/performance evaluation, and sample sizes/datasets. The advantages and limitations of different CSLR approaches were compared. It was found that data acquisition is mainly through Kinect and cameras, and that feature extraction focuses on hand shape and spatiotemporal factors while largely ignoring facial expressions. HMM and SVM are the most widely used classifiers. CNNs are becoming increasingly popular, and deep neural network-based recognition approaches are the likely future trend. However, due to the complexity of the contemporary Chinese language, CSLR generally achieves lower accuracy than SLR for other languages.
It is necessary to establish an appropriate dataset to conduct comparable experiments, and the issue of accuracy decreasing as the dataset grows also needs to be resolved. Overall, we hope this study offers a comprehensive overview for those interested in CSLR and SLR and contributes to future research.
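Several of the surveyed recognition pipelines match an unknown sign against stored templates using DTW over hand-trajectory features. A minimal sketch of such template matching on 2-D coordinate sequences (all data and function names are illustrative, not from any specific surveyed system):

```python
import numpy as np

def dtw_2d(a, b):
    """DTW distance between two trajectories given as (T, 2) arrays of
    hand coordinates, using Euclidean distance as the local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(trajectory, templates):
    """Return the label of the template trajectory with the smallest
    DTW distance to the observed trajectory (nearest-template matching)."""
    return min(templates, key=lambda label: dtw_2d(trajectory, templates[label]))
```

Because DTW absorbs differences in signing speed, a slower or faster execution of the same sign still matches its template; this is precisely why DTW appears so often among the classic CSLR methods listed above.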

Journal ArticleDOI
TL;DR: Numerical examples show that a neural network trained with the proposed DTW-based data augmentation predicts the remaining useful life (RUL) with less uncertainty than a conventional neural network model without data mapping.
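The paper's mapping procedure is not detailed in this excerpt, but a common way to realize DTW-based data augmentation is to warp one degradation trajectory onto another's time axis along the DTW alignment path, producing additional training series of a common length. The sketch below assumes that approach (function names are illustrative):

```python
import numpy as np

def dtw_path(a, b):
    """DTW alignment path between two 1-D series as a list of index
    pairs (i, j), obtained by backtracking the accumulated-cost matrix."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    i, j = n, m
    path = [(i - 1, j - 1)]
    while i > 1 or j > 1:
        # prefer the predecessor with the lowest accumulated cost
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: D[s])
        path.append((i - 1, j - 1))
    return path[::-1]

def warp_to(a, b):
    """Map series b onto the time axis of a via the DTW path, averaging
    the values of b aligned to each time step of a; the result has the
    same length as a and can serve as an augmented training sample."""
    out = np.zeros(len(a))
    cnt = np.zeros(len(a))
    for i, j in dtw_path(a, b):
        out[i] += b[j]
        cnt[i] += 1
    return out / cnt
```

Warping every available run-to-failure trajectory onto a reference time axis in this way yields a set of equal-length, aligned series from which synthetic training samples can be drawn.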