
Showing papers on "Dynamic time warping" published in 2014


Proceedings ArticleDOI
23 Jun 2014
TL;DR: A new skeletal representation is proposed that explicitly models the 3D geometric relationships between various body parts using rotations and translations in 3D space, and it outperforms various state-of-the-art skeleton-based human action recognition approaches.
Abstract: Recently introduced cost-effective depth sensors coupled with the real-time skeleton estimation algorithm of Shotton et al. [16] have generated a renewed interest in skeleton-based human action recognition. Most of the existing skeleton-based approaches use either the joint locations or the joint angles to represent a human skeleton. In this paper, we propose a new skeletal representation that explicitly models the 3D geometric relationships between various body parts using rotations and translations in 3D space. Since 3D rigid body motions are members of the special Euclidean group SE(3), the proposed skeletal representation lies in the Lie group SE(3)×…×SE(3), which is a curved manifold. Using the proposed representation, human actions can be modeled as curves in this Lie group. Since classification of curves in this Lie group is not an easy task, we map the action curves from the Lie group to its Lie algebra, which is a vector space. We then perform classification using a combination of dynamic time warping, Fourier temporal pyramid representation and linear SVM. Experimental results on three action datasets show that the proposed representation performs better than many existing skeletal representations. The proposed approach also outperforms various state-of-the-art skeleton-based human action recognition approaches.
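The pipeline above uses dynamic time warping (DTW) to compare action curves of different lengths before the Fourier temporal pyramid and SVM stages. As a point of reference only (not the authors' code), a minimal NumPy sketch of the standard DTW dynamic program over two multivariate feature sequences is:

```python
import numpy as np

def dtw_distance(x, y):
    """Standard DTW between two sequences of feature vectors.

    x: (n, d) array, y: (m, d) array. Returns the accumulated cost of the
    optimal warping path under a Euclidean frame distance.
    """
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            # allowed moves: match, insertion, deletion
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[n, m]

# Toy usage: two versions of the same curve sampled at different rates.
a = np.sin(np.linspace(0, 2 * np.pi, 40))[:, None]
b = np.sin(np.linspace(0, 2 * np.pi, 55))[:, None]
print(dtw_distance(a, b))
```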

1,432 citations


Book ChapterDOI
06 Sep 2014
TL;DR: A novel model is presented that automatically selects the most discriminative video fragments from noisy image sequences of people, where more reliable space-time features can be extracted, whilst simultaneously learning a video ranking function for person re-id.
Abstract: Current person re-identification (re-id) methods typically rely on single-frame imagery features, and ignore space-time information from image sequences. Single-frame (single-shot) visual appearance matching is inherently limited for person re-id in public spaces due to visual ambiguity arising from non-overlapping camera views where viewpoint and lighting changes can cause significant appearance variation. In this work, we present a novel model to automatically select the most discriminative video fragments from noisy image sequences of people, where more reliable space-time features can be extracted, whilst simultaneously learning a video ranking function for person re-id. Also, we introduce a new image sequence re-id dataset (iLIDS-VID) based on the i-LIDS MCT benchmark data. Using the iLIDS-VID and PRID 2011 sequence re-id datasets, we conducted extensive comparative evaluations to demonstrate the advantages of the proposed model over contemporary gait recognition, holistic image sequence matching and state-of-the-art single-shot/multi-shot based re-id methods.

600 citations


Book ChapterDOI
16 Jun 2014
TL;DR: A novel deep learning framework for multivariate time series classification is proposed that is not only more efficient than the state of the art but also competitive in accuracy, demonstrating that feature learning is worth investigating for time series classification.
Abstract: Time series (particularly multivariate) classification has drawn a lot of attention in the literature because of its broad applications for different domains, such as health informatics and bioinformatics. Thus, many algorithms have been developed for this task. Among them, nearest neighbor classification (particularly 1-NN) combined with Dynamic Time Warping (DTW) achieves state-of-the-art performance. However, as the data set grows larger, the time consumption of 1-NN with DTW grows linearly. Compared to 1-NN with DTW, traditional feature-based classification methods are usually more efficient but less effective since their performance is usually dependent on the quality of hand-crafted features. To that end, in this paper, we explore feature learning techniques to improve the performance of traditional feature-based approaches. Specifically, we propose a novel deep learning framework for multivariate time series classification. We conduct two groups of experiments on real-world data sets from different application domains. The final results show that our model is not only more efficient than the state of the art but also competitive in accuracy. It also demonstrates that feature learning is worth investigating for time series classification.
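For context, the 1-NN-with-DTW baseline that the paper compares against can be sketched as below. This is a generic illustration, not the authors' implementation; the multivariate frame distance is assumed to be Euclidean.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW cost between multivariate series x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[n, m]

def knn_dtw_predict(train_series, train_labels, query):
    """1-NN classification: label of the training series with smallest DTW cost.

    Scan time grows linearly with the training-set size, which is the
    bottleneck the feature-learning approach above tries to avoid.
    """
    dists = [dtw_distance(query, s) for s in train_series]
    return train_labels[int(np.argmin(dists))]

# Toy usage with two classes of 2-D series.
rng = np.random.default_rng(0)
train = [rng.normal(c, 0.1, size=(30, 2)) for c in (0.0, 1.0) for _ in range(3)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_dtw_predict(train, labels, rng.normal(1.0, 0.1, size=(25, 2))))
```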

534 citations


Journal ArticleDOI
18 Jun 2014-Sensors
TL;DR: An automated fall detection system with wearable motion sensor units fitted to the subjects' body at six different positions is developed, and it successfully distinguishes falls from ADLs using six machine learning techniques (classifiers): the k-nearest neighbor (k-NN) classifier, least squares method (LSM), support vector machines (SVM), Bayesian decision making (BDM), dynamic time warping (DTW), and artificial neural networks (ANNs).
Abstract: Falls are a serious public health problem and possibly life threatening for people in fall risk groups. We develop an automated fall detection system with wearable motion sensor units fitted to the subjects' body at six different positions. Each unit comprises three tri-axial devices (accelerometer, gyroscope, and magnetometer/compass). Fourteen volunteers perform a standardized set of movements including 20 voluntary falls and 16 activities of daily living (ADLs), resulting in a large dataset with 2520 trials. To reduce the computational complexity of training and testing the classifiers, we focus on the raw data for each sensor in a 4 s time window around the point of peak total acceleration of the waist sensor, and then perform feature extraction and reduction. Most earlier studies on fall detection employ rule-based approaches that rely on simple thresholding of the sensor outputs. We successfully distinguish falls from ADLs using six machine learning techniques (classifiers): the k-nearest neighbor (k-NN) classifier, least squares method (LSM), support vector machines (SVM), Bayesian decision making (BDM), dynamic time warping (DTW), and artificial neural networks (ANNs). We compare the performance and the computational complexity of the classifiers and achieve the best results with the k-NN classifier and LSM, with sensitivity, specificity, and accuracy all above 99%. These classifiers also have acceptable computational requirements for training and testing. Our approach would be applicable in real-world scenarios where data records of indeterminate length, containing multiple activities in sequence, are recorded.
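As a rough illustration of the windowing step described above (a 4 s window around the peak total acceleration of the waist sensor), a NumPy sketch follows; the sampling rate and channel layout are assumptions, not values taken from the paper.

```python
import numpy as np

def peak_window(waist_accel, all_sensors, fs=25, width_s=4.0):
    """Cut a fixed-length window centred on the peak total acceleration.

    waist_accel: (T, 3) tri-axial accelerometer samples from the waist unit.
    all_sensors: (T, C) matrix with the raw channels of interest.
    fs: assumed sampling rate in Hz (dataset specific).
    Returns a (width_s * fs, C) slice, zero-padded at the edges if needed.
    """
    magnitude = np.linalg.norm(waist_accel, axis=1)   # total acceleration
    centre = int(np.argmax(magnitude))
    half = int(width_s * fs) // 2
    lo, hi = centre - half, centre + half
    window = np.zeros((2 * half, all_sensors.shape[1]))
    src_lo, src_hi = max(lo, 0), min(hi, len(all_sensors))
    window[src_lo - lo : src_hi - lo] = all_sensors[src_lo:src_hi]
    return window

# Toy usage: a simulated fall spike at sample 300.
T = 600
acc = np.random.normal(0, 0.2, size=(T, 3)); acc[300] += 5.0
channels = np.random.normal(0, 0.2, size=(T, 9))
print(peak_window(acc, channels).shape)   # (100, 9) at the assumed fs=25
```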

306 citations


Journal ArticleDOI
TL;DR: In this paper, a highly comparative feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series and selects those features that are most informative of the class structure using greedy forward feature selection with a linear classifier.
Abstract: A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on very large datasets containing long time series or time series of different lengths. For many of the datasets studied, classification performance exceeded that of conventional instance-based classifiers, including one-nearest-neighbor classifiers using Euclidean distances and dynamic time warping, and, most importantly, the features selected provide an understanding of the properties of the dataset, insight that can guide further scientific investigation.
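The greedy forward selection loop described above can be sketched generically with scikit-learn as follows, assuming a precomputed per-series feature matrix; the classifier choice and stopping rule here are illustrative, not the authors' exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(X, y, max_features=10):
    """Greedily add the feature that most improves a linear classifier.

    X: (n_series, n_features) matrix of per-series features, y: class labels.
    Returns the list of selected feature indices.
    """
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining and len(selected) < max_features:
        scores = []
        for f in remaining:
            cols = selected + [f]
            clf = LogisticRegression(max_iter=1000)
            scores.append(cross_val_score(clf, X[:, cols], y, cv=5).mean())
        best = int(np.argmax(scores))
        if scores[best] <= best_score:       # stop when no further improvement
            break
        best_score = scores[best]
        selected.append(remaining.pop(best))
    return selected

# Toy usage: 2 informative features out of 20.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 20))
X[:, 3] += y; X[:, 7] -= y
print(greedy_forward_selection(X, y))
```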

237 citations


Proceedings ArticleDOI
14 Dec 2014
TL;DR: It is shown that a recent result can be exploited to allow meaningful averaging of 'warped' time series, and that this result allows us to create ultra-efficient Nearest 'Centroid' classifiers that are at least as accurate as their more lethargic Nearest Neighbor cousins.
Abstract: Recent years have seen significant progress in improving both the efficiency and effectiveness of time series classification. However, because the best solution is typically the Nearest Neighbor algorithm with the relatively expensive Dynamic Time Warping as the distance measure, successful deployments on resource constrained devices remain elusive. Moreover, the recent explosion of interest in wearable devices, which typically have limited computational resources, has created a growing need for very efficient classification algorithms. A commonly used technique to glean the benefits of the Nearest Neighbor algorithm, without inheriting its undesirable time complexity, is to use the Nearest Centroid algorithm. However, because of the unique properties of (most) time series data, the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this work we show that we can exploit a recent result to allow meaningful averaging of 'warped' time series, and that this result allows us to create ultra-efficient Nearest 'Centroid' classifiers that are at least as accurate as their more lethargic Nearest Neighbor cousins.
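The Nearest 'Centroid' idea hinges on being able to average warped time series (the 'recent result' the authors build on; DTW barycenter averaging is one such technique). Assuming per-class templates have already been computed by some such averaging step, a minimal classification sketch is:

```python
import numpy as np

def dtw_distance(x, y):
    """DTW cost between univariate or multivariate series (rows are frames)."""
    x, y = np.atleast_2d(x).reshape(len(x), -1), np.atleast_2d(y).reshape(len(y), -1)
    acc = np.full((len(x) + 1, len(y) + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[-1, -1]

def nearest_centroid_predict(centroids, query):
    """centroids: {label: averaged template series}. One DTW call per class,
    instead of one per training instance as in 1-NN."""
    return min(centroids, key=lambda label: dtw_distance(query, centroids[label]))

# Toy usage with hand-made 'templates' (stand-ins for DBA-style averages).
t = np.linspace(0, 2 * np.pi, 60)
centroids = {"sine": np.sin(t), "ramp": np.linspace(-1, 1, 60)}
query = np.sin(np.linspace(0, 2 * np.pi, 45)) + np.random.normal(0, 0.05, 45)
print(nearest_centroid_predict(centroids, query))
```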

205 citations


Journal ArticleDOI
TL;DR: This paper presents an approach to image time series analysis which is able to deal with irregularly sampled series and which also allows the comparison of pairs of time series where each element of the pair has a different number of samples.
Abstract: Earth observation satellites are now providing images with short revisit cycle and high spatial resolution. The amount of produced data requires new methods that will give a sound temporal analysis while being computationally efficient. Dynamic time warping has proved to be a very sound measure to capture similarities in radiometric evolutions. In this letter, we show that its nonlinear distortion behavior is compatible with the use of a spatiotemporal segmentation of the data cube that is formed by a satellite image time series (SITS). While dealing with spatial and temporal dimensions of SITS at the same time had already proven to be very challenging, this letter proves that, by taking advantage of the spatial and temporal connectivities, both the performance and the quality of the analysis can be improved. Our method is assessed on a SITS of 46 Formosat-2 images sensed in 2006, with an average cloud cover of one third. We show that our approach induces the following: 1) sharply reduced memory usage; 2) improved classification results; and 3) shorter running time.

179 citations


Journal ArticleDOI
TL;DR: A novel tactile-array sensor based on flexible piezoresistive rubber is presented, together with a classification approach that uses a k-nearest-neighbor classifier and dynamic time warping to calculate the distance between different time series.

171 citations


Journal ArticleDOI
TL;DR: This paper presents a novel framework for recognizing streamed actions from Motion Capture (MoCap) data, based on histograms of action poses that are computed according to the Hausdorff distance.

135 citations


Journal ArticleDOI
TL;DR: In this article, a new fault detection method is proposed that combines fast dynamic time warping (Fast DTW) and correlated kurtosis (CK) techniques to characterize the local gear fault and identify the corresponding faulty gear and its position.

Proceedings ArticleDOI
01 Nov 2014
TL;DR: Experimental results show that both proposed methods outperform the HMM-based approach and are capable of disaggregating a range of domestic loads even when the training period is very short.
Abstract: We propose two algorithms for power load disaggregation at low sampling rates (sampling intervals greater than 1 s): a low-complexity, supervised approach based on Decision Trees and an unsupervised method based on Dynamic Time Warping. Both proposed algorithms share common pre-classification steps. We provide a reproducible algorithmic description and benchmark the proposed methods against a state-of-the-art Hidden Markov Model (HMM)-based approach. Experimental results using three US and three UK households show that both proposed methods outperform the HMM-based approach and are capable of disaggregating a range of domestic loads even when the training period is very short.

Journal ArticleDOI
TL;DR: A template-based recognition method is described that simultaneously aligns the input gesture to the templates using a Sequential Monte Carlo inference technique; it continuously updates the estimated parameters and recognition results during execution of the gesture, which offers key advantages for continuous human-machine interaction.
Abstract: This article presents a gesture recognition/adaptation system for human-computer interaction applications that goes beyond activity classification and that, as a complement to gesture labeling, characterizes the movement execution. We describe a template-based recognition method that simultaneously aligns the input gesture to the templates using a Sequential Monte Carlo inference technique. Contrary to standard template-based methods based on dynamic programming, such as Dynamic Time Warping, the algorithm has an adaptation process that tracks gesture variation in real time. The method continuously updates, during execution of the gesture, the estimated parameters and recognition results, which offers key advantages for continuous human-machine interaction. The technique is evaluated in several different ways: Recognition and early recognition are evaluated on 2D onscreen pen gestures; adaptation is assessed on synthetic data; and both early recognition and adaptation are evaluated in a user study involving 3D free-space gestures. The method is robust to noise, and successfully adapts to parameter variation. Moreover, it performs recognition as well as or better than nonadapting offline template-based methods.

Proceedings ArticleDOI
06 Nov 2014
TL;DR: A place recognition algorithm which operates by matching local query image sequences to a database of image sequences using a Hidden Markov Model (HMM) framework reminiscent of Dynamic Time Warping from speech recognition is presented.
Abstract: Visual place recognition and loop closure is critical for the global accuracy of visual Simultaneous Localization and Mapping (SLAM) systems. We present a place recognition algorithm which operates by matching local query image sequences to a database of image sequences. To match sequences, we calculate a matrix of low-resolution, contrast-enhanced image similarity probability values. The optimal sequence alignment, which can be viewed as a discontinuous path through the matrix, is found using a Hidden Markov Model (HMM) framework reminiscent of Dynamic Time Warping from speech recognition. The state transitions enforce local velocity constraints and the most likely path sequence is recovered efficiently using the Viterbi algorithm. A rank reduction on the similarity probability matrix is used to provide additional robustness in challenging conditions when scoring sequence matches. We evaluate our approach on seven outdoor vision datasets and show improved precision-recall performance against the recently published seqSLAM algorithm.
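For illustration, the core of this kind of HMM-based sequence matching (a Viterbi pass over a query-versus-database similarity matrix with local velocity constraints) might look like the sketch below. It is a generic reconstruction, not the authors' implementation, and omits the rank reduction and contrast enhancement steps.

```python
import numpy as np

def viterbi_alignment(sim_prob, v_min=1, v_max=3):
    """Most likely database-frame sequence for a query image sequence.

    sim_prob: (n_query, n_db) matrix of image-similarity probabilities
    (used as emission likelihoods). Transitions only allow moving forward
    by v_min..v_max database frames per query frame (velocity constraint).
    Returns the aligned database index for each query frame.
    """
    log_sim = np.log(sim_prob + 1e-12)
    n_q, n_db = log_sim.shape
    score = np.full((n_q, n_db), -np.inf)
    back = np.zeros((n_q, n_db), dtype=int)
    score[0] = log_sim[0]                      # uniform prior over start frames
    for t in range(1, n_q):
        for j in range(n_db):
            lo, hi = max(0, j - v_max), j - v_min + 1
            if hi <= lo:
                continue
            prev = score[t - 1, lo:hi]
            k = int(np.argmax(prev))
            score[t, j] = prev[k] + log_sim[t, j]
            back[t, j] = lo + k
    # Backtrack from the best final state.
    path = [int(np.argmax(score[-1]))]
    for t in range(n_q - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

# Toy usage: the query roughly matches database frames 20..38.
rng = np.random.default_rng(1)
sim = rng.uniform(0.01, 0.1, size=(10, 60))
for t in range(10):
    sim[t, 20 + 2 * t] = 0.9
print(viterbi_alignment(sim))
```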

Book ChapterDOI
TL;DR: In this paper, the Quadratic-Chi distance family is used to measure differences between histograms to capture cross-bin relationships and a new algorithm for trimming videos is proposed to remove all the unimportant frames from videos.
Abstract: The purpose of this paper is to describe one-shot-learning gesture recognition systems developed on the ChaLearn Gesture Dataset (ChaLearn). We use RGB and depth images and combine appearance (Histograms of Oriented Gradients) and motion descriptors (Histogram of Optical Flow) for parallel temporal segmentation and recognition. The Quadratic-Chi distance family is used to measure differences between histograms to capture cross-bin relationships. We also propose a new algorithm for trimming videos, removing all the unimportant frames. We present two methods that use a combination of HOG-HOF descriptors together with variants of a Dynamic Time Warping technique. Both methods outperform other published methods and help narrow the gap between human performance and algorithms on this task. The code is publicly available in the MLOSS repository.

Proceedings ArticleDOI
04 May 2014
TL;DR: Experimental results show that remarkable performance improvements can be achieved by using multiple examples per query and through the late (score-level) fusion of different subsystems, each based on a different set of phone posteriors.
Abstract: In the last years, the task of Query-by-Example Spoken Term Detection (QbE-STD), which aims to find occurrences of a spoken query in a set of audio documents, has gained the interest of the research community for its versatility in settings where untranscribed, multilingual and acoustically unconstrained spoken resources, or spoken resources in low-resource languages, must be searched. This paper describes and reports experimental results for a QbE-STD system that achieved the best performance in the recent Spoken Web Search (SWS) evaluation, held as part of MediaEval 2013. Though not optimized for speed, the system operates faster than real-time. The system exploits high-performance phone decoders to extract frame-level phone posteriors (a common representation in QbE-STD tasks). Then, given a query and an audio document, a distance matrix is computed between their phone posterior representations, followed by a newly introduced distance normalization technique and an iterative Dynamic Time Warping (DTW) matching procedure with some heuristic prunings. Results show that remarkable performance improvements can be achieved by using multiple examples per query and, especially, through the late (score-level) fusion of different subsystems, each based on a different set of phone posteriors.
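To make the matching step concrete: given frame-level phone posteriors for a query and a document, a common QbE-STD recipe computes a negative-log inner-product distance matrix and runs a subsequence DTW that may start and end anywhere in the document. The sketch below follows that generic recipe and leaves out the paper's distance normalization and pruning heuristics.

```python
import numpy as np

def posterior_distance_matrix(query_post, doc_post):
    """-log of the inner product between phone posterior vectors.

    query_post: (n, p), doc_post: (m, p); rows are per-frame posteriors."""
    return -np.log(np.clip(query_post @ doc_post.T, 1e-10, None))

def subsequence_dtw(dist):
    """Subsequence DTW: the query must be fully matched, but the match may
    start and end at any document frame. Returns (best cost, end frame)."""
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0] = dist[0]                           # free start anywhere in the doc
    for i in range(1, n):
        for j in range(m):
            best_prev = acc[i - 1, j]
            if j > 0:
                best_prev = min(best_prev, acc[i - 1, j - 1], acc[i, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    end = int(np.argmin(acc[-1]))
    return acc[-1, end] / n, end               # length-normalized cost

# Toy usage: 3 "phones", query hidden at document frames 50..59.
rng = np.random.default_rng(0)
doc = rng.dirichlet(np.ones(3), size=200)
query = doc[50:60] * 0.9 + 0.1 / 3
print(subsequence_dtw(posterior_distance_matrix(query, doc)))
```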

Journal ArticleDOI
TL;DR: A new distance function that combines the DTW distance between time series with the DTW distances between their derivatives and transforms provides significantly more accurate classification on the examined data sets.
Abstract: Over recent years the popularity of time series has soared. As a consequence there has been a dramatic increase in the amount of interest in querying and mining such data. In particular, many new distance measures between time series have been introduced. In this paper we propose a new distance function based on derivatives and transforms of time series. In contrast to well-known measures from the literature, our approach combines three distances: DTW distance between time series, DTW distance between derivatives of time series, and DTW distance between transforms of time series. The new distance is used in classification with the nearest neighbor rule. In order to provide a comprehensive comparison, we conducted a set of experiments, testing effectiveness on 47 time series data sets from a wide variety of application domains. Our experiments show that this new method provides a significantly more accurate classification on the examined data sets.
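The combined measure described above can be sketched as a weighted sum of three DTW costs: on the raw series, on their first derivatives, and on a transform of the series. In the sketch below a discrete cosine transform stands in for the transform and the weights a, b, c are hypothetical tuning parameters, so this is an illustration of the idea rather than the paper's exact distance.

```python
import numpy as np
from scipy.fftpack import dct

def dtw(x, y):
    """Univariate DTW cost with absolute-difference frame distance."""
    acc = np.full((len(x) + 1, len(y) + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            acc[i, j] = abs(x[i - 1] - y[j - 1]) + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[-1, -1]

def combined_dtw(x, y, a=1.0, b=1.0, c=1.0):
    """Weighted combination of DTW on the series, their first derivatives,
    and a transform of the series (DCT here, purely as a stand-in)."""
    d_raw = dtw(x, y)
    d_der = dtw(np.diff(x), np.diff(y))
    d_tra = dtw(dct(x, norm='ortho'), dct(y, norm='ortho'))
    return a * d_raw + b * d_der + c * d_tra

# Toy usage: two phase-shifted sine waves.
t = np.linspace(0, 2 * np.pi, 50)
print(combined_dtw(np.sin(t), np.sin(t + 0.3)))
```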

Book ChapterDOI
01 Nov 2014
TL;DR: An algorithm is introduced that reduces pose data over time to histograms of relative location, velocity, and their correlations, and uses partial least squares to learn a compact and discriminative representation from them, achieving state-of-the-art accuracy on four different benchmarks.
Abstract: Action recognition from 3d pose data has gained increasing attention since the data is readily available for depth or RGB-D videos. The most successful approaches so far perform an expensive feature selection or mining approach for training. In this work, we introduce an algorithm that is very efficient for training and testing. The main idea is that rich structured data like 3d pose does not require sophisticated feature modeling or learning. Instead, we reduce pose data over time to histograms of relative location, velocity, and their correlations and use partial least squares to learn a compact and discriminative representation from it. Despite its efficiency, our approach achieves state-of-the-art accuracy on four different benchmarks. We further investigate differences between 2d and 3d pose data for action recognition.

Journal ArticleDOI
TL;DR: In this article, high-performance thin-layer chromatography (HPTLC) combined with image analysis and pattern recognition methods was used for fingerprinting and classification of 52 propolis samples collected from Serbia and one sample from Croatia.
Abstract: High-performance thin-layer chromatography (HPTLC) combined with image analysis and pattern recognition methods was used for fingerprinting and classification of 52 propolis samples collected from Serbia and one sample from Croatia. Modern thin-layer chromatography equipment in combination with software for image processing and warping was applied for fingerprinting and data acquisition. The three most widely used chemometric techniques for classification (principal component analysis, cluster analysis and partial least squares-discriminant analysis), in combination with a simple and fast HPTLC method for fingerprint analysis of propolis, were applied in order to favor and encourage their use in planar chromatography. HPTLC fingerprint analysis of propolis was for the first time performed on amino silica plates. All studied propolis samples were classified into two major types, orange and blue, supporting the idea of the existence of two types of European propolis. Signals at specific RF values responsible for the classification of the studied extracts have also been isolated and the underlying compounds targeted for further investigation.

Journal ArticleDOI
TL;DR: A weighted DTW method is proposed that weights joints by optimizing a discriminant ratio, together with pre-processing that makes the gesture recognition mechanism robust to variations due to different camera or body orientations or different skeleton sizes between the reference and test gesture sequences.
Abstract: Gesture recognition is a technology often used in human-computer interaction applications. Dynamic time warping (DTW) is one of the techniques used in gesture recognition to find an optimal alignment between two sequences. Oftentimes a pre-processing of sequences is required to remove variations due to different camera or body orientations or due to different skeleton sizes between the reference gesture sequences and the test gesture sequences. We discuss a set of pre-processing methods to make the gesture recognition mechanism robust to these variations. DTW computes a dissimilarity measure by time-warping the sequences on a per sample basis by using the distance between the current reference and test sequences. However, all body joints involved in a gesture are not equally important in computing the distance between two sequence samples. We propose a weighted DTW method that weights joints by optimizing a discriminant ratio. Finally, we demonstrate the performance of our pre-processing and the weighted DTW method and compare our results with the conventional DTW and state-of-the-art.
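A minimal sketch of how per-joint weights enter the DTW frame distance is given below; the weights are assumed to be given (e.g., found by optimizing the discriminant ratio described above), and the joint indices are hypothetical.

```python
import numpy as np

def weighted_dtw(seq_a, seq_b, joint_weights):
    """DTW where the per-frame cost is a weighted sum of per-joint distances.

    seq_a: (n, J, 3) and seq_b: (m, J, 3) skeleton sequences (J joints, xyz).
    joint_weights: (J,) nonnegative weights, e.g. emphasizing the hands for a
    waving gesture. Returns the accumulated warping cost."""
    n, m = len(seq_a), len(seq_b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            per_joint = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1], axis=1)
            cost = float(np.dot(joint_weights, per_joint))
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[-1, -1]

# Toy usage: 20 joints, weight two hypothetical "hand" joints more heavily.
rng = np.random.default_rng(0)
ref, test = rng.normal(size=(40, 20, 3)), rng.normal(size=(35, 20, 3))
w = np.ones(20); w[[7, 11]] = 5.0
print(weighted_dtw(ref, test, w))
```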

Proceedings ArticleDOI
19 Jun 2014
TL;DR: In this paper, an implementation of a speech recognition system in the MATLAB environment is explained, in which two algorithms, Mel-Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW), are adopted for feature extraction and pattern matching, respectively.
Abstract: Speech recognition has a wide range of applications in security systems, healthcare, telephony, the military, and equipment designed for the handicapped. Speech is a continuously varying signal, so a proper digital processing algorithm has to be selected for an automatic speech recognition system. To obtain the required information from a speech sample, features have to be extracted from it; for recognition purposes these features are analyzed to make decisions. In this paper the implementation of a speech recognition system in the MATLAB environment is explained. Mel-Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) are the two algorithms adopted for feature extraction and pattern matching, respectively. Results are obtained with a one-time training phase and a continuous testing phase.
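For readers who want to experiment, the described MFCC-plus-DTW pipeline can be sketched in Python with librosa (the paper itself uses MATLAB); the template and test file names below are placeholders.

```python
import librosa
import numpy as np

def mfcc_features(path, n_mfcc=13):
    """Load a recording and return its MFCC matrix (n_mfcc x frames)."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def dtw_cost(feat_a, feat_b):
    """Length-normalized accumulated DTW cost between two MFCC sequences."""
    D, _ = librosa.sequence.dtw(X=feat_a, Y=feat_b, metric='euclidean')
    return D[-1, -1] / (feat_a.shape[1] + feat_b.shape[1])

def recognize(test_path, templates):
    """templates: {word: path to its training recording} (one-time training).
    Returns the word whose template has the lowest DTW cost to the test."""
    test = mfcc_features(test_path)
    return min(templates, key=lambda w: dtw_cost(test, mfcc_features(templates[w])))

# Hypothetical usage; the wav files are placeholders.
templates = {"yes": "train_yes.wav", "no": "train_no.wav"}
print(recognize("test_utterance.wav", templates))
```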

Journal ArticleDOI
TL;DR: It is shown that inertial data sampled at 100 Hz and vision data at 5 frames/s could be fused by an extended Kalman filter, and used for accurate human hand gesture recognition and tracking, and a novel adaptive algorithm has been developed to adjust measurement noise covariance according to the measured accelerations and the angular rotation rates.
Abstract: In this paper, we present an algorithm for hand gesture tracking and recognition based on the integration of a custom-built microelectromechanical systems (MEMS)-based inertial sensor (or measurement unit) and a low resolution imaging (i.e., vision) sensor. We discuss the 2-D gesture recognition and tracking results here, but the algorithm can be extended to 3-D motion tracking and gesture recognition in the future. Essentially, this paper shows that inertial data sampled at 100 Hz and vision data at 5 frames/s could be fused by an extended Kalman filter, and used for accurate human hand gesture recognition and tracking. Since an inertial sensor is better at tracking rapid movements, while a vision sensor is more stable and accurate for tracking slow movements, a novel adaptive algorithm has been developed to adjust measurement noise covariance according to the measured accelerations and the angular rotation rates. The experimental results verify that the proposed method is capable of reducing the velocity error and position drift in an MEMS-based inertial sensor when aided by the vision sensor. Compensating for the time delay due to the visual data processing cycles, a moving average filter is applied to remove the high frequency noise and propagate the inertial signals. The reconstructed trajectories of the first 10 Arabic numerals are further recognized using dynamic time warping with a discrete cosine transform for feature extraction, resulting in an accuracy of 92.3% and individual numeral recognition within 100 ms.

Journal ArticleDOI
TL;DR: The proposed DTW variant uses samples of the same gesture category to build a Gaussian Mixture Model-driven probabilistic model of that gesture class, and shows better performance in comparison to both the standard BoVW model and the DTW approach.

Journal ArticleDOI
TL;DR: The system demonstration led to promising results with respect to the accuracy of waste level estimation (98.50%) and the application can be used to optimize the routing of waste collection based on the estimated bin level.

Journal ArticleDOI
TL;DR: The aim of this study was to develop an autonomous and intelligent system for residential water end-use classification that could interface with customers and water business managers via a user-friendly web-based application.
Abstract: Intelligent metering technology combined with advanced numerical techniques enables a paradigm shift in the current level of water consumption information provision that is available to the customer and the water business. The aim of this study was to develop an autonomous and intelligent system for residential water end-use classification that could interface with customers and water business managers via a user-friendly web-based application. Water flow data collected directly from smart water meters includes both single (e.g., a shower event occurring alone) and combined (i.e., an event that comprises several overlapping single events) water end-use events. The authors recently developed intelligent algorithms to solve the complex problem of autonomously categorising residential water consumption data into a registry of single and combined events using a hybrid combination of techniques including the Hidden Markov Model (HMM), the Dynamic Time Warping (DTW) algorithm, time-of-day probability functions, threshold values and various physical features. However, an issue remained, which is the focus of the current paper: how to integrate self-learning functionality into the envisioned expert system, so that it can learn from newly collected datasets from cities, regions and countries different from those represented in the original training data. Such versatility and adaptive capacity are essential to make the expert system widely applicable. Through applying alternate forms of HMM and DTW in association with a frequency analysis technique, a suitable self-learning methodology was formulated and tested on three independent households located in Melbourne, Australia with a prediction accuracy of between 80% and 90% for the major end-use categories. The three principal flow data processing modules (i.e., single and combined event recognition and the self-learning function) were integrated into a prototype software application for performing autonomous water end-use analysis, and its functionality is presented in the latter sections of this paper. The developed expert system has profound implications for government, water businesses and consumers seeking to better manage precious urban water resources.

Journal ArticleDOI
TL;DR: A variant of the DTW algorithm referred to as non-segmental DTW (NS-DTW) is used, with a computational upper bound of O(mn), and the performance of QbE-STD is analyzed with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal, showing that frequency-domain linear prediction cepstral coefficients can be used as an alternative to traditional spectral parameters.
Abstract: The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within spoken audio data. Current state-of-the-art techniques assume zero prior knowledge about the language of the audio data, and thus explore dynamic time warping (DTW) based techniques for the QbE-STD task. In this paper, we use a variant of the DTW algorithm referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O(mn), and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. The results show that frequency-domain linear prediction cepstral coefficients, which capture the temporal dynamics of the speech signal, can be used as an alternative to traditional spectral parameters such as linear prediction cepstral coefficients, perceptual linear prediction cepstral coefficients and Mel-frequency cepstral coefficients. We also introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that the computational upper bound for FNS-DTW is O(mn/α²), which is faster than NS-DTW.
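The speed-up in FNS-DTW comes from shrinking both sequences by a factor α before matching, which reduces the O(mn) table to O(mn/α²). A generic sketch of such a reduction step, using simple block averaging of consecutive feature vectors as an illustration (not necessarily the paper's exact reduction), is:

```python
import numpy as np

def reduce_frames(features, alpha):
    """Average every alpha consecutive frames of a (T, d) feature matrix.

    Shrinking both query and document by alpha reduces the DTW table from
    O(mn) to O(mn / alpha**2), at the cost of temporal resolution."""
    T = (len(features) // alpha) * alpha        # drop the trailing remainder
    return features[:T].reshape(-1, alpha, features.shape[1]).mean(axis=1)

# Toy usage: a 1000-frame, 39-dimensional feature sequence reduced with alpha=3.
feats = np.random.default_rng(0).normal(size=(1000, 39))
print(reduce_frames(feats, 3).shape)            # (333, 39)
```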

Journal ArticleDOI
TL;DR: A novel generative model is proposed that discovers temporal dependencies on the shared/individual spaces of Probabilistic Canonical Correlation Analysis (PCCA) and outperforms state-of-the-art methods for both the aggregation of multiple, yet imperfect, expert annotations and the alignment of affective behavior.
Abstract: Fusing multiple continuous expert annotations is a crucial problem in machine learning and computer vision, particularly when dealing with uncertain and subjective tasks related to affective behavior. Inspired by the concept of inferring shared and individual latent spaces in Probabilistic Canonical Correlation Analysis (PCCA), we propose a novel, generative model that discovers temporal dependencies on the shared/individual spaces (Dynamic Probabilistic CCA, DPCCA). In order to accommodate for temporal lags, which are prominent amongst continuous annotations, we further introduce a latent warping process, leading to the DPCCA with Time Warpings (DPCTW) model. Finally, we propose two supervised variants of DPCCA/DPCTW which incorporate inputs (i.e., visual or audio features), both in a generative (SG-DPCCA) and discriminative manner (SD-DPCCA). We show that the resulting family of models (i) can be used as a unifying framework for solving the problems of temporal alignment and fusion of multiple annotations in time, (ii) can automatically rank and filter annotations based on latent posteriors or other model statistics, and (iii) that by incorporating dynamics, modeling annotation-specific biases, noise estimation, time warping and supervision, DPCTW outperforms state-of-the-art methods for both the aggregation of multiple, yet imperfect expert annotations as well as the alignment of affective behavior.

Journal ArticleDOI
TL;DR: This paper describes the implementation on field-programmable gate arrays (FPGAs) of an embedded system for online signature verification, which includes a vector floating-point unit (VFPU) specifically designed for accelerating the floating-point computations involved in this biometric modality.
Abstract: This paper describes the implementation on field-programmable gate arrays (FPGAs) of an embedded system for online signature verification. The recognition algorithm mainly consists of three stages. First, an initial preprocessing is applied on the captured signature, removing noise and normalizing information related to horizontal and vertical positions. Afterwards, a dynamic time warping algorithm is used to align this processed signature with its template previously stored in a database. Finally, a set of features are extracted and passed through a Gaussian Mixture Model, which reveals the degree of similarity between both signatures. The algorithm was tested using a public database of 100 users, obtaining high recognition rates for both genuine and forgery signatures. The implemented system consists of a vector floating-point unit (VFPU), specifically designed for accelerating the floating-point computations involved in this biometric modality. Moreover, the proposed architecture also includes a microprocessor, which interacts with the VFPU, and executes by software the rest of the online signature verification process. The designed system is capable of finishing a complete verification in less than 68 ms with a clock rated at 40 MHz. Experimental results show that the number of clock cycles is accelerated by a factor of ×4.8 and ×11.1, when compared with systems based on ARM Cortex-A8 and when substituting the VFPU by the Floating-Point Unit provided by Xilinx, respectively.

Journal ArticleDOI
TL;DR: The multi-template multi-match dynamic time warping (MTMM-DTW) algorithm is proposed as a natural extension of DTW to detect multiple occurrences of more than one exercise type in the recording of a physical therapy session and to provide feedback to the patient.

Proceedings Article
08 Dec 2014
TL;DR: This paper proposes to learn a Mahalanobis distance to perform alignment of multivariate time series, and also uses this metric learning framework to perform feature selection, building from basic audio features a combination with better alignment performance.
Abstract: In this paper, we propose to learn a Mahalanobis distance to perform alignment of multivariate time series. The learning examples for this task are time series for which the true alignment is known. We cast the alignment problem as a structured prediction task, and propose realistic losses between alignments for which the optimization is tractable. We provide experiments on real data in the audio-to-audio context, where we show that the learning of a similarity measure leads to improvements in the performance of the alignment task. We also propose to use this metric learning framework to perform feature selection and, from basic audio features, build a combination of these with better alignment performance.
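A minimal sketch of how a learned Mahalanobis metric plugs into DTW-style alignment is shown below; the matrix M is assumed to have been learned already (here it is just a hypothetical diagonal example), and the structured-prediction learning step itself is not shown.

```python
import numpy as np

def mahalanobis_dtw(x, y, M):
    """DTW alignment cost where the frame distance is the Mahalanobis
    distance sqrt((a-b)^T M (a-b)) for a learned PSD matrix M.

    x: (n, d), y: (m, d) multivariate time series."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = x[i - 1] - y[j - 1]
            cost = float(np.sqrt(diff @ M @ diff))
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[-1, -1]

# Toy usage: a metric that down-weights the second (noisy) feature dimension.
rng = np.random.default_rng(0)
x = np.stack([np.sin(np.linspace(0, 6, 50)), rng.normal(size=50)], axis=1)
y = np.stack([np.sin(np.linspace(0, 6, 60)), rng.normal(size=60)], axis=1)
M = np.diag([1.0, 0.05])
print(mahalanobis_dtw(x, y, M))
```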