Proceedings Article

Classification of varying length time series using example-specific adapted Gaussian mixture models and support vector machines

TL;DR: A hybrid framework that first uses an adapted Gaussian mixture model based method to represent a varying length sequence of feature vectors as a fixed length pattern and then uses a discriminative model for classification of varying length patterns of long duration gives significantly improved classification performance compared to the conventional GMM based classifiers.
Abstract: In this paper, we propose a hybrid framework that first uses an adapted Gaussian mixture model based method to represent a varying length sequence of feature vectors as a fixed length pattern and then uses a discriminative model for classification of varying length patterns of long duration. In the conventional GMM-UBM (Gaussian mixture model-Universal background model) based classifier, a UBM is built using feature vectors of all classes. In the proposed approach, a GMM is built for each class using the feature vectors of all the patterns of that class. Then an adapted GMM is built for each example in the training data set by adapting the GMM of the class to which the example belongs. The log-likelihood of a pattern for a given example-specific adapted GMM is used as a score. A similarity-based score vector for a pattern is obtained by evaluating the pattern against the adapted GMMs of all the patterns in the training set. A test pattern is represented by a score vector in the same way. A support vector machine (SVM) is then used to classify the score vector representations of the varying length patterns. Our studies on speech emotion recognition and audio clip classification tasks show that the proposed method gives significantly better classification performance than the conventional GMM based classifiers.
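The score-vector construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes diagonal-covariance GMMs, class GMMs that are already trained (for example by EM), mean-only MAP adaptation with a relevance factor of 16 (a value common in the GMM-UBM literature, not taken from this paper), and the per-frame average log-likelihood as the score. The names `DiagGMM`, `map_adapt_means`, `avg_loglik` and `score_vector` are hypothetical.

```python
import math
from dataclasses import dataclass, replace
from typing import List

Vector = List[float]


@dataclass
class DiagGMM:
    weights: List[float]      # mixture weights (assumed to sum to 1)
    means: List[Vector]       # one mean vector per component
    variances: List[Vector]   # diagonal covariance entries per component


def _logsumexp(vals: List[float]) -> float:
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def _component_logs(x: Vector, gmm: DiagGMM) -> List[float]:
    # log(w_k) + log N(x; mu_k, diag(var_k)) for every component k
    return [
        math.log(w)
        - 0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                    for xi, m, v in zip(x, mean, var))
        for w, mean, var in zip(gmm.weights, gmm.means, gmm.variances)
    ]


def avg_loglik(frames: List[Vector], gmm: DiagGMM) -> float:
    # Per-frame average log-likelihood (assumption: normalizes for pattern length)
    return sum(_logsumexp(_component_logs(x, gmm)) for x in frames) / len(frames)


def map_adapt_means(frames: List[Vector], gmm: DiagGMM,
                    relevance: float = 16.0) -> DiagGMM:
    # Mean-only MAP adaptation of a class GMM to a single training example
    k, dim = len(gmm.weights), len(frames[0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in frames:
        logs = _component_logs(x, gmm)
        total = _logsumexp(logs)
        for j in range(k):
            gamma = math.exp(logs[j] - total)  # posterior of component j given x
            counts[j] += gamma
            for d in range(dim):
                sums[j][d] += gamma * x[d]
    new_means = []
    for j in range(k):
        if counts[j] == 0.0:  # no data for this component: keep the prior mean
            new_means.append(list(gmm.means[j]))
            continue
        alpha = counts[j] / (counts[j] + relevance)  # data-dependent adaptation weight
        ex = [s / counts[j] for s in sums[j]]
        new_means.append([alpha * e + (1 - alpha) * m for e, m in zip(ex, gmm.means[j])])
    # Weights and variances are copied unchanged from the class GMM
    return replace(gmm, means=new_means)


def score_vector(frames: List[Vector], example_models: List[DiagGMM]) -> List[float]:
    # One score per example-specific model: fixed length whatever len(frames) is
    return [avg_loglik(frames, g) for g in example_models]
```

In use, each training example would be adapted from its own class GMM, for example `models = [map_adapt_means(f, class_gmms[y]) for f, y in train]` (where `class_gmms` and `train` are hypothetical), and every training or test pattern would be mapped to `score_vector(frames, models)`. Whatever the number of frames, the resulting vector has one entry per training example, so it can be passed to a standard SVM implementation such as scikit-learn's `SVC`.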
Citations
Journal Article
TL;DR: This paper proposes, implements, and tests a hybrid neighborhood-aware algorithm for outlier detection that considers the uneven spatial density of the users, the number of malicious users, the level of conspiracy, and the lack of accuracy and malfunctioning sensors.
Abstract: In this paper we study the problem of sensor data verification in Participatory Sensing (PS) systems using an air quality/pollution monitoring application as a validation example. Data verification, in the context of PS, consists of the process of detecting and removing spatial outliers to properly reconstruct the variables of interest. We propose, implement, and test a hybrid neighborhood-aware algorithm for outlier detection that considers the uneven spatial density of the users, the number of malicious users, the level of conspiracy, and the lack of accuracy and malfunctioning sensors. The algorithm utilizes the Delaunay triangulation and Gaussian Mixture Models to build neighborhoods based on the spatial and non-spatial attributes of each location. This neighborhood definition allows us to demonstrate that it is not necessary to apply accurate but computationally expensive estimators to the entire dataset to obtain good results, because equally accurate but computationally cheaper methods can be applied to part of the data with equally good results. Our experimental results show that our hybrid algorithm performs as well as the best estimator while reducing the execution time considerably.

22 citations

Proceedings Article
02 Jul 2012
TL;DR: A hybrid neighborhood-aware algorithm for outlier detection that considers the uneven spatial density of the users, the number of malicious users, the level of conspiracy, and the lack of accuracy and malfunctioning sensors, and that uses Delaunay triangulation and Gaussian Mixture Models to build neighborhoods based on the spatial and non-spatial attributes of each location.
Abstract: In this paper we study the problem of sensor data verification in Participatory Sensing (PS) systems using an air quality/pollution monitoring application as a validation example. Data verification, in the context of PS, consists of the process of removing spatial outliers to properly reconstruct the variables of interest. We propose a hybrid neighborhood-aware algorithm for outlier detection that considers the uneven spatial density of the users, the number of malicious users, the level of conspiracy, and the lack of accuracy and malfunctioning sensors. The algorithm utilizes the Delaunay triangulation and Gaussian Mixture Models to build neighborhoods based on the spatial and non-spatial attributes of each location. Our experimental results show that our hybrid algorithm performs as well as the best estimator while considerably reducing the execution time.

7 citations


Cites background from "Classification of varying length ti..."

  • ...2) Gaussian Mixture Models: GMM have been identified as an important promising technique for pattern recognition [11]....

01 Jan 2012
TL;DR: A framework to guide the design and implementation of participatory sensing (PS) applications that addresses their main design challenges is presented, and a new technique to interpolate data in time and space, more appropriate for PS systems, is proposed.
Abstract: Participatory sensing (PS) systems are an emerging sensing paradigm based on the cooperative participation of cellular users. Due to the spatio-temporal granularity that a PS system can provide, it is now possible to detect and analyze events that occur at different scales, at a low cost. While PS systems present interesting characteristics, they also create new problems. Since the measuring devices are cheaper and are in the hands of the users, PS systems face several design challenges: the poor accuracy and high failure rate of the sensors, the possibility of malicious users tampering with the data, the violation of the users' privacy, the need for methods to encourage user participation, and the effective visualization of the data. This dissertation presents four main contributions to address some of these challenges. The first is a framework to guide the design and implementation of PS applications considering all these aspects. The framework consists of five modules: sample size determination, data collection, data verification, data visualization, and density maps generation. The remaining contributions map one-to-one to three of the modules of this framework: data verification, data visualization, and density maps. Data verification, in the context of PS, consists of the process of detecting and removing spatial outliers to properly reconstruct the variables of interest. A new algorithm for spatial outlier detection and removal is proposed, implemented, and tested. This hybrid neighborhood-aware algorithm considers the uneven spatial density of the users, the number of malicious users, the level of conspiracy, and the lack of accuracy and malfunctioning sensors. The experimental results show that the proposed algorithm performs as well as the best estimator while reducing the execution time considerably.
The problem of data visualization in the context of PS applications is also of special interest. A typical PS application generates multivariate time-space series with many gaps in time and space. Considering this, a new method is presented based on the kriging technique along with Principal Component Analysis and Independent Component Analysis. Additionally, a new technique to interpolate data in time and space that is more appropriate for PS systems is proposed. The results indicate that the accuracy of the estimates improves with the amount of data used, i.e., from one variable, to multiple variables, to space and time data. The results also clearly show the advantage of a PS system over a traditional measuring system in terms of the precision and spatial resolution of the information provided to the users. One key challenge in PS systems is determining the number and locations of the users from which to obtain samples so that the variables of interest can be accurately represented with a low number of participants. To address this challenge, density maps are proposed, a technique based on the current estimates of the variable. The density maps are then utilized by the incentive mechanism to encourage the participation of the users indicated in the map. The experimental results show how the density maps greatly improve the quality of the estimations while maintaining a stable and low total number of users in the system. P-Sense, a PS system to monitor pollution levels, has been implemented and tested, and is used as a validation example for all the contributions presented here. P-Sense integrates gas and environmental sensors with a cell phone in order to monitor air quality levels.

4 citations


Cites background from "Classification of varying length ti..."

  • ...GMM have been identified as an important promising machine learning technique for pattern recognition [39]....

Proceedings Article
23 May 2022
TL;DR: In this article, the authors propose a novel method of normalizing the lengths of the time series in a dataset by exploiting the dynamic matching ability of Dynamic Time Warping (DTW).
Abstract: In real-world time series recognition applications, it is possible to have data with varying length patterns. However, when using artificial neural networks (ANN), it is standard practice to use fixed-sized mini-batches. To do this, time series data with varying lengths are typically normalized so that all the patterns are the same length. Normally, this is done using zero padding or truncation without much consideration. We propose a novel method of normalizing the lengths of the time series in a dataset by exploiting the dynamic matching ability of Dynamic Time Warping (DTW). In this way, the time series lengths in a dataset can be set to a fixed size while maintaining features typical to the dataset. In the experiments, all 11 datasets with varying length time series from the 2018 UCR Time Series Archive are used. We evaluate the proposed method by comparing it with 18 other length normalization methods on a Convolutional Neural Network (CNN), a Long Short-Term Memory network (LSTM), and a Bidirectional LSTM (BLSTM). The code is publicly available at https://github.com/uchidalab/vary length time series.
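The abstract does not spell out how DTW is used to fix the length, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm (their code is linked above). It assumes each series is warped onto a reference series that already has the target length, with absolute difference as the local cost, and that all samples aligned to the same reference position are averaged. How the reference is chosen (for example, a typical series from the dataset) is left open, and `dtw_path` and `dtw_normalize` are hypothetical names.

```python
from typing import List, Tuple


def dtw_path(x: List[float], ref: List[float]) -> List[Tuple[int, int]]:
    """Optimal DTW alignment between x and ref (absolute-difference local cost)."""
    n, m = len(x), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - ref[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor cell
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def dtw_normalize(x: List[float], ref: List[float]) -> List[float]:
    """Resample x to len(ref) by averaging the x values DTW aligns to each ref index."""
    sums = [0.0] * len(ref)
    counts = [0] * len(ref)
    for i, j in dtw_path(x, ref):
        sums[j] += x[i]
        counts[j] += 1
    # Every reference index lies on the warping path, so no count is zero
    return [s / c for s, c in zip(sums, counts)]
```

A series shorter than the reference is stretched (one sample can be aligned to several reference positions) and a longer one is compressed, so every output has exactly `len(ref)` values. The quadratic dynamic-programming table is fine for short series; long ones would typically use a warping-window constraint.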
References
Book
Christopher M. Bishop
17 Aug 2006
TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

22,840 citations

Journal Article
TL;DR: This book covers a broad range of topics for regular factorial designs and presents all of the material in a very mathematical fashion, and it will surely become an invaluable resource for researchers and graduate students doing research in the design of factorial experiments.
Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, pp. 366-366.

18,802 citations


"Classification of varying length ti..." refers to methods in this paper

  • ...The issue of overfitting in case of maximum likelihood (ML) method for parameter estimation in GMM [6] can be addressed by adapting the parameters of GMM using the training examples of all classes....

Journal Article
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations


"Classification of varying length ti..." refers to methods in this paper

  • ...In the conventional GMM-UBM based classifier [2], a large UBM is built using feature vectors of all classes and then the class models are built by adapting the UBM....

  • ...Training a GMM-UBM is much faster than the conventional GMMs and also allows a fast-scoring technique [2] during testing....

  • ...The issues of overfitting in case of maximum likelihood (ML) method for parameter estimation and the within-class variability are addressed in GMM-UBM approach [2], where the UBM is built as a large GMM from the data of all classes....

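The fast-scoring idea mentioned in these excerpts can be sketched as follows, assuming the top-C scheme used in GMM-UBM speaker verification: for each frame, only the C best-scoring UBM components are evaluated, both in the UBM and in the class model adapted from it. This relies on adaptation keeping the class model's components aligned with the UBM's. The default C = 5, the diagonal covariances and the helper names `fast_llr` and `classify` are illustrative assumptions, not details given on this page.

```python
import math
from typing import Dict, List, Tuple

# (weight, mean, diagonal variance); class models must be index-aligned with the UBM
Component = Tuple[float, List[float], List[float]]


def _log_weighted_gauss(x: List[float], comp: Component) -> float:
    w, mean, var = comp
    return math.log(w) - 0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                                   for xi, m, v in zip(x, mean, var))


def _logsumexp(vals: List[float]) -> float:
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def fast_llr(frames: List[List[float]], ubm: List[Component],
             adapted: List[Component], top_c: int = 5) -> float:
    """Average log-likelihood ratio of an adapted class model vs the UBM (top-C)."""
    total = 0.0
    for x in frames:
        ubm_logs = [_log_weighted_gauss(x, c) for c in ubm]
        # Indices of the top_c UBM components for this frame
        best = sorted(range(len(ubm)), key=lambda k: ubm_logs[k], reverse=True)[:top_c]
        ubm_ll = _logsumexp([ubm_logs[k] for k in best])
        cls_ll = _logsumexp([_log_weighted_gauss(x, adapted[k]) for k in best])
        total += cls_ll - ubm_ll
    return total / len(frames)


def classify(frames: List[List[float]], ubm: List[Component],
             class_models: Dict[str, List[Component]], top_c: int = 5) -> str:
    # Pick the class whose adapted model gives the highest average LLR
    return max(class_models, key=lambda c: fast_llr(frames, ubm, class_models[c], top_c))
```

Because the UBM term is shared by all classes, picking the class with the highest average log-likelihood ratio is equivalent to picking the class model with the highest (top-C approximated) log-likelihood.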

Journal Article
TL;DR: A framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented, and Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.
Abstract: In this paper, a framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented. Three key issues of MAP estimation, namely, the choice of prior distribution family, the specification of the parameters of prior densities, and the evaluation of the MAP estimates, are addressed. Using HMMs with Gaussian mixture state observation densities as an example, it is assumed that the prior densities for the HMM parameters can be adequately represented as a product of Dirichlet and normal-Wishart densities. The classical maximum likelihood estimation algorithms, namely, the forward-backward algorithm and the segmental k-means algorithm, are expanded, and MAP estimation formulas are developed. Prior density estimation issues are discussed for two classes of applications (parameter smoothing and model adaptation), and some experimental results are given illustrating the practical interest of this approach. Because of its adaptive nature, Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.

2,430 citations


"Classification of varying length ti..." refers to methods in this paper

  • ...The model for a class is obtained by updating the parameters of the UBM using the training examples of the respective class for adaptation of UBM [10]....

  • ...This provides a tighter coupling between the class model and the UBM, and gives a better performance than the decoupled models like conventional GMMs. Training a GMM-UBM is much faster than the conventional GMMs and also allows a fast-scoring technique [2] during testing....

  • ...Building ESAGMM adapted either directly from the UBM or from the adapted class specific model are the extensions of the proposed approach....

  • ...The GMM-UBM approach gives a slightly improved performance over GMM based classifiers....

  • ...The issues of overfitting in case of maximum likelihood (ML) method for parameter estimation and the within-class variability are addressed in GMM-UBM approach [2], where the UBM is built as a large GMM from the data of all classes....

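For reference, the mean-only MAP adaptation commonly used in GMM-UBM systems is written below in the usual notation: w_k, μ_k and Σ_k are the weights, means and covariances of the prior model (the UBM, or the class GMM in the example-specific variant), x_1, …, x_T are the adaptation frames, K is the number of components, and r is a relevance factor. This page does not state which parameters the paper adapts or what value of r it uses, so treat this as the standard form rather than the paper's exact recipe.

```latex
\begin{aligned}
\gamma_t(k) &= \frac{w_k\,\mathcal{N}(x_t;\mu_k,\Sigma_k)}{\sum_{j=1}^{K} w_j\,\mathcal{N}(x_t;\mu_j,\Sigma_j)}, \\
n_k &= \sum_{t=1}^{T} \gamma_t(k), \qquad
E_k(x) = \frac{1}{n_k}\sum_{t=1}^{T} \gamma_t(k)\, x_t, \\
\alpha_k &= \frac{n_k}{n_k + r}, \qquad
\hat{\mu}_k = \alpha_k\, E_k(x) + (1 - \alpha_k)\,\mu_k .
\end{aligned}
```

Components that receive little adaptation data (small n_k) keep means close to the prior model, which is how adaptation limits the overfitting of plain maximum likelihood training mentioned in the excerpts above.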

Proceedings Article
04 Sep 2005
TL;DR: A database of emotional speech that was evaluated in a perception test regarding the recognisability of emotions and their naturalness, and that can be accessed by the public via the internet.
Abstract: The article describes a database of emotional speech. Ten actors (5 female and 5 male) simulated the emotions, producing 10 German utterances (5 short and 5 longer sentences) which could be used in everyday communication and are interpretable in all applied emotions. The recordings were taken in an anechoic chamber with high-quality recording equipment. In addition to the sound, electroglottograms were recorded. The speech material comprises about 800 sentences (seven emotions × ten actors × ten sentences, plus some second versions). The complete database was evaluated in a perception test regarding the recognisability of emotions and their naturalness. Utterances recognised better than 80% and judged as natural by more than 60% of the listeners were phonetically labelled in a narrow transcription with special markers for voice-quality, phonatory and articulatory settings and articulatory features. The database can be accessed by the public via the internet (http://www.expressive-speech.net/emodb/).

1,905 citations