Posted Content

A One-Class Support Vector Machine Calibration Method for Time Series Change Point Detection

TL;DR: In this paper, a heuristic search method is proposed to find a good set of input data and hyperparameters that yields a well-performing model for detecting change points in time series using less training data than state-of-the-art deep learning approaches.
Abstract: It is important to identify the change point of a system's health status, which usually signifies an incipient fault under development. The One-Class Support Vector Machine (OC-SVM) is a popular machine learning model for anomaly detection and hence could be used for identifying change points; however, it is sometimes difficult to obtain a good OC-SVM model that can be used on sensor measurement time series to identify the change points in system health status. In this paper, we propose a novel approach for calibrating OC-SVM models. The approach uses a heuristic search method to find a good set of input data and hyperparameters that yield a well-performing model. Our results on the C-MAPSS dataset demonstrate that OC-SVM can achieve satisfactory accuracy in detecting change points in time series with less training data than state-of-the-art deep learning approaches. In our case study, the OC-SVM calibrated by the proposed method is shown to be especially useful in scenarios with a limited amount of training data.
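
Since the heuristic search itself is not detailed in this abstract, here is a minimal sketch of the calibration idea under illustrative assumptions: slide a window over the series, fit an OC-SVM on early (assumed healthy) windows, and randomly search the window size and hyperparameters for the configuration that best localizes a known change point in a calibration run. The data, scoring rule, and parameter ranges below are all hypothetical.

```python
# Minimal sketch of heuristic OC-SVM calibration for change point detection.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic series: healthy regime, then a shifted regime after t = 300.
series = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(2.0, 1.0, 200)])

def windows(x, w):
    """Slide a length-w window over x, one row per window."""
    return np.lib.stride_tricks.sliding_window_view(x, w)

def detected_change(x, w, nu, gamma, train_end=200):
    """Train on early (assumed healthy) windows, return first flagged index."""
    X = windows(x, w)
    model = OneClassSVM(nu=nu, gamma=gamma).fit(X[:train_end])
    flags = model.predict(X[train_end:]) == -1          # -1 = outlier
    hits = np.flatnonzero(flags)
    return train_end + hits[0] + w if hits.size else None

# Heuristic (random) search over input window size and hyperparameters.
best = None
for _ in range(30):
    w = int(rng.integers(5, 40))
    nu = float(rng.uniform(0.01, 0.2))
    gamma = float(10 ** rng.uniform(-3, 0))
    cp = detected_change(series, w, nu, gamma)
    # Score by closeness to the (known, for calibration) change at t = 300.
    if cp is not None and (best is None or abs(cp - 300) < abs(best[0] - 300)):
        best = (cp, w, nu, gamma)

print("best (detection, window, nu, gamma):", best)
```
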
Citations
Proceedings ArticleDOI
26 Jul 2019
TL;DR: It is shown that the encoder-decoder model is able to identify the injected anomalies in a modern Additive Manufacturing (AM) process in an unsupervised fashion, and it also gives hints about the temperature non-uniformity of the testbed during manufacturing, which was not known prior to the experiment.
Abstract: We present a novel unsupervised deep learning approach that utilizes an encoder-decoder architecture for detecting anomalies in sequential sensor data collected during industrial manufacturing. Our approach is designed not only to detect whether there exists an anomaly at a given time step, but also to predict what will happen next in the (sequential) process. We demonstrate our approach on a dataset collected from a real-world Additive Manufacturing (AM) testbed. The dataset contains infrared (IR) images collected under both normal conditions and synthetic anomalies. We show that our encoder-decoder model is able to identify the injected anomalies in a modern AM process in an unsupervised fashion. In addition, our approach gives hints about the temperature non-uniformity of the testbed during manufacturing, which was not known prior to the experiment.
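
The abstract leaves the architecture unspecified, so the following is a generic sketch of the encoder-decoder reconstruction idea rather than the authors' exact model: an LSTM autoencoder trained on normal sequences, with time steps flagged where per-step reconstruction error exceeds a threshold. All sizes and the threshold rule are assumptions.

```python
# Hypothetical sketch of encoder-decoder anomaly scoring on sensor sequences.
import numpy as np
import tensorflow as tf

T, D = 50, 4                                   # sequence length, sensor channels
normal = np.random.normal(size=(256, T, D)).astype("float32")

inputs = tf.keras.Input(shape=(T, D))
z = tf.keras.layers.LSTM(16)(inputs)           # encoder: sequence -> latent vector
z = tf.keras.layers.RepeatVector(T)(z)         # repeat latent for each time step
h = tf.keras.layers.LSTM(16, return_sequences=True)(z)
outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(D))(h)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal, normal, epochs=5, batch_size=32, verbose=0)

test = normal.copy()
test[0, 20:30, :] += 3.0                       # inject a synthetic anomaly
err = np.mean((autoencoder.predict(test, verbose=0) - test) ** 2, axis=-1)
threshold = err[1:].mean() + 3 * err[1:].std() # threshold from clean sequences
print("anomalous steps in seq 0:", np.flatnonzero(err[0] > threshold))
```
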

30 citations

DOI
15 Oct 2021
TL;DR: Wang et al. propose CNN-LSTMED (Convolutional Neural Network Long Short-Term Memory Encoder-Decoder), a data anomaly detection algorithm based on a convolutional neural network and an encoder-decoder architecture.
Abstract: The purpose of anomaly detection is to detect data that deviate from the expected, and it is widely used in intrusion detection, data preprocessing, and so on. For data anomaly detection, we propose CNN-LSTMED (Convolutional Neural Network Long Short-Term Memory Encoder-Decoder), an algorithm based on a convolutional neural network and an encoder-decoder architecture. First, we use the convolutional neural network to encode the time series data to obtain the encoded sequence, and use the features extracted from the sequence as the input of a nonlinear long short-term memory (LSTM) network to decode and output the decoded sequence. Finally, the reconstruction error is calculated and a threshold is set to determine the abnormal points. Experimental comparisons with GRUED (Gated Recurrent Unit Encoder-Decoder), LSTMED (Long Short-Term Memory Encoder-Decoder), and other algorithms on the KDD99 and credit card fraud data sets show that our algorithm has strong robustness and accuracy.
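
A rough sketch of the CNN-encoder / LSTM-decoder pattern described above; all layer sizes, the pooling choices, and the thresholding rule are assumptions, not the paper's configuration.

```python
# Conv1D layers encode the series, an LSTM decodes it back, and sequences with
# high reconstruction error are declared anomalous.
import numpy as np
import tensorflow as tf

T, D = 64, 1
x_train = np.random.normal(size=(512, T, D)).astype("float32")

inputs = tf.keras.Input(shape=(T, D))
h = tf.keras.layers.Conv1D(32, 5, padding="same", activation="relu")(inputs)
h = tf.keras.layers.MaxPooling1D(2)(h)          # encoded sequence (T/2 steps)
h = tf.keras.layers.LSTM(32, return_sequences=True)(h)
h = tf.keras.layers.UpSampling1D(2)(h)          # back to T steps
outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(D))(h)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, x_train, epochs=5, batch_size=64, verbose=0)

recon = model.predict(x_train, verbose=0)
errors = np.mean((recon - x_train) ** 2, axis=(1, 2))
threshold = np.percentile(errors, 99)           # threshold set on training errors
print("flagged sequences:", np.flatnonzero(errors > threshold))
```
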

7 citations

Posted Content
TL;DR: This question is answered via a rigorous analysis of two commonly used uncertainty metrics in ensemble learning, namely ensemble mean and ensemble variance: ensemble mean is preferable to ensemble variance as an uncertainty metric for decision making.
Abstract: Ensemble learning is widely applied in Machine Learning (ML) to improve model performance and to mitigate decision risks. In this approach, predictions from a diverse set of learners are combined to obtain a joint decision. Recently, various methods have been explored in the literature for estimating decision uncertainties using ensemble learning; however, determining which metrics are a better fit for certain decision-making applications remains a challenging task. In this paper, we study the following key research question in the selection of uncertainty metrics: when does one uncertainty metric outperform another? We answer this question via a rigorous analysis of two commonly used uncertainty metrics in ensemble learning, namely ensemble mean and ensemble variance. We show that, under mild assumptions on the ensemble learners, ensemble mean is preferable to ensemble variance as an uncertainty metric for decision making. We empirically validate our assumptions and theoretical results via an extensive case study: the diagnosis of referable diabetic retinopathy.
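
As a toy illustration of the comparison (not the paper's analysis or dataset), one can simulate an ensemble, compute both metrics, and measure accuracy after referring the most uncertain cases under each metric. The referral rule and simulated data below are assumptions.

```python
# Compare ensemble-mean confidence vs. ensemble variance as referral criteria.
import numpy as np

rng = np.random.default_rng(1)
n_models, n_cases = 10, 1000

# Simulated per-model predicted probabilities for the positive class.
true = rng.integers(0, 2, n_cases)
logits = true * 1.5 + rng.normal(0, 1.0, (n_models, n_cases))
probs = 1 / (1 + np.exp(-logits))

mean = probs.mean(axis=0)          # ensemble mean
var = probs.var(axis=0)            # ensemble variance

def accuracy_after_referral(uncertainty, frac=0.2):
    """Abstain on the frac most uncertain cases, score accuracy on the rest."""
    keep = uncertainty <= np.quantile(uncertainty, 1 - frac)
    return np.mean((mean[keep] > 0.5) == true[keep])

# Mean-based uncertainty: highest when the mean probability is near 0.5.
print("refer by |mean - 0.5|:", accuracy_after_referral(-np.abs(mean - 0.5)))
print("refer by variance    :", accuracy_after_referral(var))
```
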

5 citations

Posted Content
TL;DR: The proposed novel approach of augmenting the classification model with an additional unsupervised learning task leads to improved fault detection and diagnosis performance, especially on out-of-distribution examples including both incipient and unknown faults.
Abstract: The Monte Carlo dropout method has proved to be a scalable and easy-to-use approach for estimating the uncertainty of deep neural network predictions. This approach was recently applied to Fault Detection and Diagnosis (FDD) applications to improve the classification performance on incipient faults. In this paper, we propose a novel approach of augmenting the classification model with an additional unsupervised learning task. We justify our choice of algorithm design via an information-theoretical analysis. Our experimental results on three datasets from diverse application domains show that the proposed method leads to improved fault detection and diagnosis performance, especially on out-of-distribution examples including both incipient and unknown faults.
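
For context, here is a minimal sketch of the Monte Carlo dropout baseline this work builds on (the proposed auxiliary unsupervised task is omitted): keep dropout active at inference and treat the spread across stochastic forward passes as the uncertainty estimate. The model, data, and spread-based score are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

x = np.random.normal(size=(64, 20)).astype("float32")
y = np.random.randint(0, 3, size=64)               # dummy labels for the sketch

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, y, epochs=3, verbose=0)

# Keep dropout active at inference (training=True) and average T stochastic
# forward passes; the spread across passes serves as the uncertainty estimate.
T = 50
samples = np.stack([model(x[:8], training=True).numpy() for _ in range(T)])
mean_pred = samples.mean(axis=0)                   # MC-averaged class probabilities
uncertainty = samples.std(axis=0).max(axis=-1)     # simple per-example spread
print(mean_pred.argmax(axis=-1), uncertainty)
```
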

5 citations

Posted Content
TL;DR: This work identifies common pitfalls in ensemble models through extensive experiments with several popular ensemble models on two real-world datasets, and discusses how to design more effective ensemble models for detecting and diagnosing Intermediate-Severity faults.
Abstract: Intermediate-Severity (IS) faults present milder symptoms compared to severe faults, and are more difficult to detect and diagnose due to their close resemblance to normal operating conditions. The lack of IS fault examples in the training data can pose severe risks to Fault Detection and Diagnosis (FDD) methods that are built upon Machine Learning (ML) techniques, because these faults can be easily mistaken as normal operating conditions. Ensemble models are widely applied in ML and are considered promising methods for detecting out-of-distribution (OOD) data. We identify common pitfalls in these models through extensive experiments with several popular ensemble models on two real-world datasets. Then, we discuss how to design more effective ensemble models for detecting and diagnosing IS faults.
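
A common ensemble OOD score of the kind this work scrutinizes is disagreement across independently trained members. The sketch below is illustrative only; the models, data, and score are assumptions, and such scores can fail in exactly the ways the paper investigates.

```python
# Ensemble disagreement (variance of member probabilities) as an OOD score.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train an ensemble on bootstrap resamples of the training data.
members = []
for seed in range(5):
    idx = rng.integers(0, len(X), len(X))
    members.append(RandomForestClassifier(n_estimators=50, random_state=seed)
                   .fit(X[idx], y[idx]))

def ood_score(x):
    """Variance of member probabilities; high = members disagree = possible OOD."""
    p = np.stack([m.predict_proba(x)[:, 1] for m in members])
    return p.var(axis=0)

in_dist = rng.normal(size=(5, 6))
shifted = rng.normal(loc=4.0, size=(5, 6))      # crude distribution-shifted inputs
print("in-dist scores:", ood_score(in_dist))
print("shifted scores:", ood_score(shifted))
```
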

3 citations

References
Journal ArticleDOI
13 May 1983 - Science
TL;DR: There is a deep and useful connection between statistical mechanics and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters), and a detailed analogy with annealing in solids provides a framework for optimization of very large and complex systems.
Abstract: There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.
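
A toy implementation of the annealing analogy, assuming a one-dimensional objective and a geometric cooling schedule: accept uphill moves with Boltzmann probability exp(-Δ/T) while the temperature T cools.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, iters=5000):
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(iters):
        cand = x + random.uniform(-step, step)
        fc = f(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if fc < fx or random.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling                      # geometric cooling schedule
    return best, fbest

# Multimodal test function with many local minima.
print(simulated_annealing(lambda x: x**2 + 10 * math.sin(3 * x), x0=5.0))
```
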

41,772 citations

Journal ArticleDOI
Rainer Storn, Kenneth Price
TL;DR: In this article, a new heuristic approach for minimizing possibly nonlinear and non-differentiable continuous space functions is presented, which requires few control variables, is robust, easy to use, and lends itself very well to parallel computation.
Abstract: A new heuristic approach for minimizing possibly nonlinear and non-differentiable continuous space functions is presented. By means of an extensive testbed it is demonstrated that the new method converges faster and with more certainty than many other acclaimed global optimization methods. The new method requires few control variables, is robust, easy to use, and lends itself very well to parallel computation.
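
In practice, the Storn-Price algorithm is available as SciPy's differential_evolution; a quick usage sketch on the standard Rastrigin test function:

```python
import numpy as np
from scipy.optimize import differential_evolution

def rastrigin(x):
    """Standard multimodal benchmark; global minimum 0 at the origin."""
    x = np.asarray(x)
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

result = differential_evolution(
    rastrigin,
    bounds=[(-5.12, 5.12)] * 5,   # 5-dimensional search space
    seed=0,
    tol=1e-8,
)
print(result.x, result.fun)       # should be near the global optimum at 0
```
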

24,053 citations

Proceedings Article
03 Dec 2012
TL;DR: This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
Abstract: The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
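
A hedged usage sketch of GP-based Bayesian optimization in this spirit, assuming the scikit-optimize package (not the authors' implementation); the objective is a stand-in for an expensive validation-error evaluation.

```python
import numpy as np
from skopt import gp_minimize

def expensive_objective(params):
    """Stand-in for a learning algorithm's validation error."""
    lr, reg = params
    return (np.log10(lr) + 2) ** 2 + (reg - 0.1) ** 2

result = gp_minimize(
    expensive_objective,
    dimensions=[(1e-4, 1e-1, "log-uniform"),   # learning rate
                (0.0, 1.0)],                   # regularization strength
    n_calls=25,                                # budget of expensive evaluations
    random_state=0,
)
print("best params:", result.x, "best value:", result.fun)
```
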

5,654 citations

Journal ArticleDOI
TL;DR: In this paper, the authors propose a method to estimate a function f that is positive on S and negative on the complement of S. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space.
Abstract: Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
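
This is the ν-formulation implemented in scikit-learn's OneClassSVM, where nu upper-bounds the fraction of training points treated as outliers. A quick empirical check of that property, on synthetic data with illustrative settings:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))

for nu in (0.01, 0.05, 0.2):
    model = OneClassSVM(nu=nu, kernel="rbf", gamma="scale").fit(X)
    outlier_frac = np.mean(model.predict(X) == -1)   # -1 = flagged as outlier
    print(f"nu={nu:.2f} -> training outlier fraction ~ {outlier_frac:.3f}")
```
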

4,397 citations

Book
25 Nov 2014
TL;DR: The differential evolution (DE) algorithm is a practical approach to global numerical optimization which is easy to understand, simple to implement, reliable, and fast; the book is a valuable resource for professionals needing a proven optimizer and for students wanting an evolutionary perspective on global numerical optimization.
Abstract: Problems demanding globally optimal solutions are ubiquitous, yet many are intractable when they involve constrained functions having many local optima and interacting, mixed-type variables. The differential evolution (DE) algorithm is a practical approach to global numerical optimization which is easy to understand, simple to implement, reliable, and fast. Packed with illustrations, computer code, new insights, and practical advice, this volume explores DE in both principle and practice. It is a valuable resource for professionals needing a proven optimizer and for students wanting an evolutionary perspective on global numerical optimization.

4,273 citations