
Showing papers presented at "The European Symposium on Artificial Neural Networks in 2015"


Proceedings Article
01 Jan 2015
TL;DR: The efficacy of stacked LSTM networks for anomaly/fault detection in time series is demonstrated on ECG, space shuttle, power demand, and multi-sensor engine datasets.
Abstract: Long Short Term Memory (LSTM) networks have been demonstrated to be particularly useful for learning sequences containing longer term patterns of unknown length, due to their ability to maintain long term memory. Stacking recurrent hidden layers in such networks also enables the learning of higher level temporal features, for faster learning with sparser representations. In this paper, we use stacked LSTM networks for anomaly/fault detection in time series. A network is trained on non-anomalous data and used as a predictor over a number of time steps. The resulting prediction errors are modeled as a multivariate Gaussian distribution, which is used to assess the likelihood of anomalous behavior. The efficacy of this approach is demonstrated on four datasets: ECG, space shuttle, power demand, and multi-sensor engine dataset.

969 citations
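The error-modeling stage of this approach is easy to illustrate. The sketch below is a hedged illustration, not the authors' code: it replaces the stacked-LSTM predictor with synthetic prediction errors and shows only the multivariate-Gaussian likelihood step; the `anomaly_score` function and the 99%-quantile threshold are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the prediction errors of a stacked-LSTM predictor on
# non-anomalous validation data (3 error terms per time step).
normal_errors = rng.normal(0.0, 1.0, size=(500, 3))

# Fit a multivariate Gaussian to the errors, as described in the abstract.
mu = normal_errors.mean(axis=0)
cov = np.cov(normal_errors, rowvar=False)
cov_inv = np.linalg.inv(cov)

def anomaly_score(e):
    """Squared Mahalanobis distance of an error vector; large = unlikely = anomalous."""
    d = e - mu
    return float(d @ cov_inv @ d)

# Threshold chosen so that 1% of normal errors would be flagged.
threshold = np.quantile([anomaly_score(e) for e in normal_errors], 0.99)

anomalous_point = np.array([8.0, -7.5, 9.0])  # far outside the fitted model
print(anomaly_score(np.zeros(3)) < threshold)
print(anomaly_score(anomalous_point) > threshold)
```

In practice the threshold would be tuned on a held-out validation set containing labeled anomalies, as the likelihood cutoff trades precision against recall.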


Book ChapterDOI
01 Jan 2015
TL;DR: Control-flow discovery algorithms proposed up to now assume that each activity is instantaneous, because usually a single log entry is recorded for each performed activity, regardless of its duration.
Abstract: Many control-flow discovery algorithms proposed up to now assume that each activity is instantaneous. This is because usually a single log entry is recorded for each performed activity, regardless of the duration of the activity.

46 citations


Proceedings Article
01 Jan 2015
TL;DR: It is concluded that machine learning can be used to reliably detect high-school dropout given the information already available to many schools.
Abstract: Pupils not finishing their secondary education are a big societal problem. Previous studies indicate that machine learning can be used to predict high-school dropout, which allows early interventions. To the best of our knowledge, this paper presents the first large-scale study of that kind. It considers pupils that were at least six months into their Danish high-school education, with the goal to predict dropout in the subsequent three months. We combined information from the MaCom Lectio study administration system, which is used by most Danish high schools, with data from public online sources (name database, travel planner, governmental statistics). In contrast to existing studies that were based on only a few hundred students, we considered a considerably larger sample of 36299 pupils for training and 36299 for testing. We evaluated different machine learning methods. A random forest classifier achieved an accuracy of 93.47% and an area under the curve of 0.965. Given the large sample, we conclude that machine learning can be used to reliably detect high-school dropout given the information already available to many schools.

42 citations
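The modeling pipeline of this study can be sketched with scikit-learn. The data below is a synthetic stand-in (the real study combined ~36k pupils' records from the Lectio system with public sources, which are not available here); the classifier family and the AUC evaluation follow the abstract, but all numbers are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for pupil features (grades, absences, activity logs, ...).
X = rng.normal(size=(2000, 10))
# Dropout label driven by two of the features plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```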


Proceedings Article
01 Jan 2015
TL;DR: A review of research and practice in LA and EDM is presented, accompanied by the most central methods, benefits, and challenges of the field.
Abstract: The growing interest in recent years towards Learning Analytics (LA) and Educational Data Mining (EDM) has enabled novel approaches and advancements in educational settings. The wide variety of research and practice in this context has opened up important possibilities and applications, from adaptation and personalization of Technology Enhanced Learning (TEL) systems to improvement of instructional design and pedagogy choices based on students' needs. LA and EDM play an important role in enhancing learning processes by offering innovative methods for the development and integration of more personalized, adaptive, and interactive educational environments. This has motivated the organization of the ESANN 2015 Special Session on Advances in Learning Analytics and Educational Data Mining. Here, a review of research and practice in LA and EDM is presented, accompanied by the most central methods, benefits, and challenges of the field. Additionally, this paper covers a review of the novel contributions to the Special Session.

41 citations


Proceedings Article
01 Jan 2015
TL;DR: In this article, the authors propose a novel framework for the classification of handwritten digits that learns sparse representations using probabilistic quadtrees and Deep Belief Nets, which shows promising results and significantly outperforms traditional Deep Belief Networks.
Abstract: Learning sparse feature representations is a useful instrument for solving an unsupervised learning problem. In this paper, we present three labeled handwritten digit datasets, collectively called n-MNIST. Then, we propose a novel framework for the classification of handwritten digits that learns sparse representations using probabilistic quadtrees and Deep Belief Nets. On the MNIST and n-MNIST datasets, our framework shows promising results and significantly outperforms traditional Deep Belief Networks.

41 citations


Proceedings Article
01 Jan 2015
TL;DR: This paper evaluates the one-vs-all binarization technique in the context of Random Forest, a state-of-the-art decision forest building algorithm which focuses on generating diverse decision trees as the base classifiers.
Abstract: Binarization techniques are widely used to solve multi-class classification problems. These techniques reduce the complexity of multi-class classification problems by dividing the original data set into two-class segments or replicas. A set of simpler classifiers is then learnt from the two-class segments or replicas, and the outputs from these classifiers are combined for final classification. Binarization can improve prediction accuracy when compared to a single classifier. However, to be declared a superior technique, binarization needs to prove itself in the context of ensemble classifiers such as Random Forest. Random Forest is a popular, state-of-the-art decision forest building algorithm which focuses on generating diverse decision trees as the base classifiers. In this paper we evaluate the one-vs-all binarization technique in the context of Random Forest. We present elaborate experimental results involving ten widely used data sets from the UCI Machine Learning Repository. The experimental results exhibit the effectiveness of the one-vs-all binarization technique in the context of Random Forest.

26 citations
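The comparison the paper makes is straightforward to reproduce in outline with scikit-learn: wrap a Random Forest in a one-vs-all (one-vs-rest) decomposition and compare cross-validated accuracy against the plain multi-class forest. This is a minimal sketch on one UCI dataset (iris), not the paper's ten-dataset experiment.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# Plain multi-class Random Forest vs. one-vs-all (one binary forest per class).
rf = RandomForestClassifier(n_estimators=100, random_state=0)
ova = OneVsRestClassifier(RandomForestClassifier(n_estimators=100, random_state=0))

acc_rf = cross_val_score(rf, X, y, cv=5).mean()
acc_ova = cross_val_score(ova, X, y, cv=5).mean()
print(round(acc_rf, 3), round(acc_ova, 3))
```

Which variant wins depends on the dataset, which is exactly the question the paper's ten-dataset study addresses.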


Proceedings Article
01 Jan 2015
TL;DR: This paper extends the Thompson Sampling approach to the multi-objective multi-armed bandit problem and compares Pareto Thompson Sampling and linear scalarized Thompson Sampling empirically on a test suite of MOMAB problems with Bernoulli distributions.
Abstract: The multi-objective multi-armed bandit (MOMAB) problem is a sequential decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single scalar reward, and these multiple rewards might be conflicting. The MOMAB problem has a set of Pareto optimal arms, and an agent's goal is not only to find that set but also to play the arms in that set evenly, or fairly. To find the Pareto optimal arms, a linear scalarized function or Pareto dominance relations can be used. The linear scalarized function converts the multi-objective optimization problem into a single-objective one and is a very popular approach because of its simplicity. The Pareto dominance relations optimize the multi-objective problem directly. In this paper, we extend the Thompson Sampling policy to the MOMAB problem. We propose Pareto Thompson Sampling and linear scalarized Thompson Sampling approaches, and compare them empirically on a test suite of MOMAB problems with Bernoulli distributions. Pareto Thompson Sampling is the approach with the best empirical performance.

25 citations
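A Pareto Thompson Sampling round can be sketched as: draw one posterior sample per arm and objective, compute the Pareto front of the sampled mean vectors, and play an arm from that front uniformly at random (the "fair play" goal). The code below is a hedged sketch with made-up Bernoulli means, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# True Bernoulli means: 2 objectives, 4 arms; arms 0 and 1 form the Pareto front,
# arms 2 and 3 are dominated by arm 0.
true_means = np.array([[0.8, 0.2], [0.2, 0.8], [0.3, 0.1], [0.1, 0.1]])
n_arms, n_obj = true_means.shape

alpha = np.ones((n_arms, n_obj))   # Beta posterior: successes + 1
beta = np.ones((n_arms, n_obj))    # Beta posterior: failures + 1
pulls = np.zeros(n_arms, dtype=int)

def pareto_front(theta):
    """Indices of arms whose sampled mean vector is not dominated by any other."""
    front = []
    for i in range(len(theta)):
        dominated = any(
            np.all(theta[j] >= theta[i]) and np.any(theta[j] > theta[i])
            for j in range(len(theta)) if j != i)
        if not dominated:
            front.append(i)
    return front

for _ in range(2000):
    theta = rng.beta(alpha, beta)           # one posterior sample per arm/objective
    arm = rng.choice(pareto_front(theta))   # play fairly among sampled-optimal arms
    reward = rng.random(n_obj) < true_means[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward
    pulls[arm] += 1

print(pulls)   # most pulls should concentrate on the Pareto arms 0 and 1
```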


Journal ArticleDOI
02 Dec 2015
TL;DR: It is found that selecting an informative subset of many agents may be more efficient than training single agents or full ensembles: while the selective ensembles have a small number of agents, they significantly outperform the large ensembles.
Abstract: Ensemble models can achieve more accurate and robust predictions than single learners. A selective ensemble may further improve the predictions by selecting a subset of the models from the entire ensemble, based on a quality criterion. We consider reinforcement learning ensembles, where the members are artificial neural networks. In this context, we extensively evaluate a recently introduced algorithm for ensemble subset selection in reinforcement learning scenarios. The aim of the learning strategy is to select members whose weak decisions are compensated by strong decisions for collected states. The correctness of a decision is determined by the Bellman error. In our empirical evaluations, we compare the benchmark performances of the full ensembles and the selective ensembles in generalized maze and in SZ-Tetris. Both are large state environments. We found that while the selective ensembles have a small number of agents, they significantly outperform the large ensembles. We therefore conclude that selecting an informative subset of many agents may be more efficient than training single agents or full ensembles.

17 citations


Proceedings Article
01 Jan 2015
TL;DR: An efficient version of a robust clustering algorithm for sparse educational data is presented that takes into account the weights aligning a sample with the corresponding population; it is utilized to divide the Finnish student population of PISA 2012 into groups.
Abstract: Clustering as an unsupervised technique is predominantly used in unweighted settings. In this paper, we present an efficient version of a robust clustering algorithm for sparse educational data that takes the weights, aligning a sample with the corresponding population, into account. The algorithm is utilized to divide the Finnish student population of PISA 2012 (the latest data from the Programme for International Student Assessment) into groups, according to their attitudes and perceptions towards mathematics, for which one third of the data is missing. Furthermore, necessary modifications of three cluster indices to reveal an appropriate number of groups are proposed and demonstrated.

14 citations


Proceedings Article
22 Apr 2015
TL;DR: This article proposes a unified and generic framework that embraces a large spectrum of models, from the traditional way of using the SOM, with the best matching unit as output, to models related to the radial basis function network paradigm, using local receptive fields as output.
Abstract: The self-organizing map (SOM) is a powerful paradigm that is extensively applied for clustering and visualization purposes. It is also used for regression learning, especially in robotics, thanks to its ability to provide a topological projection of high-dimensional nonlinear data. In this case, data extracted from the SOM are usually restricted to the best matching unit (BMU), which is the usual way to use the SOM for classification, where class labels are attached to individual neurons. In this article, we investigate the influence of considering more information from the SOM than just the BMU when performing regression. For this purpose, we quantitatively study several output functions for the SOM, when using these data as input of a linear regression, and find that the use of additional activities beyond the BMU can strongly improve regression performance. Thus, we propose a unified and generic framework that embraces a large spectrum of models, from the traditional way of using the SOM, with the best matching unit as output, to models related to the radial basis function network paradigm, using local receptive fields as output.

14 citations
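The contrast between the two ends of this spectrum can be sketched on a toy task. The code below skips SOM training entirely (the codebook `W` is just a fixed grid standing in for trained prototypes) and compares a BMU-only one-hot output against a graded, RBF-like activity of all units as inputs to a linear regression; the bandwidth 0.5 is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy regression task and a fixed 1-D "codebook" standing in for a trained SOM.
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0])
W = np.linspace(-3, 3, 10).reshape(-1, 1)   # 10 prototype vectors

# Sample-to-unit distances.
d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)

# Output 1: BMU only (one-hot on the best matching unit).
bmu = np.eye(len(W))[d.argmin(axis=1)]

# Output 2: graded activity of *all* units (RBF-network-like view).
act = np.exp(-d**2 / (2 * 0.5**2))

r2_bmu = LinearRegression().fit(bmu, y).score(bmu, y)
r2_act = LinearRegression().fit(act, y).score(act, y)
print(round(r2_bmu, 3), round(r2_act, 3))
```

On this smooth target the graded activities fit better than the piecewise-constant BMU output, which mirrors the paper's finding that using more than the BMU helps regression.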


Proceedings Article
01 Jan 2015
TL;DR: An algorithm that solves optimization problems on a matrix manifold M ⊆ R with an additional rank inequality constraint with the help of new geometric objects is presented.
Abstract: This paper presents an algorithm that solves optimization problems on a matrix manifold M ⊆ R with an additional rank inequality constraint. New geometric objects are defined to facilitate efficiently finding a suitable rank. The convergence properties of the algorithm are given and a weighted low-rank approximation problem is used to illustrate the efficiency and effectiveness of the algorithm.

Proceedings Article
01 Jan 2015
TL;DR: The pairwise functional that follows from a simple homotopic function can be incorporated within a geometrical framework in order to yield a biparametric approach able to combine several kernel matrices.
Abstract: This work presents an approach allowing for an interactive visualization of dimensionality reduction outcomes, which is based on an extended view of conventional homotopy. The pairwise functional that follows from a simple homotopic function can be incorporated within a geometrical framework in order to yield a biparametric approach able to combine several kernel matrices. Therefore, users can establish the mixture of kernels in an intuitive fashion by varying only two parameters. Our approach is tested using kernel alternatives for conventional methods of spectral dimensionality reduction such as multidimensional scaling, locally linear embedding, and Laplacian eigenmaps. The proposed mixture represents every single dimensionality reduction approach as well as helps users find a suitable representation of embedded data.

Proceedings Article
01 Jan 2015
TL;DR: This paper compares COSMO-DE-EPS forecasts from the German Meteorological Service (DWD), postprocessed with non-homogeneous Gaussian regression, to a multivariate support vector regression model, and introduces a hybrid model that employs a weighted prediction of both approaches.
Abstract: After decades in which wind forecasts were dominated by numerical weather prediction, statistical models have gained attention for shortest-term forecast horizons in the recent past. A rigorous experimental comparison between both model types is rare. In this paper, we compare COSMO-DE-EPS forecasts from the German Meteorological Service (DWD), postprocessed with non-homogeneous Gaussian regression, to a multivariate support vector regression model. Further, a hybrid model is introduced that employs a weighted prediction of both approaches.
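The hybrid idea, a weighted prediction of two forecasters, reduces to choosing a mixing weight. The sketch below uses synthetic stand-ins for the two forecast streams and a simple grid search for the weight; in practice the weight would be tuned on held-out data, and the forecast models themselves are far richer than these noisy proxies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for two wind-power forecasts: a postprocessed NWP-style forecast
# and a statistical (SVR-style) forecast, each with independent errors.
truth = np.sin(np.linspace(0, 20, 500)) + 1.5
nwp = truth + rng.normal(0, 0.30, 500)
svr = truth + rng.normal(0, 0.25, 500)

def rmse(pred):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Hybrid: weighted prediction of both approaches; pick the weight that
# minimises RMSE on a grid.
weights = np.linspace(0, 1, 101)
errors = [rmse(w * nwp + (1 - w) * svr) for w in weights]
best_w = weights[int(np.argmin(errors))]
print(best_w, round(min(errors), 3))
```

Because the grid includes the endpoints w=0 and w=1, the hybrid can never do worse than the better of the two base forecasts on the tuning data; with independent errors it typically does strictly better.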

Proceedings Article
01 Apr 2015
TL;DR: This paper proposes a new Genetic Algorithm for Multi-Label Correlation-Based Feature Selection (GA-ML-CFS), which performs a global search in the space of candidate feature subsets, in order to select a high-quality feature subset that is used by a multi-label classification algorithm – in this work, the Multi- Label k-NN algorithm.
Abstract: This paper proposes a new Genetic Algorithm for Multi-Label Correlation-Based Feature Selection (GA-ML-CFS). This GA performs a global search in the space of candidate feature subsets, in order to select a high-quality feature subset that is used by a multi-label classification algorithm – in this work, the Multi-Label k-NN algorithm. We compare the results of GA-ML-CFS with the results of the previously proposed Hill-Climbing for Multi-Label Correlation-Based Feature Selection (HC-ML-CFS), across 10 multi-label datasets.

Proceedings Article
01 Jan 2015
TL;DR: The results show that using ESN is a promising approach for dynamic gesture recognition and give indications for future experiments.
Abstract: In the last decade, training recurrent neural networks (RNN) using techniques from the area of reservoir computing (RC) became more attractive for learning sequential data due to the ease of network training. Although successfully applied in the language and speech domains, only little is known about using RC techniques for dynamic gesture recognition. We therefore conducted experiments on command gestures using Echo State Networks (ESN) to investigate both the effect of different gesture sequence representations and different parameter configurations. For recognition we employed the ensemble technique, i.e. using ESNs as weak classifiers. Our results show that using ESN is a promising approach for dynamic gesture recognition, and we give indications for future experiments.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: A novel approach for robust visual terrain classification, generating feature sequences on repeatedly mutated image patches and learning them with Recurrent Neural Networks (RNNs), outperforms previous methods significantly.
Abstract: A novel approach for robust visual terrain classification by generating feature sequences on repeatedly mutated image patches is presented. These sequences, providing the feature vector's progress under a certain image operation, are learned with Recurrent Neural Networks (RNNs). The approach is studied for image-patch-based terrain classification for wheeled robots. Thereby, various RNN architectures, namely standard RNNs, Long Short Term Memory networks (LSTMs), Dynamic Cortex Memory networks (DCMs), as well as bidirectional variants of the mentioned architectures, are investigated and compared to recently used state-of-the-art methods for real-time terrain classification. The results show that the presented approach outperforms previous methods significantly.

Proceedings Article
01 Jan 2015
TL;DR: Three carbonation depth prediction models using a decision tree approach are developed; the reduced bagged ensemble regression tree showed the highest prediction and generalization capability.
Abstract: In this work, three carbonation depth prediction models using a decision tree approach are developed. Carbonation in urban areas is often a reason for reinforcement steel corrosion, which causes premature degradation and loss of serviceability and safety of reinforced concrete structures. The adopted decision trees are the regression tree, the bagged ensemble, and the reduced bagged ensemble regression tree. The evaluation of the prediction performance of the developed models reveals that all three models perform reasonably well. Among them, the reduced bagged ensemble regression tree showed the highest prediction and generalization capability.
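Two of the three model types, a single regression tree and a bagged ensemble of regression trees, can be sketched with scikit-learn. The carbonation data below is entirely synthetic (depth growing roughly with the square root of exposure time, a standard carbonation assumption, and falling with concrete quality); it stands in for the paper's real measurements.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for carbonation data.
n = 600
time = rng.uniform(1, 50, n)       # years of exposure
quality = rng.uniform(0, 1, n)     # 1 = best concrete quality
depth = 4 * np.sqrt(time) * (1.2 - quality) + rng.normal(0, 1, n)  # mm

X = np.column_stack([time, quality])
X_tr, X_te, y_tr, y_te = train_test_split(X, depth, random_state=0)

single = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                          random_state=0).fit(X_tr, y_tr)

r2_single = single.score(X_te, y_te)
r2_bagged = bagged.score(X_te, y_te)
print(round(r2_single, 3), round(r2_bagged, 3))
```

Bagging averages many trees fit on bootstrap resamples, which typically smooths out the noise a single deep tree overfits; the paper's "reduced" variant additionally prunes the ensemble.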

Proceedings Article
01 Jan 2015
TL;DR: It is proposed to adjust the size of processed windows to the depth and to supply inferred height above ground to the network, which significantly improves object-class segmentation results on the NYU depth dataset.
Abstract: Convolutional neural networks are popular for image labeling tasks because of their built-in translation invariance. However, they do not adapt well to scale changes and cannot easily adjust to classes which regularly appear in certain scene regions. This is especially true when the network is applied in a sliding window. When depth data is available, we can address both problems. We propose to adjust the size of processed windows to the depth and to supply inferred height above ground to the network, which significantly improves object-class segmentation results on the NYU depth dataset.

Proceedings Article
01 Jan 2015
TL;DR: The Ordered Decomposition DAGs kernel framework, which allows graph kernels to be defined from tree kernels, makes it easy to define new state-of-the-art graph kernels; here a fast graph kernel based on the Subtree kernel is improved by increasing its expressivity with new features involving partial tree features.
Abstract: In this paper, we show how the Ordered Decomposition DAGs kernel framework, a framework that allows the definition of graph kernels from tree kernels, makes it easy to define new state-of-the-art graph kernels. Here we consider a quite fast graph kernel based on the Subtree kernel (ST), and we improve it by increasing its expressivity with new features involving partial tree features. While the worst-case complexity of the new graph kernel does not increase, its effectiveness is improved, as shown on several chemical datasets, reaching state-of-the-art performance.

Proceedings Article
01 Jan 2015
TL;DR: Two model selection approaches for Least Squares Support Vector Machine (LS-SVM) classifiers are proposed, based on Fully-empirical Algorithmic Stability (FAS) and Bag of Little Bootstraps (BLB), which scale sub-linearly with respect to the size of the learning set and are well suited for big data applications.
Abstract: Model selection is a key step in learning from data, because it allows optimal models to be selected by avoiding both under- and over-fitting. However, in the Big Data framework, the effectiveness of a model selection approach is assessed not only through the accuracy of the learned model but also through the time and computational resources needed to complete the procedure. In this paper, we propose two model selection approaches for Least Squares Support Vector Machine (LS-SVM) classifiers, based on Fully-empirical Algorithmic Stability (FAS) and Bag of Little Bootstraps (BLB). The two methods scale sub-linearly with respect to the size of the learning set and, therefore, are well suited for big data applications. Experiments are performed on a Graphical Processing Unit (GPU), showing up to 30x speed-ups with respect to conventional CPU-based implementations.

Proceedings Article
01 Jan 2015
TL;DR: It is demonstrated that automatically adapted metrics better identify the underlying programming strategy as compared to their default counterparts in a benchmark example from programming.
Abstract: Today's learning support systems for programming mostly rely on pre-coded feedback provision, such that their applicability is restricted to modelled tasks. In this contribution, we investigate the suitability of machine learning techniques to automate this process by means of a presentation of similar solution strategies from a set of stored examples. To this end we apply structure metric learning methods in local and global alignment which can be used to compare Java programs. We demonstrate that automatically adapted metrics better identify the underlying programming strategy as compared to their default counterparts in a benchmark example from programming.

Proceedings Article
01 Jan 2015
TL;DR: A survey of recent methods developed for feature selection/learning and their application to real world problems is provided, together with a review of the contributions to the ESANN 2015 special session on Feature and Kernel Learning.
Abstract: Feature selection and weighting has been an active research area in the last few decades, finding success in many different applications. With the advent of Big Data, the adequate identification of the relevant features has made feature selection an even more indispensable step. On the other side, in kernel methods features are implicitly represented by means of feature mappings and kernels. It has been shown that the correct selection of the kernel is a crucial task, as an erroneous selection can lead to poor performance. Unfortunately, manually searching for an optimal kernel is time-consuming and a sub-optimal choice. This tutorial is concerned with the use of data to learn features and kernels automatically. We provide a survey of recent methods developed for feature selection/learning and their application to real world problems, together with a review of the contributions to the ESANN 2015 special session on Feature and Kernel Learning.

1 Feature learning

In the last few years, several datasets with high dimensionality have become publicly available on the Internet. This fact has brought an interesting challenge to the research community, since it is difficult for machine learning methods to deal with a high number of input features. To cope with this problem, dimensionality reduction techniques can be applied to reduce the dimensionality of the original data and improve learning performance. These dimensionality reduction techniques usually come in two

Proceedings Article
01 Jan 2015
TL;DR: A novel technique based on a robust clustering algorithm and multiple internal cluster indices is proposed; it generates a dynamic, decision-tree-like structure that represents the original data in the leaf nodes, and is used to divide a given set of multiple time series containing missing values into disjoint subsets.
Abstract: A novel technique based on a robust clustering algorithm and multiple internal cluster indices is proposed. The suggested hierarchical approach allows one to generate a dynamic, decision-tree-like structure that represents the original data in the leaf nodes. It is applied here to divide a given set of multiple time series containing missing values into disjoint subsets. The whole algorithm is first described and then experimented with on one particular data set from the UCI repository, already used in (1) for a similar exploration. The obtained results are very promising.

Proceedings Article
01 Jan 2015
TL;DR: This work proposes to vary the size of the bootstrap samples randomly within a predefined range in order to increase diversity among the trees, and conducts elaborate experiments on several low-dimensional data sets from the UCI Machine Learning Repository to show the effectiveness of the technique.
Abstract: The Random Forest algorithm generates quite diverse decision trees as the base classifiers for high-dimensional data sets. However, for low-dimensional data sets the diversity among the trees falls sharply. In Random Forest, the size of the bootstrap samples generally remains the same every time a decision tree is generated as a base classifier. In this paper we propose to vary the size of the bootstrap samples randomly within a predefined range in order to increase diversity among the trees. We conduct elaborate experiments on several low-dimensional data sets from the UCI Machine Learning Repository. The experimental results show the effectiveness of our proposed technique.
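The proposed modification, drawing each tree's bootstrap sample size from a range instead of fixing it at n, can be sketched directly on top of scikit-learn's decision trees. The 50%-100% range below is an illustrative assumption, not necessarily the paper's range, and the majority vote is a plain Random-Forest-style aggregation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)   # a low-dimensional UCI dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A Random-Forest-like ensemble where each tree sees a bootstrap sample whose
# size is drawn uniformly from a predefined range (here 50%-100% of n).
n = len(X_tr)
trees = []
for t in range(100):
    size = rng.integers(n // 2, n + 1)
    idx = rng.integers(0, n, size)   # bootstrap: sample indices with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=t)
    trees.append(tree.fit(X_tr[idx], y_tr[idx]))

# Majority vote over the ensemble.
votes = np.stack([t.predict(X_te) for t in trees])
pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
acc = (pred == y_te).mean()
print(round(acc, 3))
```

Varying the sample size gives each tree a different effective training set, which is one way to inject diversity when the feature space is too small for feature subsampling alone to do so.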

Proceedings Article
01 Jan 2015
TL;DR: A novel technique to determine the saliency of features for the multilayer perceptron (MLP) neural network is presented, based on the analytic derivative of the feedforward mapping with respect to inputs, which is then integrated over the training data using the mean of the absolute values.
Abstract: A novel technique to determine the saliency of features for the multilayer perceptron (MLP) neural network is presented. It is based on the analytic derivative of the feedforward mapping with respect to inputs, which is then integrated over the training data using the mean of the absolute values. Experiments demonstrating the viability of the approach are given with small benchmark data sets. The cross-validation based framework for reliable determination of MLP that has been used in the experiments was introduced in [1, 2].
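The saliency computation itself is compact: differentiate the feedforward mapping analytically with respect to the inputs and average the absolute values over the training data. The sketch below uses a hand-built one-hidden-layer tanh MLP (not a trained network from the paper) in which input 2 is made irrelevant by construction, so its saliency must come out as zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small one-hidden-layer MLP, y = v . tanh(W x + b). For illustration the
# weights are constructed by hand so that input 2 is irrelevant (zero column).
W = rng.normal(size=(5, 3))
W[:, 2] = 0.0                     # input feature 2 has no influence on the output
b = rng.normal(size=5)
v = rng.normal(size=5)

def grad_wrt_input(x):
    """Analytic derivative dy/dx of the feedforward mapping (chain rule through tanh)."""
    h = W @ x + b
    return (v * (1 - np.tanh(h) ** 2)) @ W

# Saliency: mean absolute derivative over the (here synthetic) training data.
X = rng.normal(size=(200, 3))
saliency = np.mean(np.abs([grad_wrt_input(x) for x in X]), axis=0)
print(saliency)
```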

Proceedings Article
01 Jan 2015
TL;DR: This paper uses a multilayer perceptron to build a goal expectancy model that estimates the conversion probability of shots, and uses it to evaluate the scoring performance of Premier League footballers.
Abstract: Association football is characterized by the lowest scoring rate of all major sports. A typical value of less than 3 goals per game makes it difficult to find strong effects on goal scoring. Instead of goals, one can focus on the production of shots, increasing the available sample size. However, the value of shots depends heavily on different factors, and it is important to take this variability into account. In this paper, we use a multilayer perceptron to build a goal expectancy model that estimates the conversion probability of shots, and use it to evaluate the scoring performance of Premier League footballers.
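A goal expectancy model of this kind maps shot features to a conversion probability. The sketch below is a hedged illustration with synthetic shot data (two made-up features, distance and angle; the paper's feature set and network configuration are not reproduced here), using scikit-learn's MLPClassifier as the multilayer perceptron.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for shot data: closer, more central shots convert more often.
n = 4000
dist = rng.uniform(5, 35, n)         # metres to goal
angle = rng.uniform(0, 1, n)         # 1 = central
p_goal = 1 / (1 + np.exp(0.2 * dist - 3 * angle))
goal = rng.random(n) < p_goal

X = np.column_stack([dist, angle])
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                      random_state=0).fit(X, goal)

# Goal expectancy of individual shots; a player's expected goals would be the
# sum of these probabilities over their shots.
close_shot = model.predict_proba([[8.0, 0.9]])[0, 1]
long_shot = model.predict_proba([[30.0, 0.2]])[0, 1]
print(round(close_shot, 2), round(long_shot, 2))
```

Comparing actual goals scored to this summed expectancy is what separates finishing skill from shot volume and shot quality.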

Proceedings Article
01 Apr 2015
TL;DR: A technique for combining different dissimilarity measures into a Learning Vector Quantization classification scheme for heterogeneous, mixed data is presented and applied to diagnosing viral crop disease in cassava plants from histograms and shape features extracted from cassava leaf images.
Abstract: Prototype-based classification, identifying representatives of the data and suitable measures of dissimilarity, has been used successfully for tasks where interpretability of the classification is key. In many practical problems, one object is represented by a collection of different subsets of features, that might require different dissimilarity measures. In this paper we present a technique for combining different dissimilarity measures into a Learning Vector Quantization classification scheme for heterogeneous, mixed data. To illustrate the method we apply it to diagnosing viral crop disease in cassava plants from histograms (HSV) and shape features (SIFT) extracted from cassava leaf images. Our results demonstrate the feasibility of the method and increased performance compared to previous approaches.

Proceedings Article
22 Apr 2015
TL;DR: This work provides a learning algorithm combining distributed Extreme Learning Machine and an information fusion rule based on the aggregation of experts' advice, to build day-ahead probabilistic solar PV power production forecasts.
Abstract: We provide a learning algorithm combining distributed Extreme Learning Machine and an information fusion rule based on the aggregation of experts' advice, to build day-ahead probabilistic solar PV power production forecasts. These forecasts use, apart from the current day's solar PV power production, local meteorological inputs, the most valuable of which is shown to be precipitation. Experiments are then run in one French region, Provence-Alpes-Côte d'Azur, to evaluate the algorithm's performance.

Proceedings ArticleDOI
22 Apr 2015
TL;DR: The proposed tracker is inspired by hierarchical short-term and medium-term memories, for which patterns are stored as discriminators of a WiSARD weightless neural network.
Abstract: This paper proposes a generic object tracker with real-time performance. The proposed tracker is inspired by hierarchical short-term and medium-term memories, for which patterns are stored as discriminators of a WiSARD weightless neural network. This approach is evaluated on benchmark video sequences published by Babenko et al. Experiments show that the WiSARD-based approach outperforms most previous results in the literature on the same dataset.

Proceedings Article
01 Jan 2015
TL;DR: A new and straightforward criterion for successive insertion and deletion of training points in sparse Gaussian process regression is introduced based on an approximation of the selection technique proposed by Smola and Bartlett.
Abstract: In this paper, we introduce a new and straightforward criterion for successive insertion and deletion of training points in sparse Gaussian process regression. Our novel approach is based on an approximation of the selection technique proposed by Smola and Bartlett (1). It is shown that the resulting selection strategies are as fast as the purely randomized schemes for insertion and deletion of training points. Experiments on real-world robot data demonstrate that our obtained regression models are competitive with the computationally intensive state-of-the-art methods in terms of generalization accuracy.