
Showing papers presented at "The European Symposium on Artificial Neural Networks in 2011"


Proceedings Article
01 Jan 2011
TL;DR: This work shows how to learn many layers of features on color images and how these features are used to initialize deep autoencoders, which are then used to map images to short binary codes.
Abstract: We show how to learn many layers of features on color images and we use these features to initialize deep autoencoders. We then use the autoencoders to map images to short binary codes. Using semantic hashing [6], 28-bit codes can be used to retrieve images that are similar to a query image in a time that is independent of the size of the database. This extremely fast retrieval makes it possible to search using multiple different transformations of the query image. 256-bit binary codes allow much more accurate matching and can be used to prune the set of images found using the 28-bit codes.
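The retrieval step the abstract describes, comparing short binary codes by Hamming distance, can be sketched as follows. This is an illustrative NumPy sketch only: the paper's codes come from a trained deep autoencoder, whereas random codes stand in here, and a linear scan replaces the constant-time semantic-hashing lookup.

```python
import numpy as np

def hamming_retrieve(query_code, db_codes):
    """Rank database items by Hamming distance to the query's binary code.

    With short codes (e.g. 28 bits), this ranking can be replaced by a
    direct hash-table lookup on the code itself (semantic hashing), making
    retrieval time independent of the database size; the linear scan here
    just illustrates the distance-based ordering.
    """
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Stand-in codes; real codes would come from the trained autoencoder.
rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(1000, 28), dtype=np.uint8)
query = db[42].copy()
order, dists = hamming_retrieve(query, db)
```

The 256-bit codes mentioned in the abstract would then re-rank only the short list of candidates returned by the 28-bit lookup.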

406 citations


Proceedings Article
27 Apr 2011
TL;DR: This tutorial, which introduces the ESANN deep learning special session, details the state-of-the-art models and summarizes the current understanding of this learning approach, which is a reference for many difficult classification tasks.
Abstract: The deep learning paradigm tackles problems on which shallow architectures (e.g. SVM) are affected by the curse of dimensionality. As part of a two-stage learning scheme involving multiple layers of non-linear processing, a set of statistically robust features is automatically extracted from the data. This tutorial, which introduces the ESANN deep learning special session, details the state-of-the-art models and summarizes the current understanding of this learning approach, which is a reference for many difficult classification tasks.

67 citations


Proceedings Article
01 Jan 2011
TL;DR: The outcome of the current research is a combined global-local algorithm for training complex-valued feed-forward neural networks that is appropriate for the considered chaotic problem.
Abstract: Complex-valued neural networks are one of the open topics in the machine learning community. In this paper we address the problems of gradient computation in complex-valued neural networks by combining global and local optimization algorithms. The outcome of the current research is a combined global-local algorithm for training the complex-valued feed-forward neural network that is appropriate for the considered chaotic problem.

50 citations


Proceedings Article
01 Apr 2011
TL;DR: This tutorial prefaces the special session "Seeing is believing: The importance of visualization in real-world machine learning applications", reflects some of the main emerging topics in the field, and provides some clues to the current state and the near future of visualization methods within the framework of machine learning.
Abstract: The increasing availability of data sets with a huge amount of information, coded in many different features, justifies the research on new methods of knowledge extraction: the great challenge is the translation of the raw data into useful information that can be used to improve decision-making processes, detect relevant profiles, find out relationships among features, etc. It is undoubtedly true that a picture is worth a thousand words, which makes visualization methods likely the most appealing and one of the most relevant kinds of knowledge extraction methods. At ESANN 2011, the special session "Seeing is believing: The importance of visualization in real-world machine learning applications" reflects some of the main emerging topics in the field. This tutorial prefaces the session, summarizing some of its contributions, while also providing some clues to the current state and the near future of visualization methods within the framework of machine learning.

41 citations


Proceedings Article
01 Jan 2011
TL;DR: In this article, a universal bound for the overfitting of a multilayer perceptron model under weak assumptions is presented; this bound is valid without Gaussian or identifiability assumptions, and it is used to give a hint about determining the true architecture of the MLP model as the number of data points goes to infinity.
Abstract: Multilayer perceptrons (MLP) with one hidden layer have long been used to deal with non-linear regression. However, in some tasks, MLPs are too powerful a model, and a small mean square error (MSE) may be due more to overfitting than to actual modeling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS); however, in numerous cases the assumption of normality of the noise is arbitrary if not false. In this paper, we present a universal bound for the overfitting of such a model under weak assumptions; this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model as the number of data points goes to infinity. As an illustration, we use this theoretical result to propose and compare effective criteria to find the true architecture of an MLP.

31 citations


Proceedings Article
01 Jan 2011
TL;DR: A model-based approach is presented that extends multi-class quadratic normal discriminant analysis with a model of the mislabelling process, demonstrating the benefits in terms of parameter recovery as well as classification performance, on both synthetic and real-world multi-class problems.
Abstract: Learning a classifier from a training set that contains labelling errors is a difficult, yet not very well studied problem. Here we present a model-based approach that extends multi-class quadratic normal discriminant analysis with a model of the mislabelling process. We demonstrate the benefits of this approach in terms of parameter recovery as well as improved classification performance, on both synthetic and real-world multi-class problems. We also obtain enhanced accuracy in comparison with a previous model-free approach.

24 citations


Proceedings Article
01 Apr 2011
TL;DR: It is demonstrated that RRL can reliably outperform buy-and-hold for the higher frequency data, in contrast to GP which performed best for monthly data.
Abstract: A novel stochastic adaptation of the recurrent reinforcement learning (RRL) methodology is applied to daily, weekly, and monthly stock index data, and compared to results obtained elsewhere using genetic programming (GP). The data sets used have been considered a challenging test for algorithmic trading. It is demonstrated that RRL can reliably outperform buy-and-hold for the higher frequency data, in contrast to GP, which performed best for monthly data.

23 citations


Proceedings Article
01 Jan 2011
TL;DR: This paper adapts to the LS-SVM case a recent work for sparsifying classical SVM classifiers, which is based on an iterative approximation to the L0-norm, and achieves very sparse models, without significant loss of accuracy compared to standard LS-SVMs or SVMs.
Abstract: This is an electronic version of the paper presented at the 19th European Symposium on Artificial Neural Networks, held in Bruges on 2011

20 citations


Proceedings Article
01 Jan 2011
TL;DR: A normalization of the binary classifiers' outputs is proposed that allows fair comparisons in cases where the comparison may be biased towards one specific class because the binary classifiers are built on distinct feature subsets.
Abstract: This paper proposes a method to perform class-specific feature selection in multiclass support vector machines addressed with the one-against-all strategy. The main issue arises at the final step of the classification process, where binary classifier outputs must be compared one against another to elect the winning class. This comparison may be biased towards one specific class when the binary classifiers are built on distinct feature subsets. This paper proposes a normalization of the binary classifiers' outputs that allows fair comparisons in such cases.
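One plausible instance of such a normalization (an illustrative rule, not necessarily the paper's exact formula): standardize each binary classifier's decision values using statistics estimated on held-out data, so that outputs trained on distinct feature subsets become comparable before the argmax.

```python
import numpy as np

def normalized_ova_predict(scores, val_scores):
    """One-against-all prediction with per-classifier standardization.

    scores:     (n_samples, n_classes) raw decision values.
    val_scores: (n_val, n_classes) decision values on held-out data,
                used to estimate each binary classifier's output scale.
    """
    mu = val_scores.mean(axis=0)
    sigma = val_scores.std(axis=0) + 1e-12   # avoid division by zero
    return np.argmax((scores - mu) / sigma, axis=1)

# Classifier 1 outputs on a 10x larger scale; a raw argmax would favor it.
val = np.array([[0.0, 0.0], [2.0, 20.0], [-2.0, -20.0]])
test = np.array([[1.5, 5.0]])
pred = normalized_ova_predict(test, val)
# class 0 wins after standardization; a raw argmax would pick class 1
```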

20 citations


Proceedings Article
27 Apr 2011
TL;DR: In this paper, a graph visualization methodology based on hierarchical maximal modularity clustering is described, with interactive and significant coarsening and refining possibilities, for HIV epidemic analysis in Cuba.
Abstract: This paper describes a graph visualization methodology based on hierarchical maximal modularity clustering, with interactive and significant coarsening and refining possibilities. An application of this method to HIV epidemic analysis in Cuba is outlined.

18 citations


Proceedings Article
01 Jan 2011
TL;DR: The different behavior of the Maximal Discrepancy and the Rademacher Complexity when applied to linear classifiers is studied and a practical procedure to tighten the bounds is suggested.
Abstract: The Maximal Discrepancy and the Rademacher Complexity are powerful statistical tools that can be exploited to obtain reliable, albeit not tight, upper bounds on the generalization error of a classifier. We study the different behavior of the two methods when applied to linear classifiers and suggest a practical procedure to tighten the bounds. The resulting generalization estimate can be successfully used for classifier model selection.
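As a hedged illustration of the Rademacher side of the abstract (not the paper's tightened bound): for linear functions {x ↦ w·x : ‖w‖₂ ≤ B}, the supremum in the empirical Rademacher complexity has the closed form (B/n)·E_σ‖Σᵢ σᵢ xᵢ‖₂, which is easy to estimate by Monte Carlo over random sign vectors.

```python
import numpy as np

def empirical_rademacher_linear(X, B=1.0, n_draws=500, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of the
    linear class {x -> w.x : ||w||_2 <= B} on the sample X (n x d).

    For each Rademacher sign vector s, the sup over w is attained at
    w = B * (sum_i s_i x_i)/||...||, giving (B/n) * ||sum_i s_i x_i||_2.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n_draws, n))
    return float(np.mean(B / n * np.linalg.norm(signs @ X, axis=1)))

rng = np.random.default_rng(1)
small = empirical_rademacher_linear(rng.normal(size=(50, 5)))
large = empirical_rademacher_linear(rng.normal(size=(5000, 5)))
# the complexity shrinks roughly like 1/sqrt(n) as the sample grows
```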

Proceedings Article
01 Jan 2011
TL;DR: It is shown that regularization of the inner reservoir network mitigates parameter dependencies and boosts the task-specific performance.
Abstract: Output feedback is crucial for autonomous and parameterized pattern generation with reservoir networks. Read-out learning can lead to error amplification in these settings and therefore regularization is important for both generalization and reduction of error amplification. We show that regularization of the inner reservoir network mitigates parameter dependencies and boosts the task-specific performance.

Proceedings Article
27 Apr 2011
TL;DR: A novel approach to P300 detection using Kalman filtering and SVMs is proposed, and experiments show that this method is a promising step toward single-trial detection of P300.
Abstract: Brain Computer Interfaces (BCI) are systems enabling humans to communicate with machines through signals generated by the brain. Several kinds of signals can be envisioned, as well as means to measure them. In this paper we are particularly interested in event-related brain potentials (ERP), and especially visually-evoked potential signals (P300) measured with surface electroencephalograms (EEG). When the human is stimulated with visual inputs, the P300 signals arise about 300 ms after the visual stimulus has been received. Yet, the EEG signal is often very noisy, which makes P300 detection hard. It is customary to use an average of several trials to enhance the P300 signal and reduce the random noise, but this results in a lower bit rate of the interface. In this contribution, we propose a novel approach to P300 detection using Kalman filtering and SVMs. Experiments show that this method is a promising step toward single-trial detection of P300.
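The trial-averaging step the abstract contrasts with is simple to illustrate: averaging k time-locked epochs leaves the evoked response intact while shrinking independent noise by roughly 1/√k. The data below are synthetic and purely illustrative (an idealized Gaussian-shaped peak standing in for the P300).

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.6, 300)                      # 600 ms epoch
p300 = np.exp(-((t - 0.3) ** 2) / (2 * 0.03 ** 2))  # idealized peak ~300 ms

def noisy_trials(k, noise_std=2.0):
    """Simulate k single-trial epochs: same evoked response, fresh noise."""
    return p300 + rng.normal(scale=noise_std, size=(k, t.size))

single_err = np.abs(noisy_trials(1) - p300).mean()
avg_err = np.abs(noisy_trials(100).mean(axis=0) - p300).mean()
# averaging 100 trials leaves far less residual noise than a single trial,
# but costs 100 stimulus repetitions -- hence the lower bit rate
```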

Proceedings Article
01 Jan 2011
TL;DR: The formalism of Maximal Ancestral Graphs (MAGs) is used and cSAT+ is adapted to solve the problem of incorporating prior knowledge, in the form of causal relations, into causal models.
Abstract: In this paper we address the problem of incorporating prior knowledge, in the form of causal relations, into causal models. Prior approaches mostly consider knowledge about the presence or absence of edges in the model. We use the formalism of Maximal Ancestral Graphs (MAGs) and adapt cSAT+, an algorithm for reasoning with datasets defined over different variable sets, to solve this problem.

Proceedings Article
01 Jan 2011
TL;DR: A statistical dependence measure is presented for variable selection in the context of classification and its performance is tested over DNA microarray data, a challenging dataset for machine learning researchers due to the high number of genes and relatively small number of measurements.
Abstract: Feature selection is the domain of machine learning which studies data-driven methods to select, among a set of input variables, the ones that will lead to the most accurate predictive model. In this paper, a statistical dependence measure is presented for variable selection in the context of classification. Its performance is tested over DNA microarray data, a challenging dataset for machine learning researchers due to the high number of genes and relatively small number of measurements. This measure is compared against the so called mRMR approach, and is shown to obtain better or equal performance over the binary datasets.
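A minimal sketch of the kind of dependence measure involved (plain empirical mutual information for discrete variables; the paper's measure and the mRMR criterion it is compared against build on quantities of this kind):

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete
    sequences: sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b)))."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in pxy.items():
        mi += (c / n) * np.log(c * n / (px[a] * py[b]))
    return mi

# Perfectly dependent binary variables share log(2) nats; ranking features
# by MI with the class label gives a simple relevance ordering.
```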

Proceedings Article
01 Jan 2011
TL;DR: This work introduces a simple generalization of RAM-based neurons in order to explore both weightless neural models in the data stream clustering problem.
Abstract: Producing good quality clustering of data streams in real time is a difficult problem, since it is necessary to perform the analysis of data points arriving in a continuous style, with the support of quite limited computational resources. The incremental and evolving nature of the resulting clustering structures must reflect the dynamics of the target data stream. The WiSARD weightless perceptron, and its associated DRASiW extension, are intrinsically capable of, respectively, performing one-shot learning and producing prototypes of the learnt categories. This work introduces a simple generalization of RAM-based neurons in order to explore both weightless neural models in the data stream clustering problem.

Proceedings Article
01 Jan 2011
TL;DR: A combination of the Self-Organizing Map approach and navigation functions in the Traveling Salesman Problem with segment goals, where paths between goals have to respect obstacles, demonstrates the applicability of SOM principles in problems to which SOM has not yet been applied.
Abstract: This paper presents a combination of the Self-Organizing Map (SOM) approach and navigation functions in the Traveling Salesman Problem with segment goals, where paths between goals have to respect obstacles. Hence, the problem is called multi-goal path planning. The problem arises from inspection planning, where a path is sought from which all points of the given polygonal environment have to be "seen". The proposed approach demonstrates the applicability of SOM principles in problems to which SOM has not yet been applied.

Proceedings Article
01 Jan 2011
TL;DR: This paper reviews the application of Fisher information to derive a metric in primary data space to provide a natural coordinate space to represent pairwise distances with respect to a probability distribution p(c|x), defined by an external label c, and use it to compute more informative distances.
Abstract: Clustering methods and nearest neighbour classifiers typically compute distances between data points as a measure of similarity, with nearby pairs of points considered more like each other than remote pairs. The distance measure of choice is often Euclidean, implicitly treating all directions in space as equally relevant. This paper reviews the application of Fisher information to derive a metric in primary data space. The aim is to provide a natural coordinate space to represent pairwise distances with respect to a probability distribution p(c|x), defined by an external label c, and use it to compute more informative distances.
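A one-dimensional sketch of the idea (an assumed illustration, not the paper's construction): for a binary posterior p(c=1|x), the Fisher information metric is ds² = p′(x)² / (p(1−p)) dx², so distance accumulates only where the label distribution changes, unlike the Euclidean metric which treats all regions equally.

```python
import numpy as np

def fisher_path_length(xs, posterior):
    """Approximate Fisher-metric length of the segment covered by the grid
    xs, for a binary posterior p(c=1|x): ds = sqrt(p'(x)^2/(p(1-p))) dx.

    Uses a finite-difference derivative and the trapezoidal rule.
    """
    p = posterior(xs)
    dp = np.gradient(p, xs)
    dens = np.sqrt(dp ** 2 / (p * (1.0 - p)))
    return float(np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(xs)))

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
xs = np.linspace(-20.0, 20.0, 4001)
length = fisher_path_length(xs, sigmoid)
# sweeping p from ~0 to ~1 recovers the Bernoulli geodesic length: pi,
# matching the closed form 2 * |arcsin(sqrt(p2)) - arcsin(sqrt(p1))|
```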

Proceedings Article
01 Jan 2011
TL;DR: The authors connected a computational approach to topology learning, the Exploration Observation Machine (XOM) as introduced in (12), with the divergence optimization of SNE, resulting in a new dimension reduction algorithm called Neighbor Embedding XOM (NE-XOM).
Abstract: In this paper we propose the generalization of the recently introduced Neighbor Embedding Exploratory Observation Machine (NE-XOM) for dimension reduction and visualization. We provide a general mathematical framework called Self Organized Neighbor Embedding (SONE). It treats the components, like data similarity measures and neighborhood functions, independently and easily changeable. And it enables the utilization of different divergences, based on the theory of Frechet derivatives. In this way we propose a new dimension reduction and visualization algorithm, which can be easily adapted to the user-specific request and the actual problem. Various dimension reduction techniques have been introduced based on different properties of the original data to be preserved. The spectrum ranges from linear projections of original data, such as in Principal Component Analysis (PCA) or classical Multidimensional Scaling (MDS), to a wide range of locally linear and non-linear approaches, such as Isomap, Locally Linear Embedding (LLE), Local Linear Coordination (LLC), or charting. Stochastic Neighbor Embedding (SNE) approximates the probability distribution in the high-dimensional space, defined by neighboring points, with their probability distribution in a lower-dimensional space. A technique called t-SNE was proposed in (10). It is a variation of SNE considering another statistical model assumption for data distributions. Other methods aim at the preservation of the classification accuracy in lower dimensions and incorporate the available label information for the embedding, e.g. Linear Discriminant Analysis (LDA) (5) and generalizations thereof and extensions of the Self Organizing Map (SOM) incorporating class labels. For a comprehensive review on nonlinear dimensionality reduction methods, we refer to (7).
Recently, the idea of fast and efficient online learning was combined with the high quality of divergence-based optimization, resulting in a new dimension reduction algorithm called Neighbor Embedding XOM (NE-XOM). Its usefulness and comparison with other methods is shown in (3). The authors connected a computational approach to topology learning, the Exploration Observation Machine (XOM) as introduced in (12), with the divergence optimization of SNE. In this contribution, we extend the approach proposed in (3), with a mathematical foundation for the generalization of the principle to

Proceedings Article
01 Jan 2011
TL;DR: Empirical results from experiments with a learned SVM model on twelve UCI datasets indicate that the proposed solution improves on the original SVM, and also on results reported when using a z-SVM, in terms of g-mean and sensitivity.
Abstract: Standard learning algorithms may perform poorly when learning from unbalanced datasets. Based on Fisher's discriminant analysis, a post-processing strategy is introduced to deal with datasets with significant imbalance in the data distribution. A new bias is defined, which reduces skew towards the minority class. Empirical results from experiments with a learned SVM model on twelve UCI datasets indicate that the proposed solution improves on the original SVM, and also on results reported when using a z-SVM, in terms of g-mean and sensitivity.
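A hedged sketch of what such a post-processing bias can look like (an illustrative rule in the same spirit, not the paper's exact definition): keep the trained SVM's scores but move the decision threshold to the midpoint of the class-conditional mean scores, so the boundary no longer sits where the majority class pushed it.

```python
import numpy as np

def midpoint_threshold(scores, labels):
    """New decision threshold: midpoint between the mean decision score of
    each class (a Fisher-style cut between the class-conditional means).

    With heavy imbalance the default threshold 0 tends to favor the
    majority class; re-centering between the two means reduces that skew.
    """
    return 0.5 * (scores[labels == 1].mean() + scores[labels == 0].mean())

# Minority-class (1) scores sit below 0, so the default threshold
# misclassifies them even though the classes are well separated.
scores = np.array([-3.0, -2.5, -2.8, -3.2, -0.6, -0.4])
labels = np.array([0, 0, 0, 0, 1, 1])
thr = midpoint_threshold(scores, labels)
pred = (scores > thr).astype(int)
```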

Proceedings Article
01 Jan 2011
TL;DR: The ability of the new algorithms is demonstrated on standard functional data sets using different basis functions, namely Gaussians and Lorentzians, with a drastically reduced number of parameters to be adapted for relevance learning.
Abstract: Generalized relevance learning vector quantization (GRLVQ) is a prototype-based classification algorithm with metric adaptation, weighting each data dimension according to its relevance for the classification task. We present in this paper an extension for functional data, which are usually very high dimensional. This approach supposes that the data vectors are functional representations. Taking this information into account, the so-called relevance profiles are modeled by superpositions of simple basis functions depending on only a few parameters. As a consequence, the resulting functional GRLVQ has a drastically reduced number of parameters to be adapted for relevance learning. We demonstrate the ability of the new algorithms on standard functional data sets using different basis functions, namely Gaussians and Lorentzians.

Proceedings Article
01 Apr 2011
TL;DR: A hybrid model combining a generative model and a discriminative model is proposed for signal labelling and classification tasks, aiming at taking the best from each world: the usual increased accuracy of generative models on small training datasets and of discriminative models on large training datasets.
Abstract: We propose a hybrid model combining a generative model and a discriminative model for signal labelling and classification tasks, aiming at taking the best from each world. The idea is to focus the learning of the discriminative model on most likely state sequences as output by the generative model. This allows taking advantage of the usual increased accuracy of generative models on small training datasets and of discriminative models on large training datasets. We instantiate this framework with Hidden Markov Models and Hidden Conditional Random Fields. We validate our model on financial time series and on handwriting data.

Proceedings Article
01 Jan 2011
TL;DR: Probabilistic LVQ (PLVQ) makes it possible to realize multivariate class memberships for prototypes and training samples, and the prototype labels can be learned from the data during training.
Abstract: We introduce a generalization of Robust Soft Learning Vector Quantization (RSLVQ). This algorithm for nearest prototype classification is derived from an explicit cost function and follows the dynamics of a stochastic gradient ascent. We generalize the RSLVQ cost function with respect to vectorial class labels: Probabilistic LVQ (PLVQ) makes it possible to realize multivariate class memberships for prototypes and training samples, and the prototype labels can be learned from the data during training. We present experiments to demonstrate the new algorithm in practice.

Proceedings Article
01 Jan 2011
TL;DR: An evolutionary process is employed to design a neural network controller capable of accomplishing both versions of a robotic rule switching task, and revealed a self-organized time perception capacity in the agent's cognitive system that significantly facilitates the accomplishment of the tasks, through modulation of the supplementary behavioural and cognitive processes.
Abstract: Biological organisms perceive and act in the world based on spatiotemporal experiences and interpretations. However, artificial agents consider mainly the spatial relationships that exist in the world, typically ignoring its temporal aspects. In an attempt to direct research interest towards the fundamental issue of time experiencing, the current work explores two temporally different versions of a robotic rule switching task. An evolutionary process is employed to design a neural network controller capable of accomplishing both versions of the task. The systematic exploration of neural network dynamics revealed a self-organized time perception capacity in the agent's cognitive system that significantly facilitates the accomplishment of the tasks, through modulation of the supplementary behavioural and cognitive processes.

Proceedings Article
01 Jan 2011
TL;DR: An echo state network is implemented within the concept of actor-critic design to obtain an optimal control policy for a mobile robot, which is asked to anticipate future rewards/punishments and react accordingly.
Abstract: In this paper we implement an echo state network within the concept of actor-critic design to obtain an optimal control policy for a mobile robot. The robot is asked to anticipate future rewards/punishments and react accordingly. Experimental results show that the proposed approach is simple and effective.

Proceedings Article
01 Jan 2011
TL;DR: The results indicate that continuous EEG data combined with BCI methodology is a promising approach to measuring learners' mental states online.
Abstract: This publication aims at developing computer-based learning environments adapting to learners' individual cognitive condition. The adaptive mechanism, based on Brain-Computer Interface (BCI) methodology, relies on electroencephalogram (EEG) data to diagnose learners' mental states. A first within-subjects study (10 students) was accomplished, aiming at differentiating between states of learning and non-learning by means of EEG data. Support Vector Machines classified characteristics in the EEG signals for these two different stimuli with an average accuracy of 74.55%. For individual students the percentage of correct classification reached 92.22%. The results indicate that continuous EEG data combined with BCI methodology is a promising approach to measuring learners' mental states online.

Proceedings Article
01 Jan 2011
TL;DR: This paper proposes an approach based on mutual information and the maximal Relevance minimal Redundancy principle to handle the case of mixed data and combines aspects of both wrapper and filter methods and is well suited for regression problems.
Abstract: The problem of feature selection is crucial for many applications and has thus been studied extensively. However, most of the existing methods are designed to handle data consisting only of categorical or of real-valued features, while a mix of both kinds of features is often encountered in practice. This paper proposes an approach based on mutual information and the maximal Relevance minimal Redundancy principle to handle the case of mixed data. It combines aspects of both wrapper and filter methods and is well suited for regression problems. Experiments on artificial and real-world datasets show the interest of the methodology.

Proceedings Article
01 Jan 2011
TL;DR: A practical description is given of how a causal learning task can be undertaken based on different possible assumptions; two categories of assumptions lead to different methods, constraint-based and Bayesian learning.
Abstract: To learn about causal relations between variables just by observing samples from them, particular assumptions must be made about those variables' distributions. This article gives a practical description of how such a learning task can be undertaken based on different possible assumptions. Two categories of assumptions lead to different methods, constraint-based and Bayesian learning, and in each case we review both the basic ideas and some recent extensions and alternatives to them.

Proceedings Article
27 Apr 2011
TL;DR: In this paper, the hidden layers are learnt in a supervised fashion based on kernel partial least squares regression, and the obtained hidden features are automatically ranked according to their correlation with the target outputs.
Abstract: This paper presents an alternative to the supervised KPCA-based approach for learning a Multilayer Kernel Machine (MKM) (1). In our proposed procedure, the hidden layers are learnt in a supervised fashion based on kernel partial least squares regression. The main interest resides in a simplified learning scheme, as the obtained hidden features are automatically ranked according to their correlation with the target outputs. The approach is illustrated on small-scale real-world applications and shows compelling evidence.

Proceedings Article
01 Jan 2011
TL;DR: Resilient backpropagation, an optimization method depending only on gradient signs, is combined with CD learning; however, it does not prevent divergence caused by the approximation bias.
Abstract: Contrastive Divergence (CD) learning is frequently applied to Restricted Boltzmann Machines (RBMs), the building blocks of deep belief networks. It relies on biased approximations of the log-likelihood gradient. This bias can deteriorate the learning process. It was claimed that the signs of most components of the CD update are equal to the corresponding signs of the log-likelihood gradient. This suggests using optimization techniques depending only on the signs. Resilient backpropagation is such a method and we combine it with CD learning. However, it does not prevent divergence caused by the approximation bias.
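The sign-based update the abstract refers to can be sketched as the standard Rprop rule (shown here generically for minimization; feeding it the signs of the CD update instead of exact gradient signs is the combination the paper studies):

```python
import numpy as np

def rprop_update(grad_sign, prev_sign, step,
                 eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One resilient-backpropagation step.

    Per-parameter step sizes grow while the gradient sign is stable and
    shrink when it flips; only the sign of the gradient is ever used,
    never its magnitude. Returns (parameter update, adapted step sizes).
    """
    agree = grad_sign * prev_sign
    step = np.where(agree > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(agree < 0, np.maximum(step * eta_minus, step_min), step)
    return -grad_sign * step, step

# First parameter's sign is stable (step grows 0.1 -> 0.12); the second
# flipped (step shrinks 0.1 -> 0.05) and the update moves the other way.
delta, step = rprop_update(np.array([1.0, -1.0]),
                           np.array([1.0, 1.0]),
                           np.array([0.1, 0.1]))
```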