
Showing papers presented at "The European Symposium on Artificial Neural Networks in 2011"


Proceedings Article
01 Jan 2011
TL;DR: This work shows how to learn many layers of features on color images and how these features are used to initialize deep autoencoders, which are then used to map images to short binary codes.
Abstract: We show how to learn many layers of features on color images and we use these features to initialize deep autoencoders. We then use the autoencoders to map images to short binary codes. Using semantic hashing [6], 28-bit codes can be used to retrieve images that are similar to a query image in a time that is independent of the size of the database. This extremely fast retrieval makes it possible to search using multiple different transformations of the query image. 256-bit binary codes allow much more accurate matching and can be used to prune the set of images found using the 28-bit codes.
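The retrieval step the abstract describes, comparing short binary codes by Hamming distance, can be sketched as follows. This is an illustrative NumPy sketch only: the paper's codes come from a trained deep autoencoder, whereas random codes stand in here, and a linear scan replaces the constant-time semantic-hashing lookup.

```python
import numpy as np

def hamming_retrieve(query_code, db_codes):
    """Rank database items by Hamming distance to the query's binary code.

    With short codes (e.g. 28 bits), this ranking can be replaced by a
    direct hash-table lookup on the code itself (semantic hashing), making
    retrieval time independent of the database size; the linear scan here
    just illustrates the distance-based ordering.
    """
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Stand-in codes; real codes would come from the trained autoencoder.
rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(1000, 28), dtype=np.uint8)
query = db[42].copy()
order, dists = hamming_retrieve(query, db)
```

The 256-bit codes mentioned in the abstract would then re-rank only the short list of candidates returned by the 28-bit lookup.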

406 citations


Proceedings Article
27 Apr 2011
TL;DR: This tutorial, which introduces the ESANN deep learning special session, details the state-of-the-art models and summarizes the current understanding of this learning approach, which is a reference for many difficult classification tasks.
Abstract: The deep learning paradigm tackles problems on which shallow architectures (e.g. SVM) are affected by the curse of dimensionality. As part of a two-stage learning scheme involving multiple layers of non-linear processing, a set of statistically robust features is automatically extracted from the data. This tutorial, which introduces the ESANN deep learning special session, details the state-of-the-art models and summarizes the current understanding of this learning approach, which is a reference for many difficult classification tasks.

67 citations


Proceedings Article
01 Jan 2011
TL;DR: The outcome of the current research is a combined global-local algorithm for training complex-valued feed-forward neural networks that is appropriate for the considered chaotic problem.
Abstract: Complex-valued neural networks are one of the open topics in the machine learning community. In this paper we address the problems of gradient computation in complex-valued neural networks by combining global and local optimization algorithms. The outcome of the current research is a combined global-local algorithm for training the complex-valued feed-forward neural network that is appropriate for the considered chaotic problem.

50 citations


Proceedings Article
01 Apr 2011
TL;DR: This tutorial prefaces the special session "Seeing is believing: The importance of visualization in real-world machine learning applications", reflects some of the main emerging topics in the field, and provides some clues to the current state and the near future of visualization methods within the framework of machine learning.
Abstract: The increasing availability of data sets with a huge amount of information, coded in many different features, justifies the research on new methods of knowledge extraction: the great challenge is the translation of the raw data into useful information that can be used to improve decision-making processes, detect relevant profiles, find out relationships among features, etc. It is undoubtedly true that a picture is worth a thousand words, which makes visualization methods likely the most appealing and one of the most relevant kinds of knowledge extraction methods. At ESANN 2011, the special session "Seeing is believing: The importance of visualization in real-world machine learning applications" reflects some of the main emerging topics in the field. This tutorial prefaces the session, summarizing some of its contributions, while also providing some clues to the current state and the near future of visualization methods within the framework of machine learning.

41 citations


Proceedings Article
01 Jan 2011
TL;DR: In this article, a universal bound for the overfitting of a multilayer perceptron model under weak assumptions is presented; this bound is valid without Gaussian or identifiability assumptions, and it is used to give a hint about determining the true architecture of the MLP model as the number of data points goes to infinity.
Abstract: Multilayer perceptrons (MLP) with one hidden layer have long been used to deal with non-linear regression. However, in some tasks, MLPs are too powerful a model, and a small mean square error (MSE) may be due more to overfitting than to actual modeling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS); however, in numerous cases the assumption of normality of the noise is arbitrary if not false. In this paper, we present a universal bound for the overfitting of such a model under weak assumptions; this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model as the number of data points goes to infinity. As an illustration, we use this theoretical result to propose and compare effective criteria to find the true architecture of an MLP.

31 citations


Proceedings Article
01 Jan 2011
TL;DR: A model-based approach is presented that extends multi-class quadratic normal discriminant analysis with a model of the mislabelling process, demonstrating the benefits in terms of parameter recovery as well as classification performance, on both synthetic and real-world multi-class problems.
Abstract: Learning a classifier from a training set that contains labelling errors is a difficult, yet not very well studied problem. Here we present a model-based approach that extends multi-class quadratic normal discriminant analysis with a model of the mislabelling process. We demonstrate the benefits of this approach in terms of parameter recovery as well as improved classification performance, on both synthetic and real-world multi-class problems. We also obtain enhanced accuracy in comparison with a previous model-free approach.

24 citations


Proceedings Article
01 Apr 2011
TL;DR: It is demonstrated that RRL can reliably outperform buy-and-hold for the higher frequency data, in contrast to GP which performed best for monthly data.
Abstract: A novel stochastic adaptation of the recurrent reinforcement learning (RRL) methodology is applied to daily, weekly, and monthly stock index data, and compared to results obtained elsewhere using genetic programming (GP). The data sets used have been considered a challenging test for algorithmic trading. It is demonstrated that RRL can reliably outperform buy-and-hold for the higher frequency data, in contrast to GP, which performed best for monthly data.

23 citations


Proceedings Article
01 Jan 2011
TL;DR: This paper adapts to the LS-SVM case a recent work for sparsifying classical SVM classifiers, which is based on an iterative approximation to the L0-norm, and achieves very sparse models, without significant loss of accuracy compared to standard LS-SVMs or SVMs.
Abstract: This is an electronic version of the paper presented at the 19th European Symposium on Artificial Neural Networks, held in Bruges on 2011

20 citations


Proceedings Article
01 Jan 2011
TL;DR: A normalization of the binary classifiers' outputs is proposed that allows fair comparisons in cases where the comparison may be biased towards one specific class because the binary classifiers are built on distinct feature subsets.
Abstract: This paper proposes a method to perform class-specific feature selection in multiclass support vector machines addressed with the one-against-all strategy. The main issue arises at the final step of the classification process, where binary classifier outputs must be compared one against another to elect the winning class. This comparison may be biased towards one specific class when the binary classifiers are built on distinct feature subsets. This paper proposes a normalization of the binary classifiers' outputs that allows fair comparisons in such cases.
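One plausible instance of such a normalization (an illustrative rule, not necessarily the paper's exact formula): standardize each binary classifier's decision values using statistics estimated on held-out data, so that outputs trained on distinct feature subsets become comparable before the argmax.

```python
import numpy as np

def normalized_ova_predict(scores, val_scores):
    """One-against-all prediction with per-classifier standardization.

    scores:     (n_samples, n_classes) raw decision values.
    val_scores: (n_val, n_classes) decision values on held-out data,
                used to estimate each binary classifier's output scale.
    """
    mu = val_scores.mean(axis=0)
    sigma = val_scores.std(axis=0) + 1e-12   # avoid division by zero
    return np.argmax((scores - mu) / sigma, axis=1)

# Classifier 1 outputs on a 10x larger scale; a raw argmax would favor it.
val = np.array([[0.0, 0.0], [2.0, 20.0], [-2.0, -20.0]])
test = np.array([[1.5, 5.0]])
pred = normalized_ova_predict(test, val)
# class 0 wins after standardization; a raw argmax would pick class 1
```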

20 citations


Proceedings Article
27 Apr 2011
TL;DR: In this paper, a graph visualization methodology based on hierarchical maximal modularity clustering is described, with interactive and significant coarsening and refining possibilities, for HIV epidemic analysis in Cuba.
Abstract: This paper describes a graph visualization methodology based on hierarchical maximal modularity clustering, with interactive and significant coarsening and refining possibilities. An application of this method to HIV epidemic analysis in Cuba is outlined.

18 citations


Proceedings Article
01 Jan 2011
TL;DR: The different behavior of the Maximal Discrepancy and the Rademacher Complexity when applied to linear classifiers is studied and a practical procedure to tighten the bounds is suggested.
Abstract: The Maximal Discrepancy and the Rademacher Complexity are powerful statistical tools that can be exploited to obtain reliable, albeit not tight, upper bounds on the generalization error of a classifier. We study the different behavior of the two methods when applied to linear classifiers and suggest a practical procedure to tighten the bounds. The resulting generalization estimate can be successfully used for classifier model selection.
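As a hedged illustration of the Rademacher side of the abstract (not the paper's tightened bound): for linear functions {x ↦ w·x : ‖w‖₂ ≤ B}, the supremum in the empirical Rademacher complexity has the closed form (B/n)·E_σ‖Σᵢ σᵢ xᵢ‖₂, which is easy to estimate by Monte Carlo over random sign vectors.

```python
import numpy as np

def empirical_rademacher_linear(X, B=1.0, n_draws=500, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of the
    linear class {x -> w.x : ||w||_2 <= B} on the sample X (n x d).

    For each Rademacher sign vector s, the sup over w is attained at
    w = B * (sum_i s_i x_i)/||...||, giving (B/n) * ||sum_i s_i x_i||_2.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n_draws, n))
    return float(np.mean(B / n * np.linalg.norm(signs @ X, axis=1)))

rng = np.random.default_rng(1)
small = empirical_rademacher_linear(rng.normal(size=(50, 5)))
large = empirical_rademacher_linear(rng.normal(size=(5000, 5)))
# the complexity shrinks roughly like 1/sqrt(n) as the sample grows
```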

Proceedings Article
01 Jan 2011
TL;DR: It is shown that regularization of the inner reservoir network mitigates parameter dependencies and boosts the task-specific performance.
Abstract: Output feedback is crucial for autonomous and parameterized pattern generation with reservoir networks. Read-out learning can lead to error amplification in these settings and therefore regularization is important for both generalization and reduction of error amplification. We show that regularization of the inner reservoir network mitigates parameter dependencies and boosts the task-specific performance.

Proceedings Article
27 Apr 2011
TL;DR: A novel approach to P300 detection using Kalman filtering and SVMs is proposed, and experiments show that this method is a promising step toward single-trial detection of P300.
Abstract: Brain Computer Interfaces (BCI) are systems enabling humans to communicate with machines through signals generated by the brain. Several kinds of signals can be envisioned, as well as means to measure them. In this paper we are particularly interested in event-related brain potentials (ERP), and especially visually-evoked potential signals (P300) measured with surface electroencephalograms (EEG). When the human is stimulated with visual inputs, the P300 signals arise about 300 ms after the visual stimulus has been received. Yet, the EEG signal is often very noisy, which makes P300 detection hard. It is customary to use an average of several trials to enhance the P300 signal and reduce the random noise, but this results in a lower bit rate of the interface. In this contribution, we propose a novel approach to P300 detection using Kalman filtering and SVMs. Experiments show that this method is a promising step toward single-trial detection of P300.
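The trial-averaging step the abstract contrasts with is simple to illustrate: averaging k time-locked epochs leaves the evoked response intact while shrinking independent noise by roughly 1/√k. The data below are synthetic and purely illustrative (an idealized Gaussian-shaped peak standing in for the P300).

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.6, 300)                      # 600 ms epoch
p300 = np.exp(-((t - 0.3) ** 2) / (2 * 0.03 ** 2))  # idealized peak ~300 ms

def noisy_trials(k, noise_std=2.0):
    """Simulate k single-trial epochs: same evoked response, fresh noise."""
    return p300 + rng.normal(scale=noise_std, size=(k, t.size))

single_err = np.abs(noisy_trials(1) - p300).mean()
avg_err = np.abs(noisy_trials(100).mean(axis=0) - p300).mean()
# averaging 100 trials leaves far less residual noise than a single trial,
# but costs 100 stimulus repetitions -- hence the lower bit rate
```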

Proceedings Article
01 Jan 2011
TL;DR: The formalism of Maximal Ancestral Graphs (MAGs) is used and cSAT+ is adapted to solve the problem of incorporating prior knowledge, in the form of causal relations, into causal models.
Abstract: In this paper we address the problem of incorporating prior knowledge, in the form of causal relations, into causal models. Prior approaches mostly consider knowledge about the presence or absence of edges in the model. We use the formalism of Maximal Ancestral Graphs (MAGs) and adapt cSAT+, an algorithm for reasoning with datasets defined over different variable sets, to solve this problem.

Proceedings Article
01 Jan 2011
TL;DR: A statistical dependence measure is presented for variable selection in the context of classification and its performance is tested over DNA microarray data, a challenging dataset for machine learning researchers due to the high number of genes and relatively small number of measurements.
Abstract: Feature selection is the domain of machine learning which studies data-driven methods to select, among a set of input variables, the ones that will lead to the most accurate predictive model. In this paper, a statistical dependence measure is presented for variable selection in the context of classification. Its performance is tested over DNA microarray data, a challenging dataset for machine learning researchers due to the high number of genes and relatively small number of measurements. This measure is compared against the so called mRMR approach, and is shown to obtain better or equal performance over the binary datasets.
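A minimal sketch of the kind of dependence measure involved (plain empirical mutual information for discrete variables; the paper's measure and the mRMR criterion it is compared against build on quantities of this kind):

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete
    sequences: sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b)))."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in pxy.items():
        mi += (c / n) * np.log(c * n / (px[a] * py[b]))
    return mi

# Perfectly dependent binary variables share log(2) nats; ranking features
# by MI with the class label gives a simple relevance ordering.
```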

Proceedings Article
01 Jan 2011
TL;DR: This work introduces a simple generalization of RAM-based neurons in order to explore both weightless neural models in the data stream clustering problem.
Abstract: Producing good quality clustering of data streams in real time is a difficult problem, since it is necessary to perform the analysis of data points arriving in a continuous style, with the support of quite limited computational resources. The incremental and evolving nature of the resulting clustering structures must reflect the dynamics of the target data stream. The WiSARD weightless perceptron, and its associated DRASiW extension, are intrinsically capable of, respectively, performing one-shot learning and producing prototypes of the learnt categories. This work introduces a simple generalization of RAM-based neurons in order to explore both weightless neural models in the data stream clustering problem.

Proceedings Article
01 Jan 2011
TL;DR: A combination of the Self-Organizing Map approach and navigation functions in the Traveling Salesman Problem with segment goals, where paths between goals have to respect obstacles, demonstrates the applicability of SOM principles in problems to which SOM has not yet been applied.
Abstract: This paper presents a combination of the Self-Organizing Map (SOM) approach and navigation functions in the Traveling Salesman Problem with segment goals, where paths between goals have to respect obstacles. Hence, the problem is called multi-goal path planning. The problem arises from inspection planning, where a path is sought from which all points of the given polygonal environment have to be "seen". The proposed approach demonstrates the applicability of SOM principles in problems to which SOM has not yet been applied.

Proceedings Article
01 Jan 2011
TL;DR: This paper reviews the application of Fisher information to derive a metric in primary data space to provide a natural coordinate space to represent pairwise distances with respect to a probability distribution p(c|x), defined by an external label c, and use it to compute more informative distances.
Abstract: Clustering methods and nearest neighbour classifiers typically compute distances between data points as a measure of similarity, with nearby pairs of points considered more like each other than remote pairs. The distance measure of choice is often Euclidean, implicitly treating all directions in space as equally relevant. This paper reviews the application of Fisher information to derive a metric in primary data space. The aim is to provide a natural coordinate space to represent pairwise distances with respect to a probability distribution p(c|x), defined by an external label c, and use it to compute more informative distances.
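A one-dimensional sketch of the idea (an assumed illustration, not the paper's construction): for a binary posterior p(c=1|x), the Fisher information metric is ds² = p′(x)² / (p(1−p)) dx², so distance accumulates only where the label distribution changes, unlike the Euclidean metric which treats all regions equally.

```python
import numpy as np

def fisher_path_length(xs, posterior):
    """Approximate Fisher-metric length of the segment covered by the grid
    xs, for a binary posterior p(c=1|x): ds = sqrt(p'(x)^2/(p(1-p))) dx.

    Uses a finite-difference derivative and the trapezoidal rule.
    """
    p = posterior(xs)
    dp = np.gradient(p, xs)
    dens = np.sqrt(dp ** 2 / (p * (1.0 - p)))
    return float(np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(xs)))

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
xs = np.linspace(-20.0, 20.0, 4001)
length = fisher_path_length(xs, sigmoid)
# sweeping p from ~0 to ~1 recovers the Bernoulli geodesic length: pi,
# matching the closed form 2 * |arcsin(sqrt(p2)) - arcsin(sqrt(p1))|
```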

Proceedings Article
01 Jan 2011
TL;DR: The authors connected a computational approach to topology learning, the Exploration Observation Machine (XOM) as introduced in (12), with the divergence optimization of SNE, resulting in a new dimension reduction algorithm called Neighbor Embedding XOM (NE-XOM).
Abstract: In this paper we propose the generalization of the recently introduced Neighbor Embedding Exploratory Observation Machine (NE-XOM) for dimension reduction and visualization. We provide a general mathematical framework called Self Organized Neighbor Embedding (SONE). It treats the components, like data similarity measures and neighborhood functions, independently and easily changeable. And it enables the utilization of different divergences, based on the theory of Frechet derivatives. In this way we propose a new dimension reduction and visualization algorithm, which can be easily adapted to the user-specific request and the actual problem. Various dimension reduction techniques have been introduced based on different properties of the original data to be preserved. The spectrum ranges from linear projections of original data, such as in Principal Component Analysis (PCA) or classical Multidimensional Scaling (MDS), to a wide range of locally linear and non-linear approaches, such as Isomap, Locally Linear Embedding (LLE), Local Linear Coordination (LLC), or charting. Stochastic Neighbor Embedding (SNE) approximates the probability distribution in the high-dimensional space, defined by neighboring points, with their probability distribution in a lower-dimensional space. A technique called t-SNE was proposed in (10). It is a variation of SNE considering another statistical model assumption for data distributions. Other methods aim at the preservation of the classification accuracy in lower dimensions and incorporate the available label information for the embedding, e.g. Linear Discriminant Analysis (LDA) (5) and generalizations thereof and extensions of the Self Organizing Map (SOM) incorporating class labels. For a comprehensive review on nonlinear dimensionality reduction methods, we refer to (7).
Recently, the idea of fast and efficient online learning was combined with the high quality of divergence-based optimization, resulting in a new dimension reduction algorithm called Neighbor Embedding XOM (NE-XOM). Its usefulness and comparison with other methods is shown in (3). The authors connected a computational approach to topology learning, the Exploration Observation Machine (XOM) as introduced in (12), with the divergence optimization of SNE. In this contribution, we extend the approach proposed in (3), with a mathematical foundation for the generalization of the principle to

Proceedings Article
01 Jan 2011
TL;DR: Empirical results from experiments with a learned SVM model on twelve UCI datasets indicate that the proposed solution improves on the original SVM, and also on results reported when using a z-SVM, in terms of g-mean and sensitivity.
Abstract: Standard learning algorithms may perform poorly when learning from unbalanced datasets. Based on Fisher's discriminant analysis, a post-processing strategy is introduced to deal with datasets with significant imbalance in the data distribution. A new bias is defined, which reduces skew towards the minority class. Empirical results from experiments with a learned SVM model on twelve UCI datasets indicate that the proposed solution improves on the original SVM, and also on results reported when using a z-SVM, in terms of g-mean and sensitivity.
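A hedged sketch of what such a post-processing bias can look like (an illustrative rule in the same spirit, not the paper's exact definition): keep the trained SVM's scores but move the decision threshold to the midpoint of the class-conditional mean scores, so the boundary no longer sits where the majority class pushed it.

```python
import numpy as np

def midpoint_threshold(scores, labels):
    """New decision threshold: midpoint between the mean decision score of
    each class (a Fisher-style cut between the class-conditional means).

    With heavy imbalance the default threshold 0 tends to favor the
    majority class; re-centering between the two means reduces that skew.
    """
    return 0.5 * (scores[labels == 1].mean() + scores[labels == 0].mean())

# Minority-class (1) scores sit below 0, so the default threshold
# misclassifies them even though the classes are well separated.
scores = np.array([-3.0, -2.5, -2.8, -3.2, -0.6, -0.4])
labels = np.array([0, 0, 0, 0, 1, 1])
thr = midpoint_threshold(scores, labels)
pred = (scores > thr).astype(int)
```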

Proceedings Article
01 Jan 2011
TL;DR: The ability of the new algorithms is demonstrated on standard functional data sets using different basis functions, namely Gaussians and Lorentzians, with a drastically reduced number of parameters to be adapted for relevance learning.
Abstract: Generalized relevance learning vector quantization (GRLVQ) is a prototype-based classification algorithm with metric adaptation, weighting each data dimension according to its relevance for the classification task. We present in this paper an extension for functional data, which are usually very high dimensional. This approach supposes that the data vectors are functional representations. Taking this information into account, the so-called relevance profiles are modeled by superpositions of simple basis functions depending on only a few parameters. As a consequence, the resulting functional GRLVQ has a drastically reduced number of parameters to be adapted for relevance learning. We demonstrate the ability of the new algorithms on standard functional data sets using different basis functions, namely Gaussians and Lorentzians.

Proceedings Article
01 Apr 2011
TL;DR: A hybrid model combining a generative model and a discriminative model is proposed for signal labelling and classification tasks, aiming at taking the best from each world: the usual increased accuracy of generative models on small training datasets and of discriminative models on large training datasets.
Abstract: We propose a hybrid model combining a generative model and a discriminative model for signal labelling and classification tasks, aiming at taking the best from each world. The idea is to focus the learning of the discriminative model on most likely state sequences as output by the generative model. This allows taking advantage of the usual increased accuracy of generative models on small training datasets and of discriminative models on large training datasets. We instantiate this framework with Hidden Markov Models and Hidden Conditional Random Fields. We validate our model on financial time series and on handwriting data.

Proceedings Article
01 Jan 2011
TL;DR: Probabilistic LVQ (PLVQ) makes it possible to realize multivariate class memberships for prototypes and training samples, and the prototype labels can be learned from the data during training.
Abstract: We introduce a generalization of Robust Soft Learning Vector Quantization (RSLVQ). This algorithm for nearest prototype classification is derived from an explicit cost function and follows the dynamics of a stochastic gradient ascent. We generalize the RSLVQ cost function with respect to vectorial class labels: Probabilistic LVQ (PLVQ) makes it possible to realize multivariate class memberships for prototypes and training samples, and the prototype labels can be learned from the data during training. We present experiments to demonstrate the new algorithm in practice.

Proceedings Article
01 Jan 2011
TL;DR: An evolutionary process is employed to design a neural network controller capable of accomplishing both versions of a robotic rule switching task, and revealed a self-organized time perception capacity in the agent's cognitive system that significantly facilitates the accomplishment of the tasks, through modulation of the supplementary behavioural and cognitive processes.
Abstract: Biological organisms perceive and act in the world based on spatiotemporal experiences and interpretations. However, artificial agents consider mainly the spatial relationships that exist in the world, typically ignoring its temporal aspects. In an attempt to direct research interest towards the fundamental issue of time experiencing, the current work explores two temporally different versions of a robotic rule switching task. An evolutionary process is employed to design a neural network controller capable of accomplishing both versions of the task. The systematic exploration of neural network dynamics revealed a self-organized time perception capacity in the agent's cognitive system that significantly facilitates the accomplishment of the tasks, through modulation of the supplementary behavioural and cognitive processes.

Proceedings Article
01 Jan 2011
TL;DR: An echo state network is implemented within the concept of actor-critic design to obtain an optimal control policy for a mobile robot, which is asked to anticipate future rewards/punishments and react accordingly.
Abstract: In this paper we implement an echo state network within the concept of actor-critic design to obtain an optimal control policy for a mobile robot. The robot is asked to anticipate future rewards/punishments and react accordingly. Experimental results show that the proposed approach is simple and effective.

Proceedings Article
01 Jan 2011
TL;DR: The results indicate that continuous EEG data combined with BCI methodology is a promising approach to measuring learners' mental states online.
Abstract: This publication aims at developing computer-based learning environments adapting to learners' individual cognitive condition. The adaptive mechanism, based on Brain-Computer Interface (BCI) methodology, relies on electroencephalogram (EEG) data to diagnose learners' mental states. A first within-subjects study (10 students) was accomplished, aiming at differentiating between states of learning and non-learning by means of EEG data. Support Vector Machines classified characteristics in the EEG signals for these two different stimuli with an average accuracy of 74.55%. For individual students the percentage of correct classification reached 92.22%. The results indicate that continuous EEG data combined with BCI methodology is a promising approach to measuring learners' mental states online.

Proceedings Article
01 Jan 2011
TL;DR: This paper proposes an approach based on mutual information and the maximal Relevance minimal Redundancy principle to handle the case of mixed data and combines aspects of both wrapper and filter methods and is well suited for regression problems.
Abstract: The problem of feature selection is crucial for many applications and has thus been studied extensively. However, most of the existing methods are designed to handle data consisting only of categorical or of real-valued features, while a mix of both kinds of features is often encountered in practice. This paper proposes an approach based on mutual information and the maximal Relevance minimal Redundancy principle to handle the case of mixed data. It combines aspects of both wrapper and filter methods and is well suited for regression problems. Experiments on artificial and real-world datasets show the interest of the methodology.

Proceedings Article
01 Jan 2011
TL;DR: A practical description is given of how a causal learning task can be undertaken based on different possible assumptions; two categories of assumptions lead to different methods, constraint-based and Bayesian learning.
Abstract: To learn about causal relations between variables just by observing samples from them, particular assumptions must be made about those variables' distributions. This article gives a practical description of how such a learning task can be undertaken based on different possible assumptions. Two categories of assumptions lead to different methods, constraint-based and Bayesian learning, and in each case we review both the basic ideas and some recent extensions and alternatives to them.

Proceedings Article
27 Apr 2011
TL;DR: In this paper, the hidden layers are learnt in a supervised fashion based on kernel partial least squares regression, and the obtained hidden features are automatically ranked according to their correlation with the target outputs.
Abstract: This paper presents an alternative to the supervised KPCA-based approach for learning a Multilayer Kernel Machine (MKM) (1). In our proposed procedure, the hidden layers are learnt in a supervised fashion based on kernel partial least squares regression. The main interest resides in a simplified learning scheme, as the obtained hidden features are automatically ranked according to their correlation with the target outputs. The approach is illustrated on small-scale real-world applications and shows compelling evidence.

Proceedings Article
01 Jan 2011
TL;DR: Resilient backpropagation, an optimization method depending only on gradient signs, is combined with CD learning; however, it does not prevent divergence caused by the approximation bias.
Abstract: Contrastive Divergence (CD) learning is frequently applied to Restricted Boltzmann Machines (RBMs), the building blocks of deep belief networks. It relies on biased approximations of the log-likelihood gradient. This bias can deteriorate the learning process. It was claimed that the signs of most components of the CD update are equal to the corresponding signs of the log-likelihood gradient. This suggests using optimization techniques depending only on the signs. Resilient backpropagation is such a method and we combine it with CD learning. However, it does not prevent divergence caused by the approximation bias.
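The sign-based update the abstract refers to can be sketched as the standard Rprop rule (shown here generically for minimization; feeding it the signs of the CD update instead of exact gradient signs is the combination the paper studies):

```python
import numpy as np

def rprop_update(grad_sign, prev_sign, step,
                 eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One resilient-backpropagation step.

    Per-parameter step sizes grow while the gradient sign is stable and
    shrink when it flips; only the sign of the gradient is ever used,
    never its magnitude. Returns (parameter update, adapted step sizes).
    """
    agree = grad_sign * prev_sign
    step = np.where(agree > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(agree < 0, np.maximum(step * eta_minus, step_min), step)
    return -grad_sign * step, step

# First parameter's sign is stable (step grows 0.1 -> 0.12); the second
# flipped (step shrinks 0.1 -> 0.05) and the update moves the other way.
delta, step = rprop_update(np.array([1.0, -1.0]),
                           np.array([1.0, 1.0]),
                           np.array([0.1, 0.1]))
```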