Showing papers on "Softmax function published in 2013"


Proceedings Article
Tomas Mikolov1, Ilya Sutskever1, Kai Chen1, Greg S. Corrado1, Jeffrey Dean1 
05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

24,012 citations
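A minimal NumPy sketch of the negative-sampling objective described in the abstract above, evaluated for a single (center, context) pair. The vocabulary size, embedding dimension, initialization, and uniform noise distribution are illustrative assumptions, not the authors' implementation (the paper draws negatives from the unigram distribution raised to the 3/4 power).

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k = 10_000, 100, 5                     # vocabulary size, embedding dim, negatives per pair
W_in = rng.normal(scale=0.1, size=(V, d))    # "input" (center word) vectors
W_out = rng.normal(scale=0.1, size=(V, d))   # "output" (context word) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(center, context, noise_dist):
    """Negative-sampling loss for one (center, context) pair:
    -log sigma(v_c . v_o) - sum_i log sigma(-v_c . v_ni) over k sampled noise words."""
    negatives = rng.choice(V, size=k, p=noise_dist)
    v_c, v_o = W_in[center], W_out[context]
    pos = np.log(sigmoid(v_c @ v_o))
    neg = np.sum(np.log(sigmoid(-W_out[negatives] @ v_c)))
    return -(pos + neg)

# Uniform noise distribution here purely for illustration.
noise = np.full(V, 1.0 / V)
print(neg_sampling_loss(center=42, context=7, noise_dist=noise))
```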


Posted Content
Tomas Mikolov1, Ilya Sutskever1, Kai Chen1, Greg S. Corrado1, Jeffrey Dean1 
TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

11,343 citations


Posted Content
TL;DR: In this article, the authors introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier, and perform an ablation study to discover the performance contribution from different model layers.
Abstract: Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark. However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.

2,982 citations


Posted Content
TL;DR: The results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on the popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
Abstract: Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.

760 citations
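A short sketch contrasting the two top-layer losses compared in the paper above: softmax cross-entropy versus the squared hinge (L2-SVM) loss applied one-vs-rest to the final-layer scores. Array shapes and example numbers are illustrative; the paper trains these losses end-to-end through the network rather than on fixed scores.

```python
import numpy as np

def softmax_cross_entropy(scores, y):
    """Standard softmax + cross-entropy over class scores of shape (n, K)."""
    z = scores - scores.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def l2_svm_loss(scores, y, K):
    """Squared hinge (L2-SVM) loss, one-vs-rest, with targets in {-1, +1}."""
    t = -np.ones((len(y), K))
    t[np.arange(len(y)), y] = 1.0
    margins = np.maximum(0.0, 1.0 - t * scores)
    return (margins ** 2).sum(axis=1).mean()

scores = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
y = np.array([0, 1])
print(softmax_cross_entropy(scores, y), l2_svm_loss(scores, y, K=3))
```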


Posted Content
12 Nov 2013
TL;DR: In this paper, a novel visualization technique was introduced to give insight into the function of intermediate feature layers and the operation of the classifier, which enabled the authors to find model architectures that outperformed Krizhevsky et al. on the ImageNet classification benchmark.
Abstract: Large Convolutional Neural Network models have recently demonstrated impressive classification performance on the ImageNet benchmark (Krizhevsky et al., 2012). However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.

513 citations


Proceedings ArticleDOI
25 Aug 2013
TL;DR: This paper investigates several CNN architectures, including full and limited weight sharing, convolution along frequency and time axes, and stacking of several convolution layers, and develops a novel weighted softmax pooling layer so that the size of the pooling layer can be automatically learned.
Abstract: Recently, convolutional neural networks (CNNs) have been shown to outperform the standard fully connected deep neural networks within the hybrid deep neural network / hidden Markov model (DNN/HMM) framework on the phone recognition task. In this paper, we extend the earlier basic form of the CNN and explore it in multiple ways. We first investigate several CNN architectures, including full and limited weight sharing, convolution along frequency and time axes, and stacking of several convolution layers. We then develop a novel weighted softmax pooling layer so that the size of the pooling layer can be automatically learned. Further, we evaluate the effect of CNN pretraining, which is achieved by using a convolutional version of the RBM. We show that all CNN architectures we have investigated outperform the earlier basic form of the DNN on both the phone recognition and large vocabulary speech recognition tasks. The architecture with limited weight sharing provides additional gains over the full weight sharing architecture. The softmax pooling layer performs as well as the best CNN with a manually tuned, fixed pooling size, and has potential for further improvement. Finally, we show that CNN pretraining produces significantly better results on a large vocabulary speech recognition task.

378 citations
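One plausible reading of the weighted softmax pooling idea above is pooling with softmax-normalized learned weights: a sharp softmax behaves like max-pooling and a flat one like average-pooling, so the effective pooling behaviour (and hence size) is learned rather than fixed. A minimal sketch under that reading; the paper's exact parameterization may differ.

```python
import numpy as np

def weighted_softmax_pool(activations, logits):
    """Pool a window of activations with softmax-normalized learned weights."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return np.dot(w, activations)

window = np.array([0.2, 1.5, 0.9, 0.1])   # activations in one pooling window (illustrative)
print(weighted_softmax_pool(window, logits=np.array([0.0, 3.0, 1.0, 0.0])))  # max-like
print(weighted_softmax_pool(window, logits=np.zeros(4)))                      # average-like
```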


Journal ArticleDOI
TL;DR: Variational Bayes is considered as an alternative scheme that provides formal constraints on the computational anatomy of inference and action—constraints that are remarkably consistent with neuroanatomy.
Abstract: This paper considers agency in the setting of embodied or active inference. In brief, we associate a sense of agency with prior beliefs about action and ask what sorts of beliefs underlie optimal behaviour. In particular, we consider prior beliefs that action minimises the Kullback-Leibler divergence between desired states and attainable states in the future. This allows one to formulate bounded rationality as approximate Bayesian inference that optimises a free energy bound on model evidence. We show that constructs like expected utility, exploration bonuses, softmax choice rules and optimism bias emerge as natural consequences of this formulation. Previous accounts of active inference have focused on predictive coding and Bayesian filtering schemes for minimising free energy. Here, we consider variational Bayes as an alternative scheme that provides formal constraints on the computational anatomy of inference and action – constraints that are remarkably consistent with neuroanatomy. Furthermore, this scheme contextualises optimal decision theory and economic (utilitarian) formulations as pure inference problems. For example, expected utility theory emerges as a special case of free energy minimisation, where the sensitivity or inverse temperature (of softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution – that minimises free energy. This sensitivity corresponds to the precision of beliefs about behaviour, such that attainable goals are afforded a higher precision or confidence. In turn, this means that optimal behaviour entails a representation of confidence about outcomes that are under an agent's control.

270 citations
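A minimal sketch of the softmax (quantal response) choice rule discussed above, where the sensitivity or inverse temperature plays the role the paper assigns to the precision of beliefs about behaviour. The action values below are illustrative.

```python
import numpy as np

def softmax_choice(values, beta):
    """Softmax choice rule: P(a) proportional to exp(beta * value(a)).
    beta is the sensitivity / inverse temperature ("precision")."""
    z = beta * np.asarray(values, dtype=float)
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

q = [1.0, 0.5, 0.2]                    # expected utilities of three actions (illustrative)
print(softmax_choice(q, beta=1.0))     # low precision: near-uniform, exploratory
print(softmax_choice(q, beta=10.0))    # high precision: near-deterministic choice
```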


Journal ArticleDOI
TL;DR: A sufficient depth of the T-DSN, a symmetric two-hidden-layer structure in each T-DSN block, the model parameter learning algorithm, and a softmax layer on top of the T-DSN are all shown to have contributed to the low error rates observed in the experiments for all three tasks.
Abstract: A novel deep architecture, the tensor deep stacking network (T-DSN), is presented. The T-DSN consists of multiple, stacked blocks, where each block contains a bilinear mapping from two hidden layers to the output layer, using a weight tensor to incorporate higher-order statistics of the hidden binary ([0,1]) features. A learning algorithm for the T-DSN's weight matrices and tensors is developed and described in which the main parameter estimation burden is shifted to a convex subproblem with a closed-form solution. Using an efficient and scalable parallel implementation for CPU clusters, we train sets of T-DSNs in three popular tasks in increasing order of the data size: handwritten digit recognition using MNIST (60k), isolated state/phone classification and continuous phone recognition using TIMIT (1.1M), and isolated phone classification using WSJ0 (5.2M). Experimental results in all three tasks demonstrate the effectiveness of the T-DSN and the associated learning methods in a consistent manner. In particular, a sufficient depth of the T-DSN, a symmetric two-hidden-layer structure in each T-DSN block, our model parameter learning algorithm, and a softmax layer on top of the T-DSN are all shown to have contributed to the low error rates observed in the experiments for all three tasks.

164 citations


Proceedings Article
11 Aug 2013
TL;DR: A type of Deep Boltzmann Machine that is suitable for extracting distributed semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This enables an efficient pretraining algorithm and a state initialization scheme for fast inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

123 citations


Posted Content
TL;DR: A Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

108 citations


Journal ArticleDOI
TL;DR: This paper proposes a different mixture of KL divergences, which is a scaled version of the generalized Jensen-Shannon divergence, and shows experimentally that this divergence produces embeddings that better preserve small K-ary neighborhoods, as compared to both the single KL divergence used in SNE and t-SNE and the mixture used in NeRV.

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A segment-based object detection approach using laser range data that combines multiple softmax regression classifiers learned on specific bag-of-word representations using different parameterizations of a descriptor to detect pedestrians, cars, and cyclists.
Abstract: In this paper, we propose a segment-based object detection approach using laser range data. Our detection approach is built up of three stages: First, a hierarchical segmentation approach generates a hierarchy of coarse-to-fine segments to reduce the impact of over- and under-segmentation in later stages. Next, we employ a learned mixture model to classify all segments. The model combines multiple softmax regression classifiers learned on specific bag-of-word representations using different parameterizations of a descriptor. In the final stage, we filter irrelevant and duplicate detections using a greedy method in consideration of the segment hierarchy. We experimentally evaluate our approach on recently published real-world datasets to detect pedestrians, cars, and cyclists.

Journal ArticleDOI
TL;DR: A probability-weighted autoregressive exogenous (PrARX) model, wherein multiple ARX models are combined through probabilistic weighting functions, that can represent both the motion-control and decision-making aspects of driving behavior.
Abstract: This paper proposes a probability-weighted autoregressive exogenous (PrARX) model, wherein multiple ARX models are combined through probabilistic weighting functions. This model can represent both the motion-control and decision-making aspects of the driving behavior. As the probabilistic weighting function, a “softmax” function is introduced. Then, the parameter estimation problem for the proposed model is formulated as a single optimization problem. The “soft” partition defined by the PrARX model can represent the decision-making characteristics of the driver with vagueness. This vagueness can be quantified by introducing the “decision entropy.” In addition, it can be easily extended to the online estimation scheme due to its small computational cost. Finally, the proposed model is applied to the modeling of the vehicle-following task, and the usefulness of the model is verified and discussed.
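A minimal sketch of the probability-weighted ARX prediction described above: each ARX mode's output is weighted by a softmax gating function of the same regressor vector. The regressor contents and the parameter values are illustrative assumptions, not the paper's identified model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def prarx_predict(phi, arx_params, gate_params):
    """Probability-weighted ARX prediction: softmax gate over modes, then a
    weighted sum of the per-mode ARX outputs (the "soft" partition)."""
    weights = softmax(gate_params @ phi)   # soft partition over driving modes
    outputs = arx_params @ phi             # one ARX prediction per mode
    return weights @ outputs, weights

phi = np.array([1.0, 0.4, -0.2])           # regressor (e.g., lagged I/O terms + bias), illustrative
arx_params = np.array([[0.8, 0.1, 0.0],    # two ARX modes with illustrative coefficients
                       [0.2, 0.5, 0.3]])
gate_params = np.array([[1.0, -2.0, 0.0],
                        [-1.0, 2.0, 0.0]])
y_hat, w = prarx_predict(phi, arx_params, gate_params)
print(y_hat, w)
```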

Journal ArticleDOI
TL;DR: It is shown how a new version of the IA model called the multinomial interactive activation (MIA) model can sample correctly from the joint posterior of a proposed generative model for perception of letters in words, indicating that interactive processing is fully consistent with principled probabilistic computation.
Abstract: This article seeks to establish a rapprochement between explicitly Bayesian models of contextual effects in perception and neural network models of such effects, particularly the connectionist interactive activation (IA) model of perception. The article is in part an historical review and in part a tutorial, reviewing the probabilistic Bayesian approach to understanding perception and how it may be shaped by context, and also reviewing ideas about how such probabilistic computations may be carried out in neural networks, focusing on the role of context in interactive neural networks, in which both bottom-up and top-down signals affect the interpretation of sensory inputs. It is pointed out that connectionist units that use the logistic or softmax activation functions can exactly compute Bayesian posterior probabilities when the bias terms and connection weights affecting such units are set to the logarithms of appropriate probabilistic quantities. Bayesian concepts such as the prior, likelihood, (joint and marginal) posterior, probability matching and maximizing, and calculating vs. sampling from the posterior are all reviewed and linked to neural network computations. Probabilistic and neural network models are explicitly linked to the concept of a probabilistic generative model that describes the relationship between the underlying target of perception (e.g., the word intended by a speaker or other source of sensory stimuli) and the sensory input that reaches the perceiver for use in inferring the underlying target. It is shown how a new version of the IA model called the multinomial interactive activation (MIA) model can sample correctly from the joint posterior of a proposed generative model for perception of letters in words, indicating that interactive processing is fully consistent with principled probabilistic computation. Ways in which these computations might be realized in real neural systems are also considered.
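The claim above, that softmax units compute exact Bayesian posteriors when biases and weights are set to the logs of the appropriate probabilities, can be checked directly. The toy prior and likelihood below are illustrative, not the article's letter/word model.

```python
import numpy as np

# Illustrative generative model: 3 hypotheses, 4 possible observations.
prior = np.array([0.5, 0.3, 0.2])                   # P(h)
likelihood = np.array([[0.70, 0.10, 0.10, 0.10],    # P(obs | h), rows sum to 1
                       [0.25, 0.25, 0.25, 0.25],
                       [0.10, 0.10, 0.10, 0.70]])
obs = 0

# Direct Bayes rule.
posterior = prior * likelihood[:, obs]
posterior /= posterior.sum()

# A softmax unit whose bias is log P(h) and whose input is log P(obs | h)
# produces exactly the same posterior.
net_input = np.log(prior) + np.log(likelihood[:, obs])
softmax_out = np.exp(net_input) / np.exp(net_input).sum()

print(np.allclose(posterior, softmax_out))   # True
```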

Journal ArticleDOI
TL;DR: This work proposes a novel EANN approach, where a weighted n-fold validation fitness scheme is used to build an ensemble of neural networks, under four different combination methods: mean, median, softmax and rank-based.

Journal ArticleDOI
TL;DR: Experiments for hardware-based multitarget search missions with a cooperative human-autonomous robot team show that humans can serve as highly informative sensors through proper data modeling and fusion, and that VBIS provides reliable and scalable Bayesian fusion estimates via GMs.
Abstract: This paper considers Bayesian data fusion of conventional robot sensor information with ambiguous human-generated categorical information about continuous world states of interest. First, it is shown that such soft information can be generally modeled via hybrid continuous-to-discrete likelihoods that are based on the softmax function. A new hybrid fusion procedure, called variational Bayesian importance sampling (VBIS), is then introduced to combine the strengths of variational Bayes approximations and fast Monte Carlo methods to produce reliable posterior estimates for Gaussian priors and softmax likelihoods. VBIS is then extended to more general fusion problems that involve complex Gaussian mixture (GM) priors and multimodal softmax likelihoods, leading to accurate GM approximations of highly non-Gaussian fusion posteriors for a wide range of robot sensor data and soft human data. Experiments for hardware-based multitarget search missions with a cooperative human-autonomous robot team show that humans can serve as highly informative sensors through proper data modeling and fusion, and that VBIS provides reliable and scalable Bayesian fusion estimates via GMs.
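A minimal sketch of the hybrid continuous-to-discrete softmax likelihood described above: the probability of a categorical human report given a continuous state is a softmax over linear functions of that state. The state dimension, labels, and parameter values are illustrative assumptions.

```python
import numpy as np

def softmax_label_likelihood(x, W, b):
    """P(categorical label | continuous state x) as a softmax over linear
    functions of x, i.e. a soft partition of the continuous state space."""
    z = W @ x + b
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

# Illustrative 2-D state with three verbal report labels, e.g. "near", "next to", "far".
W = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])
b = np.array([1.0, 0.5, -3.0])
print(softmax_label_likelihood(np.array([0.2, 0.1]), W, b))
```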

Patent
Jui-Ting Huang1, Jinyu Li1, Dong Yu1, Li Deng1, Yifan Gong1 
11 Mar 2013
TL;DR: In this article, various technologies pertaining to a multilingual deep neural network (MDNN) are described, wherein values for weight parameters of the plurality of hidden layers are learned during a training phase based upon training data in terms of acoustic raw features for multiple languages.
Abstract: Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for weight parameters of the plurality of hidden layers are learned during a training phase based upon training data in terms of acoustic raw features for multiple languages. The MDNN further includes softmax layers that are trained for each target language separately, making use of the hidden layer values trained jointly with multiple source languages. The MDNN is adaptable, such that a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.
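A sketch of the architecture the patent describes: hidden layers shared across languages, with a separate softmax output layer per target language, so supporting a new language only requires adding a new softmax head on top of the existing hidden layers. Layer sizes, feature dimensions, and output counts below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Shared hidden layers (jointly trained across languages in the patent's scheme).
shared = [rng.normal(scale=0.1, size=(40, 256)),
          rng.normal(scale=0.1, size=(256, 256))]

# One softmax output layer per target language; adding a language adds a new head.
heads = {"en": rng.normal(scale=0.1, size=(256, 1500)),
         "fr": rng.normal(scale=0.1, size=(256, 1200))}

def forward(features, language):
    h = features
    for W in shared:                          # language-independent feature extraction
        h = relu(h @ W)
    return softmax(h @ heads[language])       # language-specific output posteriors

x = rng.normal(size=(1, 40))                  # one frame of acoustic features (illustrative)
print(forward(x, "en").shape, forward(x, "fr").shape)
```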

Journal ArticleDOI
TL;DR: In this article, the authors introduce a model-based approach to distributed computing for multinomial logistic (softmax) regression, treating counts for each response category as independent Poisson regressions via plug-in estimates for fixed effects shared across categories.
Abstract: This article introduces a model-based approach to distributed computing for multinomial logistic (softmax) regression. We treat counts for each response category as independent Poisson regressions via plug-in estimates for fixed effects shared across categories. The work is driven by the high-dimensional-response multinomial models that are used in analysis of a large number of random counts. Our motivating applications are in text analysis, where documents are tokenized and the token counts are modeled as arising from a multinomial dependent upon document attributes. We estimate such models for a publicly available data set of reviews from Yelp, with text regressed onto a large set of explanatory variables (user, business, and rating information). The fitted models serve as a basis for exploring the connection between words and variables of interest, for reducing dimension into supervised factor scores, and for prediction. We argue that the approach herein provides an attractive option for social scientists and other text analysts who wish to bring familiar regression tools to bear on text data.
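A toy sketch of the distributed strategy described above: each response category's counts are fit as an independent Poisson regression with a plug-in offset standing in for the shared document-level effect, and normalizing the fitted rates yields approximate multinomial (softmax) probabilities. The simulated data, sizes, and the plain gradient-ascent fitter are illustrative assumptions, not the article's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, n_covs = 200, 50, 3

X = rng.normal(size=(n_docs, n_covs))                      # document attributes
true_B = rng.normal(scale=0.5, size=(n_covs, n_words))
m = rng.integers(50, 200, size=n_docs)                     # document lengths
P = np.exp(X @ true_B); P /= P.sum(axis=1, keepdims=True)
counts = np.vstack([rng.multinomial(m[i], P[i]) for i in range(n_docs)])

Z = np.column_stack([np.ones(n_docs), X])                  # intercept + attributes

def fit_poisson(Z, y, offset, iters=500, lr=0.05):
    """Poisson regression for one category's counts with a fixed plug-in offset.
    Plain gradient ascent for brevity; a real implementation would use a GLM solver."""
    b = np.zeros(Z.shape[1])
    for _ in range(iters):
        mu = np.exp(np.clip(offset + Z @ b, -30, 30))
        b += lr * Z.T @ (y - mu) / len(y)
    return b

# Each word's regression is independent given the offset, so this loop can be distributed.
offset = np.log(m)                                         # plug-in document-level effect
B_hat = np.column_stack([fit_poisson(Z, counts[:, j], offset) for j in range(n_words)])

# Renormalizing the fitted Poisson rates gives approximate multinomial probabilities.
fitted = np.exp(Z @ B_hat)
fitted /= fitted.sum(axis=1, keepdims=True)
print(np.abs(fitted - P).mean())
```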

Journal ArticleDOI
TL;DR: This study proposes a scaled version of free-energy based reinforcement learning to achieve more robust and more efficient learning performance and tests the method's robustness with respect to different exploration schedules.
Abstract: Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces, which cannot be handled by standard function approximation methods. In this study, we propose a scaled version of free-energy based reinforcement learning to achieve more robust and more efficient learning performance. The action-value function is approximated by the negative free-energy of a restricted Boltzmann machine, divided by a constant scaling factor that is related to the size of the Boltzmann machine (the square root of the number of state nodes in this study). Our first task is a digit floor gridworld task, where the states are represented by images of handwritten digits from the MNIST data set. The purpose of the task is to investigate the proposed method's ability, through the extraction of task-relevant features in the hidden layer, to cluster images of the same digit and to cluster images of different digits that correspond to states with the same optimal action. We also test the method's robustness with respect to different exploration schedules, i.e., different settings of the initial temperature and the temperature discount rate in softmax action selection. Our second task is a robot visual navigation task, where the robot can learn its position by the different colors of the lower part of four landmarks and it can infer the correct corner goal area by the color of the upper part of the landmarks. The state space consists of binarized camera images with, at most, nine different colors, which is equal to 6642 binary states. For both tasks, the learning performance is compared with standard FERL and with function approximation where the action-value function is approximated by a two-layered feedforward neural network.

Proceedings Article
05 Dec 2013
TL;DR: A "relevance topic model" is proposed for jointly learning meaningful mid-level representations upon bag-of-words (BoW) video representations and a classifier with sparse weights that achieves state of the art performance and outperforms other supervised topic model in terms of classification accuracy.
Abstract: Unstructured social group activity recognition in web videos is a challenging task due to 1) the semantic gap between class labels and low-level visual features and 2) the lack of labeled training data. To tackle this problem, we propose a "relevance topic model" for jointly learning meaningful mid-level representations upon bag-of-words (BoW) video representations and a classifier with sparse weights. In our approach, sparse Bayesian learning is incorporated into an undirected topic model (i.e., Replicated Softmax) to discover topics which are relevant to video classes and suitable for prediction. Rectified linear units are utilized to increase the expressive power of topics so as to explain better video data containing complex contents and make variational inference tractable for the proposed model. An efficient variational EM algorithm is presented for model parameter estimation and inference. Experimental results on the Unstructured Social Activity Attribute dataset show that our model achieves state of the art performance and outperforms other supervised topic model in terms of classification accuracy, particularly in the case of a very small number of labeled training videos.

Posted Content
TL;DR: In this paper, the authors propose the first exact inference algorithm for augmented conditional linear Gaussian (CLG) networks, which also allow discrete children of continuous parents. The algorithm is exact in the sense that it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to the accuracy obtained by numerical integration.
Abstract: Many real life domains contain a mixture of discrete and continuous variables and can be modeled as hybrid Bayesian Networks. An important subclass of hybrid BNs are conditional linear Gaussian (CLG) networks, where the conditional distribution of the continuous variables given an assignment to the discrete variables is a multivariate Gaussian. Lauritzen's extension to the clique tree algorithm can be used for exact inference in CLG networks. However, many domains also include discrete variables that depend on continuous ones, and CLG networks do not allow such dependencies to be represented. No exact inference algorithm has been proposed for these enhanced CLG networks. In this paper, we generalize Lauritzen's algorithm, providing the first "exact" inference algorithm for augmented CLG networks - networks where continuous nodes are conditional linear Gaussians but that also allow discrete children of continuous parents. Our algorithm is exact in the sense that it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to the accuracy obtained by numerical integration used within the algorithm. When the discrete children are modeled with softmax CPDs (as is the case in many real world domains) the approximation of the continuous distributions using the first two moments is particularly accurate. Our algorithm is simple to implement and often comparable in its complexity to Lauritzen's algorithm. We show empirically that it achieves substantially higher accuracy than previous approximate algorithms.

Book ChapterDOI
09 Sep 2013
TL;DR: A reconstruction rule based on softmax regression which considers the reconstruction task as a new classification problem and uses both the crisp labels and the reliabilities of binary decisions as second-order features.
Abstract: Classification by binary decomposition is a well-known method to solve multiclass classification tasks since a large number of algorithms were designed for binary classification. Once the polychotomy has been decomposed into several dichotomies, the decisions of binary learners on a test sample are aggregated by a reconstruction rule to set the final multiclass label. In this context, this paper presents a reconstruction rule based on softmax regression which considers the reconstruction task as a new classification problem. To this aim, as second-order features we use both the crisp labels and the reliabilities of binary decisions. Six heterogeneous datasets and three different classification architectures have been used to test our method, whose performance compares favorably with that of three other reconstruction rules, both in terms of global accuracy and geometric mean of accuracies.

Proceedings ArticleDOI
20 Jun 2013
TL;DR: A reconstruction rule based on softmax regression, where the features of the new classification task are the crisp labels and the reliabilities of the dichotomizers' classifications; its performance compares favorably with that of two other well-known reconstruction rules, both in terms of global accuracy and accuracy per class.
Abstract: Several medical and biological applications face multiclass recognition problems. Such polychotomies can be addressed by decomposition techniques, which reduce the polychotomy into a series of dichotomies and then provide the final multiclass label using a reconstruction rule. Within this framework, we present a reconstruction rule based on softmax regression, where the features of the new classification task are the crisp labels and the reliabilities of the dichotomizers' classifications. The approach has been tested on six medical and biological datasets, decomposing the polychotomies via the Error-Correcting Output Code. Its performance compares favorably with that of two other well-known reconstruction rules, both in terms of global accuracy and accuracy per class.

Posted Content
TL;DR: The normalized variables module generates normalized variables using z-score, min-max, softmax and sigmoid techniques and supports multiple variables and panel dataset.
Abstract: norm generates normalized variables using z-score, min-max, softmax and sigmoid techniques. The module supports multiple variables and panel dataset.
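For reference, common definitions of the four normalization techniques the module lists, sketched in Python. The exact formulas used by the module may differ, in particular the lambda scaling assumed in the softmax variant below.

```python
import numpy as np

def zscore(x):
    """Z-score normalization: zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()

def minmax(x):
    """Min-max normalization onto [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def sigmoid_norm(x):
    """Sigmoid normalization: logistic squashing of the z-score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-zscore(x)))

def softmax_norm(x, lam=1.0):
    """Softmax scaling (Pyle-style, assumed form): logistic squashing with a tunable
    central range lam over which the mapping stays roughly linear before squashing outliers."""
    z = (x - x.mean()) / (lam * x.std() / (2 * np.pi))
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([3.0, 7.5, 8.1, 12.0, 55.0])    # note the outlier
for f in (zscore, minmax, sigmoid_norm, softmax_norm):
    print(f.__name__, np.round(f(x), 3))
```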

Journal ArticleDOI
TL;DR: A sparse auto-encoder model was trained to extract codes for different facial expressions; the model comprises four encoder layers and three decoder layers, and the representation in the fourth layer (the code layer) provides the desired features.
Abstract: A sparse auto-encoder model was trained to extract codes for different facial expressions. The model comprises four encoder layers and three decoder layers; the representation in the fourth layer (the code layer) provides the desired features. Using large numbers of patches randomly selected from training faces, the model was first trained via backpropagation to minimize an unsupervised sparse reconstruction error, and then a softmax classifier was learned for supervised classification. The input vector for classification is the facial-image feature induced by the learned sparse auto-encoder and two key operations (convolution and pooling). Using a small number of hidden units per layer and a relatively small training set, the proposed model achieves excellent performance in the experiments.
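A minimal sketch of the final supervised stage described above: a softmax classifier trained by gradient descent on fixed features, standing in for the convolved and pooled auto-encoder codes. The feature dimension, class count, and random placeholder data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_classifier(features, labels, n_classes, lr=0.5, epochs=200):
    """Multinomial logistic regression on fixed features, trained with
    batch gradient descent on the cross-entropy loss."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[labels]                 # one-hot targets
    for _ in range(epochs):
        P = softmax(features @ W)
        W -= lr * features.T @ (P - Y) / n        # gradient of mean cross-entropy
    return W

X = rng.normal(size=(300, 64))                    # placeholder for learned expression features
y = rng.integers(0, 7, size=300)                  # 7 expression classes (illustrative)
W = train_softmax_classifier(X, y, n_classes=7)
print(softmax(X @ W)[:2].round(3))
```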

26 Apr 2013
TL;DR: This work introduces a type of Deep Boltzmann Machine that is suitable for extracting distributed semantic representations from a large unstructured collection of documents and proposes an approximate inference method that interacts with learning in a way that makes it possible to train the DBM more efficiently than previously proposed methods.
Abstract: We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We propose an approximate inference method that interacts with learning in a way that makes it possible to train the DBM more efficiently than previously proposed methods. Even though the model has two hidden layers, it can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.