
Showing papers on "Softmax function published in 2001"


Proceedings Article
02 Aug 2001
TL;DR: The algorithm is based on Lauritzen's algorithm, and is exact in a similar sense: it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to inaccuracies resulting from numerical integration used within the algorithm.
Abstract: Many real-life domains contain a mixture of discrete and continuous variables and can be modeled as hybrid Bayesian networks (BNs). An important subclass of hybrid BNs are conditional linear Gaussian (CLG) networks, where the conditional distribution of the continuous variables given an assignment to the discrete variables is a multivariate Gaussian. Lauritzen's extension to the clique tree algorithm can be used for exact inference in CLG networks. However, many domains include discrete variables that depend on continuous ones, and CLG networks do not allow such dependencies to be represented. In this paper, we propose the first "exact" inference algorithm for augmented CLG networks -- CLG networks augmented by allowing discrete children of continuous parents. Our algorithm is based on Lauritzen's algorithm, and is exact in a similar sense: it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to inaccuracies resulting from numerical integration used within the algorithm. In the special case of softmax CPDs, we show that integration can often be done efficiently, and that using the first two moments leads to a particularly accurate approximation. We show empirically that our algorithm achieves substantially higher accuracy at lower cost than previous algorithms for this task.
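
For illustration, the core numerical step in the softmax-CPD case reduces, for a binary discrete child of a single continuous parent, to integrating a logistic likelihood against a Gaussian. A minimal numpy sketch using Gauss-Hermite quadrature (the function names, quadrature order, and restriction to a binary child of one parent are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def moments_given_binary_child(mu, sigma, w, b, n_points=40):
    """For continuous X ~ N(mu, sigma^2) with a binary softmax (logistic)
    child D, P(D=1 | x) = logistic(w*x + b), compute P(D=1) and the first
    two posterior moments of X given D=1 via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_points)  # probabilists' rule
    x = mu + sigma * nodes                 # quadrature points under N(mu, sigma^2)
    lik = logistic(w * x + b)              # P(D=1 | x) at each point
    norm = weights / np.sqrt(2 * np.pi)    # normalizer for exp(-x^2/2) weight
    p_d1 = np.sum(norm * lik)              # evidence P(D=1)
    m1 = np.sum(norm * lik * x) / p_d1     # posterior mean E[X | D=1]
    m2 = np.sum(norm * lik * x**2) / p_d1  # posterior second moment E[X^2 | D=1]
    return p_d1, m1, m2

print(moments_given_binary_child(mu=0.0, sigma=1.0, w=2.0, b=-0.5))
```

Matching the first two moments computed this way is the sense in which the result is "exact up to numerical integration."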

73 citations


Book ChapterDOI
04 Jun 2001
TL;DR: A theoretically established approach tailored to reinforcement learning following the Softmax action selection policy is presented, and an application example of agent-based routing is also illustrated.
Abstract: This paper treats the combined use of reinforcement learning and simulated annealing. Most simulated annealing methods rely on heuristic temperature bounds as the basis of annealing. Here, a theoretically established approach tailored to reinforcement learning following the Softmax action selection policy is presented. An application example of agent-based routing is also illustrated.
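
For context, Softmax (Boltzmann) action selection chooses actions with probability proportional to exp(Q(a)/T), and annealing gradually lowers the temperature T from exploration toward exploitation. A minimal sketch (the exponential cooling schedule below is a common heuristic stand-in, not the paper's theoretically derived bound):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_action(q_values, temperature):
    """Boltzmann exploration: P(a) proportional to exp(Q(a) / T)."""
    prefs = np.asarray(q_values) / temperature
    prefs -= prefs.max()                 # stabilize the exponential
    probs = np.exp(prefs)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Anneal the temperature across episodes: high T explores, low T exploits.
q = np.array([1.0, 1.2, 0.8])
T0, decay = 5.0, 0.99
for episode in range(1000):
    T = max(T0 * decay**episode, 0.01)   # floor keeps probabilities well-defined
    a = softmax_action(q, T)
    # ... observe reward and update q[a] with any RL rule ...
```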

8 citations


Proceedings ArticleDOI
07 May 2001
TL;DR: A new learning algorithm for the supervised training of multilayer perceptrons for classification that is significantly faster than any previously known method is presented.
Abstract: We present a new learning algorithm for the supervised training of multilayer perceptrons for classification that is significantly faster than any previously known method. Like existing methods, the algorithm assumes a multilayer perceptron with a normalized exponential (softmax) output trained under a cross-entropy criterion. However, this output-criterion pairing turns out to have poor properties for existing optimization methods (backpropagation and its second-order extensions) because a second-order expansion of the network weights about the optimal solution is not a good approximation. The proposed algorithm overcomes this limitation by defining a new search space for which a second-order expansion is valid and in which the optimal solution coincides with that of the original criterion. This allows the application of the Levenberg-Marquardt search procedure to the cross-entropy criterion, which was previously thought applicable only to a mean-square-error criterion.
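
The softmax/cross-entropy pairing the paper builds on has a compact form; a minimal numpy sketch of the forward pass and its well-known logit gradient (the paper's Levenberg-Marquardt reparameterization itself is not reproduced here):

```python
import numpy as np

def softmax(logits):
    """Normalized exponential over the last axis, stabilized."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, one_hot_targets):
    return -np.sum(one_hot_targets * np.log(probs + 1e-12), axis=-1)

# For this pairing, the loss gradient w.r.t. the logits collapses to (p - t),
# which is why softmax + cross-entropy is the standard output/criterion pair.
logits = np.array([[2.0, 0.5, -1.0]])
t = np.array([[1.0, 0.0, 0.0]])
p = softmax(logits)
grad_logits = p - t
print(cross_entropy(p, t), grad_logits)
```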

5 citations


Proceedings ArticleDOI
19 Jan 2001
TL;DR: The problem of identifying terrains in Landsat-TM images on the basis of non-uniformly distributed labeled data is discussed in this paper, and the approach is based on the use of neural network classifiers that learn to predict posterior class probabilities.
Abstract: The problem of identifying terrains in Landsat-TM images on the basis of non-uniformly distributed labeled data is discussed in this paper. Our approach is based on the use of neural network classifiers that learn to predict posterior class probabilities. Principal Component Analysis (PCA) is used to extract features from spectral and contextual information. The proposed scheme obtains lower error rates than other model-based approaches.
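
A minimal sketch of the pipeline the abstract describes, using scikit-learn with synthetic stand-in data (the component count, network size, and class count are placeholders; the paper's actual architecture is not specified here):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for per-pixel spectral + contextual feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))          # 1000 pixels, 30 raw features
y = rng.integers(0, 4, size=1000)        # 4 terrain classes (placeholder)

# PCA compresses the features; an MLP with softmax output then yields
# posterior class probabilities, as in the abstract.
model = make_pipeline(
    PCA(n_components=8),                 # placeholder component count
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0),
)
model.fit(X, y)
posteriors = model.predict_proba(X)      # rows sum to 1: softmax posteriors
```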

2 citations


Posted Content
TL;DR: Fischer et al. as mentioned in this paper presented a novel class of neural spatial interaction models that incorporate origin-specific constraints into the model structure using product units rather than summation units at the hidden layer and softmax output units in the output layer.
Abstract: Fundamental to regional science is the subject of spatial interaction. GeoComputation - a new research paradigm that represents the convergence of the disciplines of computer science, geographic information science, mathematics and statistics - has brought many scholars back to spatial interaction modeling. Neural spatial interaction modeling represents a clear break with traditional methods used for explicating spatial interaction. Neural spatial interaction models are termed neural in the sense that they are based on neurocomputing. They are clearly related to conventional unconstrained spatial interaction models of the gravity type, and under commonly met conditions they can be understood as a special class of general feedforward neural network models with a single hidden layer and sigmoidal transfer functions (Fischer 1998). These models have been used to model journey-to-work flows and telecommunications traffic (Fischer and Gopal 1994, Openshaw 1993). They appear to provide superior levels of performance when compared with unconstrained conventional models. In many practical situations, however, we have - in addition to the spatial interaction data itself - some information about various accounting constraints on the predicted flows. In principle, there are two ways to incorporate accounting constraints in neural spatial interaction modeling. The required constraint properties can be built into the post-processing stage, or they can be built directly into the model structure. While the first way is relatively straightforward, it suffers from the disadvantage of being inefficient and results in a model that does not inherently respect the constraints. Thus we follow the second way. In this paper we present a novel class of neural spatial interaction models that incorporate origin-specific constraints into the model structure, using product units rather than summation units at the hidden layer and softmax units at the output layer. Product unit neural networks are powerful because of their ability to handle higher-order combinations of inputs, but parameter estimation by standard techniques such as gradient descent may be difficult. The performance of this novel class of spatial interaction models is demonstrated using the Austrian interregional traffic data, with the conventional singly constrained spatial interaction model of the gravity type as benchmark.

References:
Fischer M M (1998) Computational neural networks: A new paradigm for spatial analysis. Environment and Planning A 30(10): 1873-1891
Fischer M M, Gopal S (1994) Artificial neural networks: A new approach to modelling interregional telecommunication flows. Journal of Regional Science 34(4): 503-527
Openshaw S (1993) Modelling spatial interaction using a neural net. In Fischer M M, Nijkamp P (eds) Geographical information systems, spatial modelling, and policy evaluation, pp. 147-164. Springer, Berlin
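
A minimal sketch of the forward pass such a model implies: each hidden product unit raises the inputs to learned powers and multiplies them, and a softmax over the outputs enforces that predicted flow shares from a given origin sum to one (layer sizes and names are illustrative assumptions, not the paper's specification):

```python
import numpy as np

def product_unit_softmax_forward(x, W_hidden, W_out):
    """x: positive input features for one origin. Hidden product units
    compute prod_i x_i**w_ji, i.e. exp(W @ log x); softmax output units
    turn destination scores into shares that sum to 1 (origin constraint)."""
    h = np.exp(W_hidden @ np.log(x))       # product units via the log-space trick
    scores = W_out @ h
    scores -= scores.max()                 # numerical stabilization
    e = np.exp(scores)
    return e / e.sum()                     # softmax: origin-constrained shares

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=4)          # inputs must be positive for x**w
W_hidden = rng.normal(scale=0.5, size=(3, 4))
W_out = rng.normal(scale=0.5, size=(5, 3)) # 5 destinations (placeholder)
shares = product_unit_softmax_forward(x, W_hidden, W_out)
print(shares, shares.sum())                # shares sum to 1.0
```

The log-space trick also hints at why gradient descent can struggle here: the exponent weights enter multiplicatively, so the error surface is far less benign than for summation units.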

2 citations


Proceedings ArticleDOI
07 May 2001
TL;DR: It is shown that this can be done by introducing a continuous feature which is used to calculate the probability of observing the different states of the hidden variable, and that the proposed model gives an improvement over a standard HMM with a comparable number of parameters.
Abstract: Investigates the problem of inserting an additional hidden variable into a standard HMM. It is shown that this can be done by introducing a continuous feature which is used to calculate the probability of observing the different states of the hidden variable. The posteriors are modelled by softmax functions with polynomial exponents, and an efficient method is developed for reestimating their parameters. After analysing a two-dimensional reestimation example on artificial data, the proposed HMM is evaluated on the 1997 Broadcast News task with a particular focus on spontaneous speech. To derive a good indicator variable for this purpose, classification experiments are carried out on fast and slow classes of phones on the 1997 Broadcast News training data. Finally, recognition experiments on the test set of this task show that the proposed model gives an improvement over a standard HMM with a comparable number of parameters.
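
The posterior model the abstract describes, softmax functions with polynomial exponents of a continuous indicator feature, can be sketched as follows (the polynomial degree, state count, and the speaking-rate interpretation are illustrative assumptions):

```python
import numpy as np

def state_posteriors(f, coeffs):
    """P(state k | f) = softmax_k(polynomial_k(f)), where f is the
    continuous indicator feature (e.g. a speaking-rate measure) and
    coeffs[k] holds the polynomial coefficients for hidden state k."""
    powers = f ** np.arange(coeffs.shape[1])  # [1, f, f^2, ...]
    exponents = coeffs @ powers               # one polynomial value per state
    exponents -= exponents.max()              # stabilize the softmax
    e = np.exp(exponents)
    return e / e.sum()

coeffs = np.array([[0.2, 1.5, -0.3],          # 2 hidden states (e.g. fast/slow),
                   [0.1, -1.0, 0.2]])         # quadratic exponents (placeholder)
print(state_posteriors(0.7, coeffs))          # posteriors sum to 1
```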

1 citation


Proceedings ArticleDOI
15 Jul 2001
TL;DR: An efficient combining methodology, developed at CL NewQuant Limited, UK, is presented that takes the recent performance of each model into account in an "optimal" way when calculating combiner weightings.
Abstract: We describe a method of combining multiple market predictions using a "temperature dependent" combiner (developed at CL NewQuant Limited, UK). CL NewQuant uses a number of techniques to model markets, each of which gives its own predictions. The goal is to develop an efficient combining methodology that takes the recent performance of each model into account in an "optimal" way when calculating combiner weightings.
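
One plausible reading of a "temperature dependent" combiner is softmax weighting over each model's recent performance, with the temperature controlling how sharply the combiner concentrates on the currently best model. A minimal sketch (the performance measure and weighting rule below are my assumptions; the abstract does not detail CL NewQuant's actual combiner):

```python
import numpy as np

def combiner_weights(recent_scores, temperature):
    """Softmax over recent per-model performance scores: a low temperature
    concentrates weight on the best model; a high temperature averages."""
    s = np.asarray(recent_scores) / temperature
    s -= s.max()                                   # stabilize the exponential
    w = np.exp(s)
    return w / w.sum()

predictions = np.array([0.012, -0.004, 0.007])     # per-model market forecasts
scores = np.array([1.3, 0.2, 0.9])                 # e.g. recent risk-adjusted returns
w = combiner_weights(scores, temperature=0.5)
combined = w @ predictions                         # weighted combined prediction
print(w, combined)
```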

Proceedings ArticleDOI
04 Dec 2001
TL;DR: In this paper, the authors deal with a stabilization problem for a type of blending control system, which originated from studies of brain motor control, and establish exponential stability of their system based on the theory of slowly time-varying systems.
Abstract: In this paper, we deal with a stabilization problem for a type of blending control system, which originated from studies of brain motor control. In our approach, the feedback controller has a modular structure with softmax blending. The controller is composed of multiple modules, each of which consists of a state predictor and a controller, and the blending weights are given by the softmax function of the prediction error from each module. We establish exponential stability of our system based on the theory of slowly time-varying systems.
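
A minimal sketch of the softmax-blended module structure described above: each module's control output is weighted by a softmax over negative prediction errors, so modules whose state predictors match the observed state dominate the blended control signal (the gains, error scaling, and toy numbers are illustrative assumptions, not the paper's system):

```python
import numpy as np

def blending_weights(x_observed, x_predicted, beta=5.0):
    """Softmax of negative squared prediction errors: modules whose
    predictors match the observed state get the largest blending weights."""
    errors = np.sum((x_predicted - x_observed) ** 2, axis=1)
    s = -beta * errors
    s -= s.max()                               # stabilize the softmax
    w = np.exp(s)
    return w / w.sum()

# Toy example: 3 modules, each with a state prediction and a feedback gain.
x = np.array([0.4, -0.1])                      # observed state
x_hat = np.array([[0.5, -0.1],                 # per-module state predictions
                  [1.2, 0.6],
                  [0.3, -0.2]])
K = np.array([[[1.0, 0.0], [0.0, 1.0]],        # per-module feedback gains
              [[2.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.0], [0.0, 0.5]]])
w = blending_weights(x, x_hat)
u = sum(w[i] * (-K[i] @ x) for i in range(3))  # softmax-blended feedback law
print(w, u)
```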