
Showing papers on "Softmax function published in 2004"


Journal ArticleDOI
Ian T. Nabney
TL;DR: This work shows how RBFs with logistic and softmax outputs can be trained efficiently using the Fisher scoring algorithm, and compares this approach with standard non-linear optimisation algorithms on a number of datasets.
Abstract: Radial Basis Function networks with linear outputs are often used in regression problems because they can be substantially faster to train than Multi-layer Perceptrons. For classification problems, the use of linear outputs is less appropriate as the outputs are not guaranteed to represent probabilities. We show how RBFs with logistic and softmax outputs can be trained efficiently using the Fisher scoring algorithm. This approach can be used with any model which consists of a generalised linear output function applied to a model which is linear in its parameters. We compare this approach with standard non-linear optimisation algorithms on a number of datasets.
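For context, the softmax output stage referred to above maps the basis-function activations, which are linear in the output weights, onto class probabilities. Below is a minimal sketch of such a forward pass, assuming Gaussian basis functions with a shared width; the function and parameter names are illustrative and not taken from the paper, and the Fisher-scoring training step itself is omitted.

```python
import numpy as np

def rbf_softmax_forward(X, centers, width, W):
    """Forward pass of an RBF network with softmax outputs (illustrative sketch).

    X       : (n_samples, n_features) inputs
    centers : (n_basis, n_features) basis-function centres
    width   : shared Gaussian width
    W       : (n_basis + 1, n_classes) output weights (last row is the bias)
    """
    # Gaussian basis activations
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * width ** 2))
    # Append a bias column; the model is linear in W
    Phi = np.hstack([Phi, np.ones((X.shape[0], 1))])
    # Generalised linear (softmax) output gives class probabilities
    A = Phi @ W
    A -= A.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(A)
    return P / P.sum(axis=1, keepdims=True)
```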

58 citations


Proceedings ArticleDOI
25 Jul 2004
TL;DR: Softprop is a novel learning approach presented here that is reminiscent of the softmax explore-exploit Q-learning search heuristic and fits the problem while delaying settling into error minima to achieve better generalization and more robust learning.
Abstract: Multi-layer backpropagation, like many learning algorithms that can create complex decision surfaces, is prone to overfitting. Softprop is a novel learning approach presented here that is reminiscent of the softmax explore-exploit Q-learning search heuristic. It fits the problem while delaying settling into error minima to achieve better generalization and more robust learning. This is accomplished by blending standard SSE optimization with lazy training, a new objective function well suited to learning classification tasks, to form a more stable learning model. Over several machine learning data sets, softprop reduces classification error by 17.1 percent and the variance in results by 38.6 percent over standard SSE minimization.

13 citations


Book ChapterDOI
04 Dec 2004
TL;DR: It is demonstrated that the AE-GSBFN is capable of providing better performance than the existing method and overcomes the curse of dimensionality and avoids a fall into local minima through the allocation and elimination processes.
Abstract: In this paper, we propose a dynamic allocation method of basis functions, the Allocation/Elimination Gaussian Softmax Basis Function Network (AE-GSBFN), for use in reinforcement learning. AE-GSBFN is a kind of actor-critic method that uses basis functions. This method can treat continuous high-dimensional state spaces, because only the basis functions required for learning are dynamically allocated, and if an allocated basis function is identified as redundant, it is eliminated. Through these allocation and elimination processes, the method overcomes the curse of dimensionality and avoids falling into local minima. To confirm the effectiveness of our method, we used a maze task to compare it with an existing method that has only an allocation process. Moreover, to demonstrate learning in continuous high-dimensional state spaces, our method was applied to motion control of a humanoid robot. We demonstrate that the AE-GSBFN is capable of providing better performance than the existing method.
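As a rough illustration of the kind of basis used here: a Gaussian softmax basis function network normalises Gaussian activations over the current set of bases, so the activations form a probability-like weighting of the state. The sketch below assumes one width per basis and an illustrative allocation test; the paper's actual allocation and elimination criteria are not reproduced, and all names are hypothetical.

```python
import numpy as np

def gaussian_softmax_basis(state, centers, widths):
    """Normalised (softmax) Gaussian basis activations for a continuous state.

    state   : (n_dims,) current state
    centers : (n_basis, n_dims) basis centres
    widths  : (n_basis,) Gaussian widths
    """
    d2 = ((state - centers) ** 2).sum(axis=1)
    log_act = -d2 / (2.0 * widths ** 2)
    log_act -= log_act.max()            # numerical stability
    act = np.exp(log_act)
    return act / act.sum()              # activations sum to one

def maybe_allocate(phi, threshold=0.3):
    """Illustrative allocation test: if no existing basis responds strongly
    enough to the current state, a new basis centred there would be added."""
    return phi.max() < threshold
```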

12 citations


Journal ArticleDOI
TL;DR: This paper studies the application of Q-learning to Process Control problems, more precisely to Neutralization Processes, and results show that the controllers are able to learn how to adequately control the process.

11 citations


Proceedings Article
01 Jan 2004
TL;DR: The application shows the ability of the proposed Reinforcement Learning approach to control chemical processes with difficult, unknown, or time-varying dynamics.
Abstract: This paper presents the application of Reinforcement Learning to nonlinear process control. Reinforcement Learning is a model-free technique based on online learning without supervision, with the objective of optimizing a cumulative future reward by resorting to experimentation with the system. The one-step-ahead Q-learning look-up-table method is applied to a model of a pH neutralization process. Control actions are selected using the ε-greedy and softmax policies. The application shows the ability of the proposed method to control chemical processes with difficult, unknown, or time-varying dynamics.
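As a sketch of the two exploration policies mentioned above, here is ε-greedy and softmax (Boltzmann) action selection over a tabular Q-function, together with the one-step Q-learning update. Nothing here is taken from the paper's pH-neutralization model; the temperature, ε, and learning-rate values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_row, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def softmax_action(q_row, temperature=1.0):
    """Boltzmann (softmax) exploration: P(a) proportional to exp(Q(s,a)/T)."""
    z = np.asarray(q_row, dtype=float) / temperature
    z -= z.max()                        # numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(q_row), p=p))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One-step Q-learning update on a look-up table Q (2-D array: state x action)."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```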

2 citations


01 Jan 2004
TL;DR: Experimental results show how the ability of the Boltzmann Softmax action selection function to differentiate between suboptimal actions can be made to work on an inter-robot level, as a mechanism for allocating high-performance robots to high-value tasks without communication.
Abstract: We present an extension to our adaptive multi-robot task allocation algorithm based on vacancy chains, a resource distribution process common in animal and human societies. The algorithm uses individual reinforcement learning of task utilities and relies on the specializing abilities of the members of the group to promote dedicated optimal allocation patterns. Using realistic simulation experiments, we evaluate the approach by comparing greedy and softmax action selection functions for task allocation. We conclude that using softmax functions makes the vacancy chain algorithm responsive to different levels of ability in a group of heterogeneous robots as well as to the effects of the underlying group dynamics such as interference and synergy.

I. INTRODUCTION

Existing multi-robot task allocation (MRTA) algorithms [1], [2], [3], [4] are typically not sensitive to the complex effects of group dynamics, such as interference and synergy. For a cooperative task such as transportation or foraging, the average completion time may depend on the number of robots that are allocated to the same task. Allocating a robot to a task may have either a positive or negative effect on a group's performance according to how much that robot contributes positively, in accomplishing tasks, or negatively, in increasing interference. Such dynamics are difficult to model.

As a way of circumventing the difficulties related to modeling group dynamics, our past work [5] presented the vacancy chain (VC) algorithm. This algorithm is inspired by the VC distribution process as found in animal and human societies [6]. Each robot in a group following this algorithm uses local reinforcement learning (RL) to estimate the utilities of a set of tasks. From the local utilities and the robots' action selection functions emerges the allocation pattern. Experiments in simulation have shown that for groups of homogeneous robots, the VC algorithm promotes optimal system states as defined by the VC framework.

The VC algorithm relies on stigmergy [7], unintentional communication between the robots through their effects on the environment, to produce specialized individuals for optimal allocation. In this paper we present experimental results showing how the ability of the Boltzmann Softmax action selection function to differentiate between suboptimal actions can be made to work on an inter-robot level, as a mechanism for allocating high-performance robots to high-value tasks without communication. The VC algorithm can thus be extended to work for groups of heterogeneous robots. Extending the VC algorithm to cover groups of heterogeneous robots increases its applicability. This is important because the VC algorithm, unlike existing MRTA algorithms, is sensitive to the effects of group dynamics and provides a way of optimizing the performance of groups of cooperative robots in domains where these effects are significant.
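The point about differentiating suboptimal actions can be illustrated with Boltzmann softmax selection over learned task utilities: greedy selection sends every robot to its single highest-utility task regardless of the margin, while softmax spreads each robot's choices in proportion to its utility gaps, so robots with large gaps concentrate on high-value tasks and robots with small gaps keep covering the rest. A small illustrative sketch with made-up utility numbers, not the paper's data:

```python
import numpy as np

def softmax_selection_probs(utilities, temperature=1.0):
    """Boltzmann softmax over task utilities: P(task) proportional to exp(U/T)."""
    z = np.asarray(utilities, dtype=float) / temperature
    z -= z.max()                        # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical learned utilities for two tasks (high-value, low-value):
fast_robot = [5.0, 3.0]   # large gap: strongly prefers the high-value task
slow_robot = [3.2, 3.0]   # small gap: still serves the low-value task often
print(softmax_selection_probs(fast_robot))   # ~[0.88, 0.12]
print(softmax_selection_probs(slow_robot))   # ~[0.55, 0.45]
```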

1 citation


Posted Content
TL;DR: A method is presented which assigns a neural temperature to each layer of a multilayer neural network whose dynamics are governed by a noisy winner-take-all mechanism; after a transient, the neural temperature decreases in each layer according to a power law, indicating a self-organized annealing behavior induced by the learning rule itself.
Abstract: In this paper we present a method which assigns to each layer of a multilayer neural network, whose network dynamics is governed by a noisy winner-take-all mechanism, a neural temperature. This neural temperature is obtained by a least mean square fit of the probability distribution of the noisy winner-take-all mechanism to the distribution of a softmax mechanism, which has a well-defined temperature as a free parameter. We call this approximated temperature, resulting from the optimization step, the neural temperature. We apply this method to a multilayer neural network while it learns the XOR problem with a Hebb-like learning rule, and show that after a transient the neural temperature decreases in each layer according to a power law. This indicates a self-organized annealing behavior induced by the learning rule itself, instead of an external adjustment of a control parameter as in physically motivated optimization methods, e.g. simulated annealing.
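A rough sketch of the temperature-fitting step described above: the empirical choice distribution of the noisy winner-take-all layer is matched, in the least-squares sense, by a softmax distribution whose temperature is the free parameter. The sampling of the winner-take-all mechanism and the Hebb-like learning rule are not reproduced; names and the choice of optimiser are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(h, T):
    """Softmax distribution over activations h at temperature T."""
    z = np.asarray(h, dtype=float) / T
    z -= z.max()                        # numerical stability
    p = np.exp(z)
    return p / p.sum()

def neural_temperature(h, p_wta):
    """Least-squares fit of the softmax temperature T so that softmax(h, T)
    approximates p_wta, the empirical winner-take-all choice distribution."""
    def sq_err(T):
        return float(((softmax(h, T) - p_wta) ** 2).sum())
    res = minimize_scalar(sq_err, bounds=(1e-3, 1e3), method="bounded")
    return res.x
```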

1 citation


Proceedings ArticleDOI
20 Sep 2004
TL;DR: Two algorithms for the optimization of time series forecasting by combination of models are proposed and evaluated and both are able to improve the forecasting of different time series, reducing the forecast error (RMSE) and increasing the modeling capability expressed by a reduction of the bias error (BE).
Abstract: Two algorithms for optimizing time series forecasting by combining models are proposed and evaluated. The first, named GABoost, exploits a genetic-algorithm heuristic to search for the optimal weights for mixing the forecasting models. The second, named CombFEC, extracts the information provided by the forecast errors (RMSE, BE and MAE) of each model to be combined, and by the correlation between each model and the forecasted time series, to build an error-correlation function (FEC) used to calculate the weights with a SOFTMAX function. The results show that both algorithms are able to improve the forecasting of different time series, reducing the forecast error (RMSE) and increasing the modeling capability, expressed by a reduction of the bias error (BE).
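The final softmax weighting step is the standard trick of turning per-model scores into positive mixing weights that sum to one. A minimal sketch, assuming a generic per-model score where higher means better; the paper's specific error-correlation function built from RMSE, BE, MAE and the correlations is not reproduced here, and all names are illustrative.

```python
import numpy as np

def combination_weights(scores, temperature=1.0):
    """Turn per-model scores (higher = better) into mixing weights with a
    softmax, so every weight is positive and the weights sum to one."""
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()                        # numerical stability
    w = np.exp(z)
    return w / w.sum()

def combine_forecasts(forecasts, weights):
    """Weighted combination of individual forecasts.
    forecasts: (n_models, horizon) array, weights: (n_models,)."""
    return np.asarray(weights) @ np.asarray(forecasts)
```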