
Showing papers on "Softmax function" published in 1990


Book ChapterDOI
01 Jan 1990
TL;DR: In this article, the outputs of the network are treated as probabilities of alternatives (e.g. pattern classes), conditioned on the inputs, and two modifications are proposed: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic nonlinearity.
Abstract: We are concerned with feed-forward non-linear networks (multi-layer perceptrons, or MLPs) with multiple outputs. We wish to treat the outputs of the network as probabilities of alternatives (e.g. pattern classes), conditioned on the inputs. We look for appropriate output non-linearities and for appropriate criteria for adaptation of the parameters of the network (e.g. weights). We explain two modifications: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity. The two modifications together result in quite simple arithmetic, and hardware implementation is not difficult either. The use of radial units (squared distance instead of dot product) immediately before the softmax output stage produces a network which computes posterior distributions over class labels based on an assumption of Gaussian within-class distributions. However, the training, which uses cross-class information, can result in better performance at class discrimination than the usual within-class training method, unless the within-class distribution assumptions are actually correct.

1,130 citations
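
The normalised exponential described in this abstract maps arbitrary real-valued activations to outputs that are positive and sum to one, and radial units placed just before it yield Gaussian class posteriors. A minimal NumPy sketch of both pieces (the array shapes, the isotropic equal-variance assumption, and the made-up numbers are illustrative, not the chapter's code):

```python
import numpy as np

def softmax(a):
    """Normalised exponential: maps real activations to probabilities
    that are positive and sum to one (max subtracted for stability)."""
    z = a - a.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gaussian_class_posteriors(x, centres, log_priors, inv_var=1.0):
    """Radial units feeding softmax: with equal isotropic within-class
    variances, -0.5 * inv_var * ||x - mu_k||^2 + log prior, passed
    through softmax, gives the posterior over class labels."""
    sq_dist = ((x[None, :] - centres) ** 2).sum(axis=-1)
    activations = -0.5 * inv_var * sq_dist + log_priors
    return softmax(activations)

# Illustrative usage with made-up class centres and priors
x = np.array([0.2, -1.0])
centres = np.array([[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]])  # one centre per class
log_priors = np.log(np.array([0.5, 0.3, 0.2]))
print(gaussian_class_posteriors(x, centres, log_priors))
```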


Proceedings Article
John S. Denker1, Yann LeCun1
01 Oct 1990
TL;DR: A method is presented for computing the first two moments of the probability distribution over outputs consistent with the input and the training data; the results shed new light on, and generalize, the well-known "softmax" scheme.
Abstract: (1) The outputs of a typical multi-output classification network do not satisfy the axioms of probability; probabilities should be positive and sum to one. This problem can be solved by treating the trained network as a preprocessor that produces a feature vector that can be further processed, for instance by classical statistical estimation techniques. (2) We present a method for computing the first two moments of the probability distribution indicating the range of outputs that are consistent with the input and the training data. It is particularly useful to combine these two ideas: we implement the ideas of section 1 using Parzen windows, where the shape and relative size of each window are computed using the ideas of section 2. This allows us to make contact between important theoretical ideas (e.g. the ensemble formalism) and practical techniques (e.g. back-prop). Our results also shed new light on and generalize the well-known "softmax" scheme.

246 citations
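
A minimal sketch of the post-processing idea in point (1) of this abstract: raw network outputs are treated as feature vectors and turned into valid class probabilities with Gaussian Parzen windows over the training set. The fixed bandwidth and the toy data below are assumptions for illustration; the paper additionally sets each window's shape and size from the moment estimates of point (2), which is not reproduced here.

```python
import numpy as np

def parzen_class_posteriors(feature, train_features, train_labels,
                            n_classes, bandwidth=0.5):
    """Treat raw network outputs as feature vectors and estimate
    P(class | feature) with Gaussian Parzen windows over the training
    features, so the results are positive and sum to one."""
    sq_dist = ((train_features - feature) ** 2).sum(axis=1)
    kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    scores = np.array([kernel[train_labels == c].sum()
                       for c in range(n_classes)])
    return scores / scores.sum()

# Illustrative usage: three training examples whose "network outputs"
# serve as features (numbers are made up)
train_features = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.4]])
train_labels = np.array([0, 1, 0])
print(parzen_class_posteriors(np.array([0.8, 0.3]),
                              train_features, train_labels, n_classes=2))
```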