Open Access · Posted Content

Categorical Foundations of Gradient-Based Learning

TLDR
In this article, a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametrised maps, and reverse derivative categories is proposed, which encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, shedding new light on their similarities and differences.
Abstract
We propose a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametrised maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a variety of loss functions such as MSE and Softmax cross-entropy, shedding new light on their similarities and differences. Our approach to gradient-based learning has examples generalising beyond the familiar continuous domains (modelled in categories of smooth maps) and can be realised in the discrete setting of boolean circuits. Finally, we demonstrate the practical significance of our framework with an implementation in Python.
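
The central gadget the abstract refers to, a lens pairing a forward pass with a reverse (gradient-propagating) pass, can be illustrated with a minimal Python sketch. The class and method names below are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal sketch of a lens: a forward map together with a reverse map
# that propagates gradients backwards. Names are illustrative only.

class Lens:
    def __init__(self, forward, backward):
        self.forward = forward      # A -> B
        self.backward = backward    # (A, dB) -> dA: the reverse derivative

    def compose(self, other):
        """Sequential composition: self, then other."""
        def fwd(a):
            return other.forward(self.forward(a))
        def bwd(a, dc):
            b = self.forward(a)
            return self.backward(a, other.backward(b, dc))
        return Lens(fwd, bwd)

# Example: lenses for f(x) = x**2 and g(x) = 2x, whose reverse maps
# multiply an incoming gradient by f'(x) and g'(x) (the chain rule).
square = Lens(lambda x: x * x, lambda x, dy: 2 * x * dy)
double = Lens(lambda x: 2 * x, lambda x, dy: 2 * dy)

h = square.compose(double)          # h(x) = 2 * x**2
print(h.forward(3.0))               # 18.0
print(h.backward(3.0, 1.0))         # 12.0 = h'(3)
```

Note how composition threads the forward value into the backward pass: this is exactly the bookkeeping that backpropagation performs, here packaged compositionally.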


Citations
Journal Article · DOI

Diagrammatic Differentiation for Quantum Machine Learning

TL;DR: In this article, the authors introduce diagrammatic differentiation for tensor calculus by generalising the dual number construction from rigs to monoidal categories, and apply this to ZX diagrams, showing how to calculate diagrammatically the gradient of a linear map with respect to a phase parameter.
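The dual number construction the TLDR mentions can be made concrete: a dual number a + b·ε with ε² = 0 carries a value together with its derivative. A minimal Python illustration of that construction only (not the paper's generalisation to monoidal categories or ZX diagrams):

```python
# Forward-mode differentiation with dual numbers (eps**2 = 0).

class Dual:
    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps**2 = 0
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

# d/dx of f(x) = x*x + x at x = 3: seed the derivative slot with 1.
x = Dual(3.0, 1.0)
f = x * x + x
print(f.value, f.deriv)   # 12.0 7.0  (f(3) = 12, f'(3) = 7)
```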
Posted Content

Quantum Information Effects

TL;DR: This work studies two dual quantum information effects that manipulate the amount of information in a quantum computation, hiding and allocation, and provides universal categorical constructions that semantically interpret an arrow metalanguage with choice.
Posted Content

Categorical Composable Cryptography

TL;DR: In this article, the authors formalize the simulation paradigm of cryptography in terms of category theory and show that protocols secure against abstract attacks form a symmetric monoidal category, thus giving an abstract model of composable security definitions in cryptography.
Posted Content

Category Theory in Machine Learning

TL;DR: In this paper, the authors document the motivations, goals, and common themes across applications of category theory to machine learning, touching on gradient-based learning, probability, and equivariant learning.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
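Adam maintains exponentially decaying averages of the gradient and its elementwise square, with bias correction for their initialisation at zero. A minimal sketch of one update step, using the paper's default hyperparameters:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m, v are running first/second moment estimates;
    t is the 1-based step count. Returns updated (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)        # bias-corrected first moment
    v_hat = v / (1 - beta2**t)        # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```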
Journal Article · DOI

Gradient-based learning applied to document recognition

TL;DR: In this article, multilayer neural networks trained with gradient-based learning are shown to synthesize complex decision surfaces that can classify high-dimensional patterns such as handwritten characters, and graph transformer networks (GTNs) are proposed for globally training multi-module document recognition systems.
Journal Article · DOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
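The adversarial process is summarised by the paper's two-player minimax objective, which D maximises and G minimises:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)]
  + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```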
Journal Article · DOI

Learning representations by back-propagating errors

TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector; as a result, the hidden units come to represent important features of the task domain.
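The repeated weight adjustment amounts to gradient descent on a squared-error measure, with the gradients obtained by the chain rule; in the paper's notation, with actual outputs y, desired outputs d, and learning rate η:

```latex
E = \tfrac{1}{2} \sum_{c,j} \left( y_{j,c} - d_{j,c} \right)^2,
\qquad
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}
```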
Journal Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as those of the best proximal function that can be chosen in hindsight.
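In the standard diagonal case, the adaptively modified proximal function amounts to a per-coordinate step size that shrinks with the accumulated squared gradients. A minimal sketch (the eps term is a common numerical-stability addition, not part of the analysis):

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=0.01, eps=1e-8):
    """One diagonal-AdaGrad update. accum holds the running sum of
    squared gradients per coordinate. Returns updated (theta, accum)."""
    accum = accum + grad**2
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum
```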