Open Access, Posted Content
Categorical Foundations of Gradient-Based Learning
TL;DR: In this article, a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametrised maps, and reverse derivative categories is proposed, which encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, shedding new light on their similarities and differences.
Abstract:
We propose a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametrised maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a variety of loss functions such as MSE and Softmax cross-entropy, shedding new light on their similarities and differences. Our approach to gradient-based learning has examples generalising beyond the familiar continuous domains (modelled in categories of smooth maps) and can be realized in the discrete setting of boolean circuits. Finally, we demonstrate the practical significance of our framework with an implementation in Python.
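As a rough illustration of the lens view described in the abstract, the following Python sketch pairs each forward map with a backward map that propagates changes in reverse, composes a one-parameter model with a loss, and uses the composite's backward pass for a plain gradient-descent update. It is not the authors' implementation: the names `Lens`, `model`, `half_squared_error`, and `sgd_step`, the choice of model, and the learning rate are all assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    """A forward map A -> B paired with a backward map (A, dB) -> dA.

    Hypothetical illustration only, not the paper's code.
    """
    forward: Callable[[Any], Any]
    backward: Callable[[Any, Any], Any]

    def __rshift__(self, other: "Lens") -> "Lens":
        """Sequential composition: run self, then other."""
        def fwd(a):
            return other.forward(self.forward(a))

        def bwd(a, dc):
            b = self.forward(a)          # recompute the intermediate value
            db = other.backward(b, dc)   # pull the change back through other
            return self.backward(a, db)  # then pull it back through self

        return Lens(fwd, bwd)


# Assumed toy model f(p, x) = p * x, acting on the pair (p, x).
# Its backward map is the reverse derivative: (x * dy, p * dy).
model = Lens(
    forward=lambda px: px[0] * px[1],
    backward=lambda px, dy: (px[1] * dy, px[0] * dy),
)


def half_squared_error(target: float) -> Lens:
    """Loss y -> 0.5 * (y - target)^2 with reverse derivative (y - target) * dl."""
    return Lens(
        forward=lambda y: 0.5 * (y - target) ** 2,
        backward=lambda y, dl: (y - target) * dl,
    )


def sgd_step(p: float, x: float, target: float, lr: float = 0.1) -> float:
    """One plain gradient-descent update of the parameter p."""
    composite = model >> half_squared_error(target)
    dp, _dx = composite.backward((p, x), 1.0)
    return p - lr * dp


p = 0.0
for _ in range(100):
    p = sgd_step(p, x=2.0, target=6.0)
```

With the input `x=2.0` and target `6.0`, the loop drives `p` toward 3, the value at which `p * x` equals the target. Swapping `sgd_step` for a stateful update rule is where optimisers such as momentum or ADAM would differ in this picture.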
Citations
Journal Article
Diagrammatic Differentiation for Quantum Machine Learning
TL;DR: In this article, the authors introduce diagrammatic differentiation for tensor calculus by generalising the dual number construction from rigs to monoidal categories, and apply this to ZX diagrams, showing how to calculate diagrammatically the gradient of a linear map with respect to a phase parameter.
Posted Content
Quantum Information Effects
Chris Heunen, Robin Kaarsgaard
TL;DR: This work studies the two dual quantum information effects to manipulate the amount of information in quantum computation: hiding and allocation, and provides universal categorical constructions that semantically interpret this arrow metalanguage with choice.
Posted Content
Categorical composable cryptography
Anne Broadbent, Martti Karvonen
TL;DR: In this article, the authors formalize the simulation paradigm of cryptography in terms of category theory and show that protocols secure against abstract attacks form a symmetric monoidal category, thus giving an abstract model of composable security definitions in cryptography.
Posted Content
Category Theory in Machine Learning
TL;DR: This paper documents the motivations, goals, and common themes across applications of category theory to machine learning, touching on gradient-based learning, probability, and equivariant learning.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal Article
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Journal Article
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
TL;DR: This work proposes a new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Journal Article
Learning representations by back-propagating errors
TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
Journal Article
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal functions that can be chosen in hindsight.