Author
David I. Spivak
Bio: David I. Spivak is an academic researcher whose work spans topics including the Dirichlet distribution and Hebbian theory. He has co-authored 3 publications.
Papers
TL;DR: In this article, the authors make the case that the analogy between deep neural networks and actual brains is structurally flawed: because the "neurons" in deep neural networks manage the changing weights, they are more akin to synapses, while the wires, which are what cause information to flow, are more like nerve cells.
Abstract: There is an analogy that is often made between deep neural networks and actual brains, suggested by the nomenclature itself: the "neurons" in deep neural networks should correspond to neurons (or nerve cells, to avoid confusion) in the brain. We claim, however, that this analogy doesn't even type check: it is structurally flawed. In agreement with the slightly glib summary of Hebbian learning as "cells that fire together wire together", this article makes the case that the analogy should be different. Since the "neurons" in deep neural networks are managing the changing weights, they are more akin to the synapses in the brain; instead, it is the wires in deep neural networks that are more like nerve cells, in that they are what cause the information to flow. An intuition that nerve cells seem like more than mere wires is exactly right, and is justified by a precise category-theoretic analogy which we will explore in this article. Throughout, we will continue to highlight the error in equating artificial neurons with nerve cells by leaving "neuron" in quotes or by calling them artificial neurons.
We will first explain how to view deep neural networks as nested dynamical systems with a very restricted sort of interaction pattern, and then explain a more general sort of interaction for dynamical systems that is useful throughout engineering, but which fails to adapt to changing circumstances. As mentioned, an analogy is then forced upon us by the mathematical formalism in which they are both embedded. We call the resulting encompassing generalization deeply interacting learning systems: they have complex interaction as in control theory, but adaptation to circumstances as in deep neural networks.
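The structural point can be made concrete in code. The following minimal Python sketch is our illustration, not code from the paper; the class, its names, and the toy update rule are all ours. It shows why an artificial "neuron" resembles a synapse: its adaptive state is the weight vector, which changes under a Hebbian-style rule, while the wires merely transport activations between units, the role the paper assigns to nerve cells.

```python
# Illustrative sketch (not from the paper): a toy "artificial neuron" whose
# mutable state is its weight vector -- the part that adapts, as a synapse
# does -- while the wires only carry activations between units.
import numpy as np

class ArtificialNeuron:
    """One 'neuron' of a deep network: it stores and updates weights."""
    def __init__(self, n_inputs: int, rng: np.random.Generator):
        self.w = rng.normal(scale=0.1, size=n_inputs)  # adaptive, synapse-like state

    def forward(self, x: np.ndarray) -> float:
        # The inputs x arrive along wires; the wires are what carry the
        # flowing information, as nerve cells do in the brain.
        return float(np.tanh(self.w @ x))

    def hebbian_update(self, x: np.ndarray, y: float, lr: float = 0.01) -> None:
        # "Cells that fire together wire together": strengthen w_i in
        # proportion to the co-activity of input x_i and output y.
        self.w += lr * y * x

rng = np.random.default_rng(0)
unit = ArtificialNeuron(3, rng)
x = np.array([1.0, 0.5, -0.2])
y = unit.forward(x)
unit.hebbian_update(x, y)
print(y, unit.w)
```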
TL;DR: In this article, it was shown that the rectangle-area formula A(d) = L(d)W(d) holds for any Dirichlet polynomial d.
Abstract: A Dirichlet polynomial d in one variable y is a function of the form d(y) = a_n n^y + ⋯ + a_2 2^y + a_1 1^y + a_0 0^y for some n, a_0, …, a_n ∈ ℕ. We will show how to think of a Dirichlet polynomial as a set-theoretic bundle, and thus as an empirical distribution. We can then consider the Shannon entropy H(d) of the corresponding probability distribution, and we define its length (or, classically, its perplexity) by L(d) = 2^{H(d)}. On the other hand, we will define a rig homomorphism h: Dir → Rect from the rig of Dirichlet polynomials to the so-called rectangle rig, whose underlying set is ℝ_{≥0} × ℝ_{≥0} and whose additive structure involves the weighted geometric mean; we write h(d) = (A(d), W(d)), and call the two components area and width (respectively). The main result of this paper is the following: the rectangle-area formula A(d) = L(d)W(d) holds for any Dirichlet polynomial d. In other words, the entropy of an empirical distribution can be calculated entirely in terms of the homomorphism h applied to its corresponding Dirichlet polynomial. We also show that similar results hold for the cross entropy.
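With the definitions above, the rectangle-area formula can be checked numerically. Here is a minimal sketch, assuming a straightforward reading of the abstract: the dict encoding {base: coefficient} and all function names are ours, not the paper's.

```python
# Numerical check of the rectangle-area formula A(d) = L(d) * W(d).
# A Dirichlet polynomial d(y) = sum_i a_i * i^y is encoded as {i: a_i}.
from math import log2, isclose

def area(d):
    # A(d) = d(1): the total number of elements of the bundle.
    return sum(a * i for i, a in d.items())

def entropy(d):
    # Shannon entropy of the empirical distribution: each of the a_i
    # positions with fiber size i gets probability i / A(d).
    A = area(d)
    return -sum(a * (i / A) * log2(i / A) for i, a in d.items() if i > 0)

def width(d):
    # W(d): area-weighted geometric mean of the monomial bases i.
    A = area(d)
    return 2 ** sum(a * (i / A) * log2(i) for i, a in d.items() if i > 0)

d = {4: 1, 2: 3, 1: 5}          # d(y) = 4^y + 3*2^y + 5*1^y
A, L, W = area(d), 2 ** entropy(d), width(d)
print(A, L * W)                  # both equal 15
assert isclose(A, L * W)
```

For d(y) = 4^y + 3·2^y + 5·1^y this gives A(d) = 15 = L(d)W(d), as the theorem predicts.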
TL;DR: In this article, it was shown that for any Dirichlet polynomial, the entropy of an empirical distribution can be calculated entirely in terms of the homomorphism h applied to its corresponding Dirichlet polynomial.
Abstract: A Dirichlet polynomial $d$ in one variable ${\mathcal{y}}$ is a function of the form $d({\mathcal{y}})=a_n n^{\mathcal{y}}+\cdots+a_2 2^{\mathcal{y}}+a_1 1^{\mathcal{y}}+a_0 0^{\mathcal{y}}$ for some $n,a_0,\ldots,a_n\in\mathbb{N}$. We will show how to think of a Dirichlet polynomial as a set-theoretic bundle, and thus as an empirical distribution. We can then consider the Shannon entropy $H(d)$ of the corresponding probability distribution, and we define its length (or, classically, its perplexity) by $L(d)=2^{H(d)}$. On the other hand, we will define a rig homomorphism $h\colon\mathsf{Dir}\to\mathsf{Rect}$ from the rig of Dirichlet polynomials to the so-called rectangle rig, whose underlying set is $\mathbb{R}_{\geq0}\times\mathbb{R}_{\geq0}$ and whose additive structure involves the weighted geometric mean; we write $h(d)=(A(d),W(d))$, and call the two components area and width (respectively).
The main result of this paper is the following: the rectangle-area formula $A(d)=L(d)W(d)$ holds for any Dirichlet polynomial $d$. In other words, the entropy of an empirical distribution can be calculated entirely in terms of the homomorphism $h$ applied to its corresponding Dirichlet polynomial. We also show that similar results hold for the cross entropy.
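The homomorphism property itself can also be illustrated. The sketch below assumes our reading of the rectangle rig's additive structure, with $h(a\,n^{\mathcal{y}})=(an,n)$ on monomials and widths combining by the area-weighted geometric mean; none of this code is from the paper.

```python
# Sketch of the additive structure of Rect as we read it from the abstract:
# areas add, and widths combine by the area-weighted geometric mean.
# The dict encoding {base: coefficient} and the formula for rect_add are
# our assumptions, verified numerically below.
from collections import Counter
from math import isclose

def rect_add(r1, r2):
    (A1, W1), (A2, W2) = r1, r2
    A = A1 + A2
    W = (W1 ** A1 * W2 ** A2) ** (1 / A)   # area-weighted geometric mean
    return A, W

def h(d):
    # Apply h monomial-by-monomial, h(a * n^y) = (a*n, n), summing in Rect.
    r = (0.0, 1.0)
    for n, a in d.items():
        if n > 0 and a > 0:
            r = rect_add(r, (a * n, n))
    return r

d1, d2 = {4: 1, 1: 5}, {2: 3}
d_sum = dict(Counter(d1) + Counter(d2))    # sum of the two polynomials
# h is additive: h(d1 + d2) agrees with rect_add(h(d1), h(d2)).
A_sum, W_sum = h(d_sum)
A_add, W_add = rect_add(h(d1), h(d2))
assert isclose(A_sum, A_add) and isclose(W_sum, W_add)
print(A_sum, W_sum)
```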