Author
David I. Spivak
Bio: David I. Spivak is an academic researcher whose work spans topics including the Dirichlet distribution and Hebbian theory. He has co-authored 3 publications.
Papers
TL;DR: In this article, the authors make the case that the analogy between deep neural networks and actual brains is structurally flawed: because the "neurons" in deep neural networks manage the changing weights, they are more akin to synapses, while the wires, which are what cause information to flow, are more like nerve cells.
Abstract: There is an analogy that is often made between deep neural networks and actual brains, suggested by the nomenclature itself: the "neurons" in deep neural networks should correspond to neurons (or nerve cells, to avoid confusion) in the brain. We claim, however, that this analogy doesn't even type check: it is structurally flawed. In agreement with the slightly glib summary of Hebbian learning as "cells that fire together wire together", this article makes the case that the analogy should be different. Since the "neurons" in deep neural networks are managing the changing weights, they are more akin to the synapses in the brain; instead, it is the wires in deep neural networks that are more like nerve cells, in that they are what cause the information to flow. An intuition that nerve cells seem like more than mere wires is exactly right, and is justified by a precise category-theoretic analogy which we will explore in this article. Throughout, we will continue to highlight the error in equating artificial neurons with nerve cells by leaving "neuron" in quotes or by calling them artificial neurons.
We will first explain how to view deep neural networks as nested dynamical systems with a very restricted sort of interaction pattern, and then explain a more general sort of interaction for dynamical systems that is useful throughout engineering, but which fails to adapt to changing circumstances. As mentioned, an analogy is then forced upon us by the mathematical formalism in which they are both embedded. We call the resulting encompassing generalization deeply interacting learning systems: they have complex interaction as in control theory, but adaptation to circumstances as in deep neural networks.
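The structural point can be made concrete in code. The following minimal Python sketch is our illustration, not code from the paper; the class, its names, and the toy update rule are all ours. It shows why an artificial "neuron" resembles a synapse: its adaptive state is the weight vector, which changes under a Hebbian-style rule, while the wires merely transport activations between units, the role the paper assigns to nerve cells.

```python
# Illustrative sketch (not from the paper): a toy "artificial neuron" whose
# mutable state is its weight vector -- the part that adapts, as a synapse
# does -- while the wires only carry activations between units.
import numpy as np

class ArtificialNeuron:
    """One 'neuron' of a deep network: it stores and updates weights."""
    def __init__(self, n_inputs: int, rng: np.random.Generator):
        self.w = rng.normal(scale=0.1, size=n_inputs)  # adaptive, synapse-like state

    def forward(self, x: np.ndarray) -> float:
        # The inputs x arrive along wires; the wires are what carry the
        # flowing information, as nerve cells do in the brain.
        return float(np.tanh(self.w @ x))

    def hebbian_update(self, x: np.ndarray, y: float, lr: float = 0.01) -> None:
        # "Cells that fire together wire together": strengthen w_i in
        # proportion to the co-activity of input x_i and output y.
        self.w += lr * y * x

rng = np.random.default_rng(0)
unit = ArtificialNeuron(3, rng)
x = np.array([1.0, 0.5, -0.2])
y = unit.forward(x)
unit.hebbian_update(x, y)
print(y, unit.w)
```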
TL;DR: In this article, it was shown that the rectangle-area formula A(d) = L(d)W(d) holds for any Dirichlet polynomial d.
Abstract: A Dirichlet polynomial d in one variable y is a function of the form d(y) = a_n n^y + ⋯ + a_2 2^y + a_1 1^y + a_0 0^y for some n, a_0, …, a_n ∈ ℕ. We will show how to think of a Dirichlet polynomial as a set-theoretic bundle, and thus as an empirical distribution. We can then consider the Shannon entropy H(d) of the corresponding probability distribution, and we define its length (or, classically, its perplexity) by L(d) = 2^{H(d)}. On the other hand, we will define a rig homomorphism h: Dir → Rect from the rig of Dirichlet polynomials to the so-called rectangle rig, whose underlying set is ℝ_{≥0} × ℝ_{≥0} and whose additive structure involves the weighted geometric mean; we write h(d) = (A(d), W(d)), and call the two components area and width (respectively). The main result of this paper is the following: the rectangle-area formula A(d) = L(d)W(d) holds for any Dirichlet polynomial d. In other words, the entropy of an empirical distribution can be calculated entirely in terms of the homomorphism h applied to its corresponding Dirichlet polynomial. We also show that similar results hold for the cross entropy.
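With the definitions above, the rectangle-area formula can be checked numerically. Here is a minimal sketch, assuming a straightforward reading of the abstract: the dict encoding {base: coefficient} and all function names are ours, not the paper's.

```python
# Numerical check of the rectangle-area formula A(d) = L(d) * W(d).
# A Dirichlet polynomial d(y) = sum_i a_i * i^y is encoded as {i: a_i}.
from math import log2, isclose

def area(d):
    # A(d) = d(1): the total number of elements of the bundle.
    return sum(a * i for i, a in d.items())

def entropy(d):
    # Shannon entropy of the empirical distribution: each of the a_i
    # positions with fiber size i gets probability i / A(d).
    A = area(d)
    return -sum(a * (i / A) * log2(i / A) for i, a in d.items() if i > 0)

def width(d):
    # W(d): area-weighted geometric mean of the monomial bases i.
    A = area(d)
    return 2 ** sum(a * (i / A) * log2(i) for i, a in d.items() if i > 0)

d = {4: 1, 2: 3, 1: 5}          # d(y) = 4^y + 3*2^y + 5*1^y
A, L, W = area(d), 2 ** entropy(d), width(d)
print(A, L * W)                  # both equal 15
assert isclose(A, L * W)
```

For d(y) = 4^y + 3·2^y + 5·1^y this gives A(d) = 15 = L(d)W(d), as the theorem predicts.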
TL;DR: In this article, it was shown that for any Dirichlet polynomial, the entropy of an empirical distribution can be calculated entirely in terms of the homomorphism h applied to its corresponding Dirichlet polynomial.
Abstract: A Dirichlet polynomial $d$ in one variable ${\mathcal{y}}$ is a function of the form $d({\mathcal{y}})=a_n n^{\mathcal{y}}+\cdots+a_2 2^{\mathcal{y}}+a_1 1^{\mathcal{y}}+a_0 0^{\mathcal{y}}$ for some $n,a_0,\ldots,a_n\in\mathbb{N}$. We will show how to think of a Dirichlet polynomial as a set-theoretic bundle, and thus as an empirical distribution. We can then consider the Shannon entropy $H(d)$ of the corresponding probability distribution, and we define its length (or, classically, its perplexity) by $L(d)=2^{H(d)}$. On the other hand, we will define a rig homomorphism $h\colon\mathsf{Dir}\to\mathsf{Rect}$ from the rig of Dirichlet polynomials to the so-called rectangle rig, whose underlying set is $\mathbb{R}_{\geq0}\times\mathbb{R}_{\geq0}$ and whose additive structure involves the weighted geometric mean; we write $h(d)=(A(d),W(d))$, and call the two components area and width (respectively).
The main result of this paper is the following: the rectangle-area formula $A(d)=L(d)W(d)$ holds for any Dirichlet polynomial $d$. In other words, the entropy of an empirical distribution can be calculated entirely in terms of the homomorphism $h$ applied to its corresponding Dirichlet polynomial. We also show that similar results hold for the cross entropy.
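The homomorphism property itself can also be illustrated. The sketch below assumes our reading of the rectangle rig's additive structure, with $h(a\,n^{\mathcal{y}})=(an,n)$ on monomials and widths combining by the area-weighted geometric mean; none of this code is from the paper.

```python
# Sketch of the additive structure of Rect as we read it from the abstract:
# areas add, and widths combine by the area-weighted geometric mean.
# The dict encoding {base: coefficient} and the formula for rect_add are
# our assumptions, verified numerically below.
from collections import Counter
from math import isclose

def rect_add(r1, r2):
    (A1, W1), (A2, W2) = r1, r2
    A = A1 + A2
    W = (W1 ** A1 * W2 ** A2) ** (1 / A)   # area-weighted geometric mean
    return A, W

def h(d):
    # Apply h monomial-by-monomial, h(a * n^y) = (a*n, n), summing in Rect.
    r = (0.0, 1.0)
    for n, a in d.items():
        if n > 0 and a > 0:
            r = rect_add(r, (a * n, n))
    return r

d1, d2 = {4: 1, 1: 5}, {2: 3}
d_sum = dict(Counter(d1) + Counter(d2))    # sum of the two polynomials
# h is additive: h(d1 + d2) agrees with rect_add(h(d1), h(d2)).
A_sum, W_sum = h(d_sum)
A_add, W_add = rect_add(h(d1), h(d2))
assert isclose(A_sum, A_add) and isclose(W_sum, W_add)
print(A_sum, W_sum)
```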