Continuum approximations for wide neural networks and gradient descent

Nov 05, 3pm, GHC 8102

Speaker: Andrew Warren

Abstract: A recent avenue of work on highly overparametrized neural networks investigates the limit in which the number of hidden nodes goes to infinity. When the number of layers is held fixed but the width is sent to infinity, the dynamics of training a neural net by (stochastic) gradient descent approximate a continuum problem: the gradient flow of a particular functional on the space of probability distributions. Problems of this type are objects of intensive study in the optimal transport community. In this talk, we will give an overview of recent results on the convergence behavior and loss landscape of wide neural nets from this continuum-limit perspective. In the second half of the talk, we will see that a nonlocal relaxation of the continuum problem suggests a novel regularization procedure for neural nets, somewhat like dropout, which significantly accelerates the training dynamics.
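
For readers who want the continuum object written out, here is a minimal sketch of the standard mean-field setup for a two-layer network, in the spirit of the first two references below; the notation is generic and not necessarily the speaker's:

  f_N(x) = \frac{1}{N} \sum_{i=1}^{N} a_i \, \sigma(w_i \cdot x),
  \qquad
  \mu_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{(a_i, w_i)} .

  % Sending N \to \infty replaces the empirical measure \mu_N by a general
  % probability measure \mu, so the training risk becomes a functional on measures:
  R(\mu) = \mathbb{E}_{(x,y)} \Big[ \ell\Big( \textstyle\int a \, \sigma(w \cdot x) \, d\mu(a, w), \; y \Big) \Big],

  % and (stochastic) gradient descent on the parameters \theta_i = (a_i, w_i)
  % approximates the Wasserstein-2 gradient flow of R:
  \partial_t \mu_t = \nabla_\theta \cdot \Big( \mu_t \, \nabla_\theta \tfrac{\delta R}{\delta \mu}(\theta; \mu_t) \Big),
  \qquad \theta = (a, w).

This is the "gradient flow of a special functional on the space of probability distributions" referred to in the abstract, stated here under the usual two-layer, single-output assumptions.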

References: https://www.pnas.org/content/115/33/E7665; https://arxiv.org/pdf/1805.09545.pdf; https://arxiv.org/pdf/1902.01843.pdf