Benign overfitting in linear regression

Sep 03, 3pm, GHC 8102

Speaker: Pratik Patil

Abstract: One of the themes we explored last semester was understanding generalization in overfitted models (i.e., models trained to zero regression loss), even on noisy data. We examined several flavors of this phenomenon: local nonparametric methods (nearest-neighbor and simplicial interpolation, singular kernel smoothing), high-dimensional linear regression (asymptotic analysis), and kernel regression (polynomial and Laplace kernels). This week, we will discuss recent work by Bartlett et al. that characterizes conditions (involving two special notions of effective rank of the data covariance operator) under which the minimum-norm interpolator has nearly optimal excess risk for Gaussian linear regression problems. The key punchline of the work is that overparametrization can provide many unimportant (low-variance) directions in parameter space, which effectively hide the noise injected into an overfitted solution and thereby enable good generalization. We'll conclude by briefly discussing the work's relevance to recent progress on understanding generalization and optimization in deep neural networks (which are well approximated by certain linear functions under small-step gradient descent training on sufficiently wide networks) and the high sensitivity of such models to small, carefully chosen perturbations.
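
For attendees who want to play with the objects in the abstract beforehand, here is a minimal numerical sketch (not part of the talk materials; it assumes NumPy and a toy diagonal covariance) of the minimum-norm interpolator for an overparametrized Gaussian design, together with the two effective-rank quantities r_k and R_k of the covariance that Bartlett et al. use to state their conditions.

```python
# Toy sketch: minimum-norm interpolation with many low-variance directions.
# The covariance choice and sizes below are illustrative assumptions, not the
# paper's setting; they only demonstrate the quantities involved.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2000                      # n samples, p >> n features (overparametrized)

# Diagonal covariance: a few important directions, many low-variance ones.
eigvals = np.concatenate([np.ones(10), 0.01 * np.ones(p - 10)])
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)   # Gaussian design
beta_star = np.zeros(p)
beta_star[:10] = 1.0                                  # signal in the important directions
y = X @ beta_star + 0.5 * rng.standard_normal(n)      # noisy responses

# Minimum-norm interpolator: beta_hat = X^T (X X^T)^{-1} y (pseudoinverse solution).
beta_hat = X.T @ np.linalg.solve(X @ X.T, y)
print("max training residual:", np.max(np.abs(X @ beta_hat - y)))   # ~0: exact interpolation

# Excess risk under the same (diagonal) covariance:
# (beta_hat - beta_star)^T Sigma (beta_hat - beta_star).
excess_risk = np.sum(eigvals * (beta_hat - beta_star) ** 2)
print("excess risk:", excess_risk)

def effective_ranks(lams, k):
    """Effective ranks of the covariance tail beyond index k:
    r_k = (sum_{i>k} l_i) / l_{k+1},  R_k = (sum_{i>k} l_i)^2 / sum_{i>k} l_i^2."""
    tail = lams[k:]
    r_k = tail.sum() / tail[0]
    R_k = tail.sum() ** 2 / np.sum(tail ** 2)
    return r_k, R_k

print("r_10, R_10:", effective_ranks(np.sort(eigvals)[::-1], 10))
```

In this toy run the interpolator fits the training responses exactly while the excess risk stays modest: the long tail of low-variance directions is where the fitted noise gets absorbed, and the effective ranks quantify how much "room" that tail provides, which is the intuition the talk will make precise.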