Understanding generalization in kernel ridgeless regression

Mar 26, 3pm, GHC 8102

Speaker: Pratik Patil

Abstract: Recall that we've been exploring the phenomenon of good out-of-sample error for estimators that achieve (nearly) zero in-sample error, even with noisy data; Ryan considered the minimum-norm interpolator in the high-dimensional linear regression setting, while Veeru considered local smoothing methods with singular kernels and simplicial interpolation. This week, we'll consider minimum-norm interpolators in reproducing kernel Hilbert spaces. The discussion is motivated by empirical results in [Belkin et al., 2018] indicating that shallow kernel methods, much like deep architectures, can generalize well despite overfitting the training data. Following [Liang and Rakhlin, 2018], we'll see how geometric properties of the data and the kernel function, combined with high dimensionality, lead to implicit regularization. We'll conclude with a converse from [Rakhlin and Zhai, 2018], which shows that the minimum-norm estimator with the Laplace kernel is inconsistent for any choice of bandwidth, even one selected using the data, when the input dimension is constant, indicating that high dimensionality is necessary for good generalization in certain cases.
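
For concreteness, here is a minimal sketch of the object under discussion, using generic notation that is not taken from the papers above: given training data (x_1, y_1), ..., (x_n, y_n) and a positive semidefinite kernel k with RKHS \mathcal{H}, the minimum-norm (ridgeless) interpolator is

  \hat{f} = \arg\min_{f \in \mathcal{H}} \|f\|_{\mathcal{H}} \quad \text{subject to} \quad f(x_i) = y_i, \; i = 1, \dots, n,

which, writing K for the n x n kernel matrix with K_{ij} = k(x_i, x_j) and k(x) = (k(x, x_1), \dots, k(x, x_n))^\top, has the closed form

  \hat{f}(x) = k(x)^\top K^{\dagger} y,

i.e., the limit of kernel ridge regression as the regularization parameter tends to zero (with K^{\dagger} the pseudoinverse, reducing to K^{-1} when K is invertible).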