Distributions of approximately polynomial functions of high-dimensional data

23 Feb, 2024, 3:00-4:30 pm, GHC 8102

Speaker: Kevin Han Huang

Abstract: Consider the problem of approximating the distribution of a polynomial of $n$ independent data points in $\mathbb{R}^d$, where $d$ may grow as an arbitrary function of $n$. The first part of the talk focuses on a COLT’23 paper (https://arxiv.org/abs/2302.05686), where we consider the special case of degree-two U-statistics in the context of kernel-based distribution tests. We prove that the limiting distribution of the U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, and that this transition depends only on a moment ratio, regardless of whether the statistic is degenerate. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and an asymmetric distribution. In a simple empirical setting, our results correctly predict how the test power of MMD and KSD scales with the dimension $d$ and the kernel bandwidth. In the second part of the talk, I will briefly discuss results from an unpublished work, where we generalise these findings to degree-$m$ polynomials with $m=o(\log n)$. We prove a general Gaussian universality result for such polynomials via Lindeberg’s principle, which extends the invariance principle of Mossel et al. (AoM, 2010). We also prove that the bounds obtained are near-optimal.
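
For readers unfamiliar with the terminology, a degree-two U-statistic can be written as below. This is a standard textbook form rather than notation taken from the paper; the symbols $D_n$, $u$, $k$ and the paired samples $Z_i = (X_i, Y_i)$ are assumptions introduced here for illustration only.

$$
D_n \;=\; \frac{1}{n(n-1)} \sum_{1 \le i \ne j \le n} u(Z_i, Z_j),
\qquad
u(Z_i, Z_j) \;=\; k(X_i, X_j) + k(Y_i, Y_j) - k(X_i, Y_j) - k(X_j, Y_i).
$$

With this choice of $u$ and a kernel $k$, $D_n$ is the usual unbiased estimator of the squared MMD from equal-sized samples. The statistic is called degenerate when $\mathrm{Var}\big(\mathbb{E}[u(Z_1, Z_2) \mid Z_1]\big) = 0$ and non-degenerate otherwise.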