Speaker: Aaditya Ramdas
Consider the problem of nonparametric two-sample mean testing, where we have access to i.i.d. samples from two multivariate distributions and wish to test whether they have the same mean. We propose a sequential test for this problem suitable for data-rich, memory-constrained situations. It is novel in several ways: it takes linear time and constant space to compute on the fly, and has robust high-dimensional statistical performance, including basically the same power guarantee (for a given false positive rate) as a batch/offline version of the test with the same computational constraints. Most notably, it has a distinct computational advantage over the batch test, because it accesses only as many samples as are required -- its stopping time is adaptive to the unknown difficulty of the problem! We analyze the test and prove these properties in a rigorously finite-sample fashion, using a novel uniform empirical Bernstein version of the law of the iterated logarithm (LIL), which may be of independent interest and allows analysis of sequential tests in a general framework. We demonstrate how to extend our ideas to nonparametric homogeneity and independence testing, and make a case for their even broader applicability.
We thank Google Pittsburgh for their gracious support