Conditional Independence Testing for High-dimensional Nonstationary Nonlinear Time Series

22 Oct, 2024, 3:30-4:30 pm, GHC 6501

Speaker: Michael Wieck-Sosa

Abstract: We introduce a framework for testing conditional independence relationships between time series that are possibly high-dimensional, nonstationary, and nonlinear. As far as we know, this is the first conditional independence test that can be used with a single realization of a nonstationary nonlinear time series (X, Y, Z). Consider the setting in which X and Y are univariate time series, and Z is a multivariate time series. Our test has two key steps. First, regress X on Z, and Y on Z using black-box time-varying nonlinear regression estimators. Notably, we allow the error processes to be temporally dependent and nonstationary. Second, estimate the time-varying variances of the process of error products based on the process of residual products. Using these estimates of the time-varying variances, we can approximate the desired quantile of our test statistic using a bootstrap procedure based on a distribution-uniform version of the strong Gaussian approximation for nonstationary nonlinear time series from Mies and Steland [MS22]. We extend this testing procedure to the high-dimensional setting, where the dimensions of the processes are allowed to grow with the sample size. We conduct a real data analysis with nonstationary nonlinear epidemiological time series. Specifically, we determine whether a population’s level of infectivity contains auxiliary information about future case growth rates of COVID-19. We discuss how to gain power with our ℓ2-type test statistic by using high-resolution data from groups of time series, such as sensor networks in neuroscience and physics, nearby cities in epidemiology and economics, or related price series in finance and e-commerce. Our testing framework opens the door to many downstream statistical applications which utilize conditional independence tests.
In our companion paper [WHR24], we use our test as the basis of a causal discovery algorithm for nonstationary nonlinear time series, which we use to study epidemiological dynamics.
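
As a rough illustration of the two steps described in the abstract, the following Python sketch regresses X and Y on Z, forms residual products, and calibrates a statistic with a bootstrap. It is a simplified stand-in and not the procedure from the talk: the Gaussian-kernel smoother (with rescaled time added as a covariate), the max-type partial-sum statistic, the block multiplier bootstrap, and all names and defaults (kernel_fit, residual_product_test, block, bandwidth) are assumptions made for this example.

```python
import numpy as np


def kernel_fit(features, target, bandwidth):
    """In-sample Nadaraya-Watson fit with a Gaussian kernel (illustrative regressor)."""
    diffs = features[:, None, :] - features[None, :, :]
    weights = np.exp(-0.5 * (diffs ** 2).sum(axis=2) / bandwidth ** 2)
    return weights @ target / weights.sum(axis=1)


def residual_product_test(x, y, z, block=8, bandwidth=0.5,
                          n_boot=500, alpha=0.05, seed=0):
    """Hypothetical residual-product test; returns (statistic, critical value, reject)."""
    rng = np.random.default_rng(seed)
    n = len(x)

    # Step 1: regress X on Z and Y on Z. Rescaled time t/n is appended as a
    # covariate so the fitted regression function may change over time.
    feats = np.column_stack([np.asarray(z, dtype=float).reshape(n, -1),
                             np.arange(n) / n])
    feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)
    res_x = x - kernel_fit(feats, x, bandwidth)
    res_y = y - kernel_fit(feats, y, bandwidth)
    products = res_x * res_y

    # Step 2: sum residual products within consecutive blocks. The squared
    # block sums act as crude local variance estimates that also absorb
    # short-range temporal dependence.
    n_blocks = n // block
    block_sums = products[: n_blocks * block].reshape(n_blocks, block).sum(axis=1)

    # Max-type partial-sum statistic (an assumption made for simplicity).
    stat = np.abs(np.cumsum(block_sums)).max() / np.sqrt(n)

    # Block multiplier bootstrap: reweight block sums with i.i.d. N(0, 1)
    # multipliers and recompute the statistic for each draw.
    multipliers = rng.standard_normal((n_boot, n_blocks))
    boot = np.abs(np.cumsum(multipliers * block_sums, axis=1)).max(axis=1) / np.sqrt(n)
    crit = np.quantile(boot, 1 - alpha)
    return float(stat), float(crit), bool(stat > crit)
```

Calling residual_product_test(x, y, z) with one-dimensional arrays x and y of length n and an array z of shape (n, d) returns the statistic, the bootstrap critical value at level alpha, and the rejection decision. The block length and bandwidth control how much temporal dependence and smoothness the sketch accommodates, and the defaults here are arbitrary.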