Safe testing

Nov 12, 3pm, GHC 8102

Speaker: Boyan Duan

Abstract: There is growing concern in various medical and life sciences that many published results are irreproducible, in the sense that the error control guarantees promised for such results might be violated. To alleviate this reproducibility crisis, a class of methods called "safe testing" has recently been proposed to control Type-1 error across a (potentially infinite) number of tests. Such methods construct a (super)martingale to combine the evidence from each test, but instead of measuring that evidence with a p-value, as is usually done, safe tests use an "S-value". Unlike the p-value, which is designed to control Type-1 error for a single test, the S-value is designed to control Type-1 error for an (infinite) sequence of tests. As with p-values, there can be various designs of S-values for a given test. For parametric tests, however, there is an "optimal" S-value, which takes the form of a Bayes factor with non-standard priors. Bayes factors are known to control Type-1 error for an online experiment, but that guarantee has only been proved for a simple null, whereas the S-value works for composite nulls. As an example, I will present the S-value for the t-test, which has better statistical power than Jeffreys' Bayesian t-test. If time permits, I will also discuss a special case in which the S-value itself is a (super)martingale, which allows for Type-1 error control without combining the S-values.
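
The idea of combining evidence through a (super)martingale can be illustrated with a toy example. The sketch below is not the t-test S-value from the talk. It assumes a simple null N(0, 1) against a fixed alternative N(mu_alt, 1), uses the likelihood ratio as the S-value, multiplies the S-values as data arrive, and rejects once the running product reaches 1/alpha. The function names (s_value, safe_test, false_rejection_rate) and the parameter mu_alt are hypothetical and chosen only for illustration.

```python
import math
import random


def s_value(x, mu_alt=0.5):
    """Likelihood ratio of N(mu_alt, 1) to N(0, 1) at x (illustrative S-value).

    Under the null N(0, 1) its expectation is exactly 1, which is the
    property that makes the running product a martingale.
    """
    return math.exp(mu_alt * x - 0.5 * mu_alt ** 2)


def safe_test(observations, alpha=0.05, mu_alt=0.5):
    """Multiply S-values sequentially; reject once the product reaches 1/alpha.

    Returns (rejected, number of observations used, final product).
    """
    product = 1.0
    n = 0
    for x in observations:
        n += 1
        product *= s_value(x, mu_alt)
        if product >= 1.0 / alpha:
            return True, n, product
    return False, n, product


def false_rejection_rate(n_runs=2000, n_obs=200, alpha=0.05, seed=0):
    """Monte Carlo estimate of how often the test rejects when the null is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_runs):
        data = (rng.gauss(0.0, 1.0) for _ in range(n_obs))
        rejected, _, _ = safe_test(data, alpha)
        rejections += int(rejected)
    return rejections / n_runs


if __name__ == "__main__":
    print("estimated false rejection rate:", false_rejection_rate())
    print("constant data at 2.0:", safe_test([2.0] * 10))
```

Because the product of these S-values is a nonnegative martingale under the null, the chance that it ever reaches 1/alpha is at most alpha, regardless of when the experimenter stops. The simulation is only a sanity check of that bound for this toy setting; the composite-null case discussed in the talk needs the S-value construction from the referenced paper.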

Reference: https://arxiv.org/abs/1906.07801