Speaker: Ojash Neopane
Abstract: Off-policy evaluation of a linear functional is a fundamental problem in both causal inference and reinforcement learning, encompassing key tasks such as average treatment effect estimation in causal inference and off-policy evaluation in reinforcement learning. Prior work characterized the asymptotic difficulty of this problem by deriving the semiparametric efficiency bound and showing that the Augmented Inverse Propensity Weighted (AIPW) estimator achieves it. More recent work extends these ideas to the nonasymptotic setting with adaptive sampling, yielding locally minimax-optimal procedures for adaptively collected data. In this talk, I will present an overview of these results, beginning with the semiparametric efficiency framework and then showing how these insights can be leveraged to design algorithms with strong finite-sample guarantees in adaptive settings. The talk is based on the following papers:
https://www.jstor.org/stable/2998560
https://arxiv.org/abs/2209.13075
https://arxiv.org/abs/2411.12786
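
Background (not part of the original abstract): one standard form of the AIPW estimator for the average treatment effect, written here as a sketch with illustrative notation (covariates X_i, binary treatment A_i, outcome Y_i, estimated outcome regressions \hat{\mu}_a, and estimated propensity score \hat{e}), is

\[
  \hat{\tau}_{\mathrm{AIPW}}
  = \frac{1}{n} \sum_{i=1}^{n} \Bigl[
      \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
      + \frac{A_i \bigl( Y_i - \hat{\mu}_1(X_i) \bigr)}{\hat{e}(X_i)}
      - \frac{(1 - A_i) \bigl( Y_i - \hat{\mu}_0(X_i) \bigr)}{1 - \hat{e}(X_i)}
    \Bigr]
\]

Under standard conditions its asymptotic variance attains the semiparametric efficiency bound, which is the optimality result the talk takes as its starting point.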