Co-Training and Its Relationship to Missing Data

18 Apr, 2025, 4-5 pm, GHC 8102

Speaker: Rebecca Farina

Abstract: Co-training is a semi-supervised learning technique that leverages two distinct views (i.e., sets of features) of the data to iteratively expand a labeled set using unlabeled examples. In this talk, I will introduce the classical co-training framework and algorithm, based on "Combining Labeled and Unlabeled Data with Co-Training" (Blum and Mitchell, 1998). I will then reformulate the co-training setting as a missing data problem characterized by non-monotone missingness. This perspective connects co-training to the rich literature on missing data mechanisms and offers tools that could improve the co-training estimator.
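For readers unfamiliar with the idea, the iterative loop described in the abstract can be sketched roughly as follows. This is a simplified illustration, not the exact algorithm from the paper or the formulation used in the talk: the nearest-centroid classifier, the function names (fit_centroids, predict, co_train), the parameters rounds and per_round, and the synthetic two-view data are all assumptions made for the example. Unlabeled examples are encoded as None.

```python
import random

def fit_centroids(xs, ys):
    """Return the mean feature value per class for a 1-D view."""
    means = {}
    for label in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means

def predict(means, x):
    """Return (label, confidence); confidence is the gap between centroid distances."""
    d0, d1 = abs(x - means[0]), abs(x - means[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)

def co_train(view1, view2, labels, rounds=5, per_round=2):
    """labels[i] is 0 or 1 for labeled examples and None for unlabeled ones.

    In each round, a classifier trained on one view labels its most
    confident unlabeled examples, which then join the shared labeled set
    used to train the classifier on the other view.
    """
    labels = list(labels)  # work on a copy; the caller's list is untouched
    for _ in range(rounds):
        for view in (view1, view2):
            labeled = [i for i, y in enumerate(labels) if y is not None]
            unlabeled = [i for i, y in enumerate(labels) if y is None]
            if not unlabeled:
                return labels
            means = fit_centroids([view[i] for i in labeled],
                                  [labels[i] for i in labeled])
            scored = [(predict(means, view[i]), i) for i in unlabeled]
            scored.sort(key=lambda t: t[0][1], reverse=True)
            for (label, _conf), i in scored[:per_round]:
                labels[i] = label  # pseudo-label the most confident examples
    return labels

# Synthetic data (hypothetical): two well-separated 1-D views per example.
rng = random.Random(0)
true_labels = [i % 2 for i in range(20)]
view1 = [rng.gauss(5.0 * y, 0.5) for y in true_labels]
view2 = [rng.gauss(-5.0 * y, 0.5) for y in true_labels]
# Only the first four labels are observed; the rest are missing.
observed = [y if i < 4 else None for i, y in enumerate(true_labels)]
completed = co_train(view1, view2, observed)
```

In this toy setup the two views are cleanly separated, so the pseudo-labels end up matching the true labels; with overlapping classes, early labeling mistakes can propagate through later rounds.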