Statistics and Machine Learning Working Group: People

Speaker: Marina Dubova

Abstract: We apply computational methods to study the epistemic success of data collection strategies that have been proposed by philosophers of science or executed by scientists themselves. We develop a multi-agent model of the scientific process that jointly formalizes its core aspects: data collection, data explanation, and social learning. We find that agents who choose new experiments at random develop the most accurate accounts of the world. On the other hand, agents following confirmation, falsification, crucial experimentation (theoretical disagreement), or novelty-motivated strategies end up with an illusion of epistemic success: they develop promising accounts of the data they collected, while completely misrepresenting the ground truth that they intended to learn about. These results, while methodologically surprising, reflect basic principles of statistical learning and adaptive sampling.

Title: Against theory-motivated data collection in sciences