An explanation for random forest success

Oct 22, 3pm, GHC 8102

Speaker: Lucas Mentch

Abstract: Random forests remain among the most popular off-the-shelf supervised machine learning tools, with a well-established track record of predictive accuracy in both regression and classification settings. Despite this empirical success, as well as a bevy of recent work investigating their statistical properties, a full and satisfying explanation for why they work so well has yet to be put forth. In this talk, I’ll go through the most popular explanations offered to date and discuss what I feel are the shortcomings of each. I’ll then offer a new (old?) explanation for their success that builds on fundamental ideas in model selection, and show that the intuition behind this alternative explanation holds with remarkable consistency on both real and synthetic data. Specifically, I’ll offer a precise explanation for the actual role played by the additional randomness, and we’ll see that this idea, when injected into existing model selection procedures, can lead to surprisingly large improvements in performance.
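
For readers unfamiliar with the "additional randomness" the abstract refers to: beyond bootstrap resampling (bagging), random forests also subsample the candidate features at each split. The sketch below is not from the talk; it is a minimal illustration using scikit-learn, in which `max_features=1.0` recovers plain bagged trees and `max_features=1/3` adds the per-split feature subsampling. The dataset (`make_friedman1`) and all parameter values are illustrative choices, not the speaker's experiments.

```python
# Contrast bagged trees (no feature subsampling) with a random forest
# (per-split feature subsampling) on a standard synthetic regression task.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

settings = [
    (1.0, "bagging (all features considered at each split)"),
    (1 / 3, "random forest (one third of features per split)"),
]

for max_features, label in settings:
    model = RandomForestRegressor(
        n_estimators=500,
        max_features=max_features,  # fraction of features tried per split
        random_state=0,
    )
    scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_mean_squared_error"
    )
    print(f"{label}: CV MSE = {-scores.mean():.3f}")
```

Explanations for why the feature-subsampling step helps, and what role that randomness is actually playing, are the subject of the talk.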