Accelerating Stochastic Gradient Descent

Aug 22, 12.01pm, Gates 8102

Speaker: Prateek Jain

Abstract: Momentum-based Stochastic Gradient Descent (SGD) algorithms are the workhorse of deep learning, and the central premise of adding momentum is to accelerate SGD. However, it is not clear whether such methods truly accelerate vanilla SGD (with batch size 1) or whether they require a large batch size to provide acceleration. In this talk, we will present a surprising result showing that such methods *cannot* improve over SGD in the standard stochastic approximation setting. However, by modifying the algorithm, we show that one can indeed accelerate SGD, although this requires very careful parametrization of the problem. Based on joint work with Rahul Kidambi, Praneeth Netrapalli, Sham Kakade, and Aaron Sidford.
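
For readers unfamiliar with the two updates being compared, below is a minimal illustrative sketch (not the speaker's algorithm or results) contrasting vanilla single-sample SGD with heavy-ball momentum SGD on a toy least-squares stochastic approximation problem; the problem setup, step sizes, and constants are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic approximation setup (illustrative only):
# minimize E[(x^T w - y)^2], observing one sample per step (batch size = 1).
d = 10
w_star = rng.normal(size=d)

def sample():
    """Draw one (x, y) pair; y is a noisy linear measurement of w_star."""
    x = rng.normal(size=d)
    y = x @ w_star + 0.1 * rng.normal()
    return x, y

def sgd(steps=5000, lr=0.01):
    """Vanilla SGD with batch size 1."""
    w = np.zeros(d)
    for _ in range(steps):
        x, y = sample()
        grad = 2 * (x @ w - y) * x   # gradient of the single-sample squared loss
        w -= lr * grad
    return w

def momentum_sgd(steps=5000, lr=0.01, beta=0.9):
    """Heavy-ball (momentum) SGD with batch size 1."""
    w = np.zeros(d)
    v = np.zeros(d)
    for _ in range(steps):
        x, y = sample()
        grad = 2 * (x @ w - y) * x
        v = beta * v + grad          # accumulate momentum in the velocity term
        w -= lr * v
    return w

print("SGD error:     ", np.linalg.norm(sgd() - w_star))
print("Momentum error:", np.linalg.norm(momentum_sgd() - w_star))
```

The talk's question, roughly, is whether the momentum variant above can provably beat the plain loop in this single-sample regime, or whether any apparent gain requires large batches or a modified, carefully parametrized algorithm.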