Ferenc Huszár 4/1/2021

Notes on the Origin of Implicit Regularization in SGD

This article examines implicit regularization in Stochastic Gradient Descent (SGD): the observation that the optimization algorithm itself, rather than any explicit penalty term, biases learning toward minima that generalize well. It discusses recent research that moves beyond the neural tangent kernel framework and its idealization of vanishingly small learning rates to study SGD with finite learning rates and finite minibatches, giving a more practical account of why deep neural networks generalize effectively.
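
As a concrete illustration of the kind of result this line of work establishes, here is a minimal sketch based on the backward-error-analysis view; the notation (ε for the learning rate, m for the number of minibatches per epoch, C for the full-batch loss, Ĉ_k for the k-th minibatch loss) is chosen here for exposition and the precise statement and constants should be checked against the original post and the papers it discusses. To leading order in ε, gradient descent and SGD follow gradient flow not on the original loss but on a modified loss:

\[
\tilde{C}_{\mathrm{GD}}(\omega) = C(\omega) + \frac{\epsilon}{4}\,\big\|\nabla C(\omega)\big\|^2,
\qquad
\tilde{C}_{\mathrm{SGD}}(\omega) = C(\omega) + \frac{\epsilon}{4m}\sum_{k=1}^{m}\big\|\nabla \hat{C}_k(\omega)\big\|^2 .
\]

Since the full-batch gradient is the average of the minibatch gradients, the SGD penalty decomposes as
\(\frac{\epsilon}{4}\|\nabla C(\omega)\|^2 + \frac{\epsilon}{4m}\sum_{k}\|\nabla \hat{C}_k(\omega) - \nabla C(\omega)\|^2\):
compared with full-batch gradient descent, minibatch noise contributes an extra penalty on the variance of the minibatch gradients, nudging trajectories toward flatter, lower-variance regions of the loss surface.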
