A New Perspective on Shampoo's Preconditioner

arXiv:2406.17748v1 Announce Type: cross
Abstract: Shampoo, a second-order optimization algorithm that uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community. The preconditioner used by Shampoo can be viewed either as an approximation of the Gauss–Newton component of the Hessian or as an approximation of the covariance matrix of the gradients maintained by Adagrad. We provide an explicit and novel connection between the $\textit{optimal}$ Kronecker product approximation of these matrices and the approximation made by Shampoo. Our connection highlights a subtle but common misconception about Shampoo's approximation. In particular, the $\textit{square}$ of the approximation used by the Shampoo optimizer is equivalent to a single step of the power iteration algorithm for computing the aforementioned optimal Kronecker product approximation. Across a variety of datasets and architectures, we empirically demonstrate that this is close to the optimal Kronecker product approximation. Additionally, for the Hessian approximation viewpoint, we empirically study the impact of various practical tricks that make Shampoo more computationally efficient (such as using the batch gradient and the empirical Fisher) on the quality of the Hessian approximation.
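The power-iteration claim in the abstract can be checked numerically on a toy problem. The sketch below is not the authors' code; it is a minimal NumPy illustration under stated assumptions: synthetic gradients, a row-major vectorization of each gradient matrix, the second-moment matrix $H = \mathbb{E}[\mathrm{vec}(G)\,\mathrm{vec}(G)^\top]$ as the target, and the standard rearrangement trick that turns the best Kronecker approximation $L \otimes R$ into a rank-one approximation problem. The helper names `rearrange` and `cosine` are hypothetical. It compares the factors $\mathbb{E}[GG^\top]$ and $\mathbb{E}[G^\top G]$ with one power-iteration update per factor started from the identity, and measures how close their Kronecker product is to the optimum obtained from an SVD.

```python
import numpy as np


def rearrange(H, m, n):
    """Map H[(i,j),(k,l)] to H_hat[(i,k),(j,l)] so that
    rearrange(kron(L, R)) == outer(vec(L), vec(R)) for row-major vec."""
    return H.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)


def cosine(A, B):
    """Cosine similarity between two matrices under the Frobenius inner product."""
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))


rng = np.random.default_rng(0)
m, n, num_samples = 4, 3, 500

# Synthetic "gradients" (an assumption for illustration, not real training data):
# a Kronecker-structured part plus a fixed offset that breaks exact structure.
A = rng.normal(size=(m, m))
B = rng.normal(size=(n, n))
M = rng.normal(size=(m, n))
grads = np.stack([A @ rng.normal(size=(m, n)) @ B + M for _ in range(num_samples)])

# Target matrix: empirical second moment of the row-major vectorized gradients.
flat = grads.reshape(num_samples, m * n)
H = flat.T @ flat / num_samples
H_hat = rearrange(H, m, n)

# Optimal Kronecker approximation: top singular triple of the rearranged matrix.
U, S, Vt = np.linalg.svd(H_hat)
H_opt = S[0] * np.kron(U[:, 0].reshape(m, m), Vt[0].reshape(n, n))

# Shampoo-style factors: E[G G^T] and E[G^T G].
L_shampoo = np.einsum("bij,bkj->ik", grads, grads) / num_samples
R_shampoo = np.einsum("bij,bil->jl", grads, grads) / num_samples

# One power-iteration update per factor, each started from the identity.
L_power = (H_hat @ np.eye(n).reshape(-1)).reshape(m, m)
R_power = (H_hat.T @ np.eye(m).reshape(-1)).reshape(n, n)

# Cosine similarity is scale-invariant, so the overall scale of L ⊗ R is ignored.
cos_opt = cosine(H, H_opt)
cos_shampoo = cosine(H, np.kron(L_shampoo, R_shampoo))

print("power step matches Shampoo factors:",
      np.allclose(L_power, L_shampoo) and np.allclose(R_power, R_shampoo))
print("cosine(H, optimal Kronecker):", cos_opt)
print("cosine(H, L ⊗ R from Shampoo factors):", cos_shampoo)
```

Cosine similarity is used here because the product $L \otimes R$ is only determined up to scale; how the paper normalizes its comparison is not stated in the abstract, so this metric is an assumption of the sketch.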



