Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning



Libin Zhu and 3 other authors


Abstract: In this paper, we first present an explanation for the common occurrence of spikes in the training loss when neural networks are trained with stochastic gradient descent (SGD). We provide evidence that these spikes in the training loss of SGD are “catapults”, an optimization phenomenon originally observed in GD with large learning rates in [Lewkowycz et al. 2020]. We empirically show that these catapults occur in a low-dimensional subspace spanned by the top eigenvectors of the tangent kernel, for both GD and SGD. Second, we posit an explanation for how catapults lead to better generalization by demonstrating that catapults promote feature learning by increasing alignment with the Average Gradient Outer Product (AGOP) of the true predictor. Furthermore, we demonstrate that a smaller batch size in SGD induces a larger number of catapults, thereby improving AGOP alignment and test performance.
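To make the AGOP alignment measure concrete: the Average Gradient Outer Product of a scalar-valued predictor f over inputs x_1, ..., x_n is AGOP(f) = (1/n) Σᵢ ∇f(xᵢ) ∇f(xᵢ)ᵀ. The sketch below computes this matrix and a Frobenius cosine-similarity alignment between two such matrices. It is a minimal illustrative sketch, not the authors' code; the function names (`agop`, `agop_alignment`), the scalar-output assumption, and the choice of cosine similarity as the alignment measure are assumptions made here for illustration.

```python
import torch

def agop(model, X):
    """Average Gradient Outer Product: (1/n) * sum_i grad f(x_i) grad f(x_i)^T.

    Assumes `model` maps an input vector to a (near-)scalar output and
    `X` is an (n, d) batch of inputs. Illustrative sketch only.
    """
    grads = []
    for x in X:
        x = x.detach().clone().requires_grad_(True)
        # .sum() reduces a shape-(1,) output to a scalar for autograd
        (g,) = torch.autograd.grad(model(x).sum(), x)
        grads.append(g.flatten())
    G = torch.stack(grads)            # (n, d): one input gradient per sample
    return G.T @ G / G.shape[0]       # (d, d) AGOP matrix

def agop_alignment(A, B):
    """Cosine similarity of two matrices under the Frobenius inner product."""
    return (A * B).sum() / (A.norm() * B.norm())
```

Under this reading, one could compute `agop_alignment(agop(trained_net, X), agop(true_predictor, X))` before and after loss spikes to track how catapults affect alignment with the AGOP of the true predictor, which the paper associates with better test performance.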

Submission history

From: Libin Zhu
[v1]
Wed, 7 Jun 2023 22:37:11 UTC (10,686 KB)
[v2]
Sun, 3 Mar 2024 00:28:01 UTC (31,098 KB)
[v3]
Thu, 6 Jun 2024 03:57:32 UTC (31,466 KB)



