[2310.07891] A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks



Download a PDF of the paper titled A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks, by Behrad Moniri and 3 other authors


Abstract: Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component (spike) in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large-sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the training and test errors, we demonstrate that these non-linear features can enhance learning.
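The training procedure the abstract describes (one large gradient step on the first layer, then ridge regression on the second) can be sketched numerically. This is a minimal illustration under assumed choices (tanh activation, squared loss, Gaussian data, a hypothetical target, and a learning rate scaled as the square root of the sample size); it is not the paper's exact model or scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 2000, 100, 400                        # samples, input dim, hidden width
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.tanh(X @ rng.standard_normal(d))         # assumed non-linear target

W = rng.standard_normal((N, d)) / np.sqrt(d)    # first-layer weights
a = rng.standard_normal(N) / np.sqrt(N)         # second-layer weights
act = np.tanh

# One gradient step on W for the squared loss, with a learning rate that
# grows with the sample size (eta ~ sqrt(n)), as in the abstract.
pre = X @ W.T                                   # (n, N) pre-activations
resid = act(pre) @ a - y                        # (n,) residuals
grad_W = (resid[:, None] * (1 - act(pre) ** 2) * a).T @ X / n   # (N, d)
eta = np.sqrt(n)
W1 = W - eta * grad_W

# Ridge regression on the updated features to refit the second layer.
F = act(X @ W1.T)                               # updated feature matrix
lam = 1e-2
a_hat = np.linalg.solve(F.T @ F + lam * np.eye(N), F.T @ y)

# Spiked structure: leading singular values of the feature matrix
# separate from the bulk after the large-learning-rate step.
s = np.linalg.svd(F, compute_uv=False)
print(s[:5] / s[5])
```

With a constant step size (`eta = 1`) the same code produces at most one separated singular value, matching the linear-component-only behavior the abstract contrasts against.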

Submission history

From: Behrad Moniri
[v1]
Wed, 11 Oct 2023 20:55:02 UTC (1,228 KB)
[v2]
Sat, 3 Feb 2024 21:18:10 UTC (1,241 KB)


