[2403.08160] Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime

[Submitted on 13 Mar 2024]

Download a PDF of the paper titled Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime, by Hong Hu and 2 other authors

Download PDF

Abstract: Recent advances in machine learning have been achieved by using overparametrized models trained until near interpolation of the training data. It was shown, e.g., through the double descent phenomenon, that the number of parameters is a poor proxy for model complexity and generalization capabilities. This leaves open the question of understanding the impact of parametrization on the performance of these models. How do model complexity and generalization depend on the number of parameters $p$? How should we choose $p$ relative to the sample size $n$ to achieve optimal test error?

In this paper, we study the example of random feature ridge regression (RFRR). This model can be seen either as a finite-rank approximation to kernel ridge regression (KRR), or as a simplified model for neural networks trained in the so-called lazy regime. We consider covariates uniformly distributed on the $d$-dimensional sphere and compute sharp asymptotics for the RFRR test error in the high-dimensional polynomial scaling, where $p, n, d \to \infty$ while $p / d^{\kappa_1}$ and $n / d^{\kappa_2}$ stay constant, for all $\kappa_1, \kappa_2 \in \mathbb{R}_{>0}$. These asymptotics precisely characterize the impact of the number of random features and of the regularization parameter on the test performance. In particular, RFRR exhibits an intuitive trade-off between approximation and generalization power. For $n = o(p)$, the sample size $n$ is the bottleneck and RFRR achieves the same performance as KRR (which is equivalent to taking $p = \infty$). On the other hand, if $p = o(n)$, the number of random features $p$ is the limiting factor and the RFRR test error matches the approximation error of the random feature model class (corresponding to taking $n = \infty$). Finally, a double descent appears at $n = p$, a phenomenon that was previously only characterized in the linear scaling $\kappa_1 = \kappa_2 = 1$.
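For concreteness, here is a minimal numerical sketch of the setup the abstract describes, not the authors' code: covariates uniform on the sphere, random ReLU features, a ridge fit, and a sweep of $p$ across $n$ to expose the test-error peak near $p = n$. The target function, activation, scalings, and all parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sphere(n, d, rng):
    """Sample n points uniformly on the unit sphere S^{d-1}."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def rfrr_test_error(n, p, d, lam, n_test=2000, rng=rng):
    """Fit random-feature ridge regression and return the test MSE."""
    # First-layer weights drawn uniformly on the sphere (lazy-regime features).
    W = sphere(p, d, rng)
    # Hypothetical target: a simple nonlinear function of one coordinate.
    f = lambda X: np.tanh(3.0 * X[:, 0])
    X_tr, X_te = sphere(n, d, rng), sphere(n_test, d, rng)
    y_tr = f(X_tr)
    # ReLU random features; sqrt(d) rescales <w, x> (which is O(1/sqrt(d))
    # on the sphere) to O(1), and 1/sqrt(p) normalizes the feature map.
    Phi_tr = np.maximum(np.sqrt(d) * X_tr @ W.T, 0.0) / np.sqrt(p)
    Phi_te = np.maximum(np.sqrt(d) * X_te @ W.T, 0.0) / np.sqrt(p)
    # Ridge solution: a = (Phi^T Phi + lam * I)^{-1} Phi^T y.
    a = np.linalg.solve(Phi_tr.T @ Phi_tr + lam * np.eye(p), Phi_tr.T @ y_tr)
    return np.mean((Phi_te @ a - f(X_te)) ** 2)

# Sweep the number of features p across the sample size n; at small
# regularization the test error should peak near the interpolation
# threshold p = n and then descend again (double descent).
d, n, lam = 30, 300, 1e-6
for p in [50, 100, 200, 290, 300, 310, 400, 800, 1600]:
    print(f"p = {p:5d}   test MSE = {rfrr_test_error(n, p, d, lam):.4f}")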

Submission history

From: Theodor Misiakiewicz [view email]
[v1]
Wed, 13 Mar 2024 00:59:25 UTC (2,966 KB)
