Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

[Submitted on 6 Mar 2024]

Authors: Jesse Farebrother and 11 other authors

Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that training value functions with categorical cross-entropy significantly improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
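The abstract does not say how a scalar value target is turned into a class distribution. As a rough illustration only, the sketch below uses a "two-hot" encoding over evenly spaced bins, which is one common way to pose value regression as classification and is not necessarily the exact scheme the authors favor. All names here (`two_hot`, `cross_entropy`, `expected_value`, `v_min`, `v_max`, `num_bins`) are assumptions made for this example.

```python
import math

def two_hot(target, v_min, v_max, num_bins):
    """Encode a scalar target as a distribution over evenly spaced bin centers.

    Mass is split between the two nearest centers so that the mean of the
    distribution equals the clipped target. Assumes num_bins >= 2.
    """
    width = (v_max - v_min) / (num_bins - 1)
    centers = [v_min + i * width for i in range(num_bins)]
    target = min(max(target, v_min), v_max)          # clip into the supported range
    pos = (target - v_min) / width                   # fractional bin index
    lower = min(int(math.floor(pos)), num_bins - 2)  # keep lower + 1 in range
    upper_weight = pos - lower
    probs = [0.0] * num_bins
    probs[lower] = 1.0 - upper_weight
    probs[lower + 1] = upper_weight
    return centers, probs

def softmax(logits):
    m = max(logits)                                  # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, logits):
    """Categorical cross-entropy between a target distribution and logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(p * (x - log_z) for p, x in zip(target_probs, logits))

def expected_value(centers, logits):
    """Recover a scalar value estimate as the mean of the predicted distribution."""
    return sum(c * p for c, p in zip(centers, softmax(logits)))

# Hypothetical usage: a bootstrapped target of 0.3 and an untrained (uniform) head.
centers, probs = two_hot(0.3, v_min=0.0, v_max=1.0, num_bins=5)
loss = cross_entropy(probs, [0.0] * 5)
```

In this setup the network outputs `num_bins` logits instead of a single scalar, the loss is the cross-entropy against the encoded target, and the scalar value used for acting or bootstrapping is read back with `expected_value`.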

Submission history

From: Jesse Farebrother
[v1]
Wed, 6 Mar 2024 18:55:47 UTC (2,577 KB)


