[2402.08991] Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption


Download a PDF of the paper titled Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption, by Chenlu Ye and three other authors


Abstract: This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of model-free RL, where robust least-squares regression is often employed for value function estimation. However, these techniques cannot be directly applied to model-based RL. In this paper, we focus on model-based RL and take the maximum likelihood estimation (MLE) approach to learn a transition model. Our work encompasses both online and offline settings. In the online setting, we introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE. We prove that CR-OMLE achieves a regret of $\tilde{\mathcal{O}}(\sqrt{T} + C)$, where $C$ denotes the cumulative corruption level after $T$ episodes. We also prove a lower bound to show that the additive dependence on $C$ is optimal. We extend our weighting technique to the offline setting, and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE). Under a uniform coverage condition, CR-PMLE exhibits suboptimality worsened by $\mathcal{O}(C/n)$, nearly matching the lower bound. To the best of our knowledge, this is the first work on corruption-robust model-based RL algorithms with provable guarantees.
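The core idea of uncertainty-weighted MLE can be illustrated with a minimal tabular sketch. This is not the paper's algorithm (CR-OMLE works with general model classes and derives its weights from TV-based information ratios); the function `weighted_mle_transition`, the smoothing parameter `alpha`, and the toy data below are all hypothetical, showing only how down-weighting suspect samples limits the influence a corrupted transition has on the estimated model:

```python
import numpy as np

def weighted_mle_transition(transitions, weights, n_states, n_actions, alpha=1e-6):
    """Weighted maximum-likelihood estimate of a tabular transition model
    P(s' | s, a): each observed transition (s, a, s') contributes its weight
    to the corresponding count. Small weights on high-uncertainty (possibly
    corrupted) samples bound the damage an adversary can do to the estimate."""
    # alpha is a tiny smoothing prior so unvisited (s, a) pairs stay normalizable.
    counts = np.full((n_states, n_actions, n_states), alpha)
    for (s, a, s_next), w in zip(transitions, weights):
        counts[s, a, s_next] += w
    # Normalize over the next-state axis to obtain probability distributions.
    return counts / counts.sum(axis=-1, keepdims=True)

# Toy usage: the third sample is flagged as suspect and given weight 0.1,
# so it barely shifts the estimated distribution for (s=0, a=0).
data = [(0, 0, 1), (0, 0, 1), (0, 0, 0)]
w = [1.0, 1.0, 0.1]
P = weighted_mle_transition(data, w, n_states=2, n_actions=1)
```

With uniform weights the corrupted sample would pull P(s'=0 | 0, 0) up to 1/3; with weight 0.1 it contributes only 0.1 of the 2.1 total effective count.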

Submission history

From: Chenlu Ye [view email]
Wed, 14 Feb 2024 07:27:30 UTC (40 KB)
Thu, 15 Feb 2024 04:30:09 UTC (40 KB)


