Meta-learning the mirror map in policy mirror descent




Policy Mirror Descent (PMD) is a popular framework in reinforcement learning, serving as a unifying perspective that encompasses numerous algorithms. These algorithms are derived through the choice of a mirror map and enjoy finite-time convergence guarantees. Despite its popularity, PMD's full potential remains largely unexplored: most research focuses on one particular mirror map, the negative entropy, which gives rise to the well-known Natural Policy Gradient (NPG) method. Existing theoretical studies leave it unclear whether the choice of mirror map significantly influences PMD's efficacy. In our work, we conduct empirical investigations showing that the conventional mirror map choice (NPG) often yields suboptimal outcomes across several standard benchmark environments. By applying a meta-learning approach, we identify more efficient mirror maps that improve performance, both on average and in terms of the best performance achieved along the training trajectory. We analyze the characteristics of these learned mirror maps and reveal traits shared among certain settings. Our results suggest that mirror maps have the potential to be adaptable across various environments, raising questions about how best to match a mirror map to an environment's structure and characteristics.
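To make concrete what "choosing a mirror map" changes, here is a minimal tabular sketch of a single PMD step under two standard mirror maps. It is an illustration only, not the method or code from the paper: the function names, the NumPy setting, and the assumption that exact action values are available as an array are all hypothetical. With the negative entropy the step has the familiar multiplicative closed form associated with NPG, while with the squared Euclidean norm the same step becomes a projection onto the probability simplex.

```python
import numpy as np


def pmd_step_negative_entropy(policy, q_values, step_size):
    """One PMD step with the negative-entropy mirror map (NPG-style update).

    policy:   (n_states, n_actions) array with strictly positive rows summing to 1
    q_values: (n_states, n_actions) array of action values (assumed given)
    """
    # Closed form: new_pi(a|s) is proportional to pi(a|s) * exp(step_size * Q(s, a)).
    logits = np.log(policy) + step_size * q_values
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for stability
    unnormalized = np.exp(logits)
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)


def project_to_simplex(v):
    """Euclidean projection of a 1-D vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept in the support
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)


def pmd_step_squared_l2(policy, q_values, step_size):
    """One PMD step with the mirror map h(x) = 0.5 * ||x||^2.

    The Bregman divergence is then the squared Euclidean distance, so the
    update is a per-state projection of pi + step_size * Q onto the simplex.
    """
    shifted = policy + step_size * q_values
    return np.vstack([project_to_simplex(row) for row in shifted])


if __name__ == "__main__":
    pi = np.array([[0.5, 0.5]])
    q = np.array([[1.0, 0.0]])
    print(pmd_step_negative_entropy(pi, q, 0.5))
    print(pmd_step_squared_l2(pi, q, 0.5))
```

Starting from the same policy and action values, the two updates land on different distributions. The Euclidean step can also place exactly zero mass on an action, whereas the negative-entropy step always keeps every action's probability positive.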


