[2211.12612] Transfer Learning for Contextual Multi-armed Bandits


View a PDF of the paper titled Transfer Learning for Contextual Multi-armed Bandits, by Changxiao Cai and 2 other authors


Abstract: Motivated by a range of applications, we study in this paper the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, where we have data collected on source bandits before the start of the target bandit learning. The minimax rate of convergence for the cumulative regret is established and a novel transfer learning algorithm that attains the minimax regret is proposed. The results quantify the contribution of the data from the source domains for learning in the target domain in the context of nonparametric contextual multi-armed bandits.

In view of the general impossibility of adaptation to unknown smoothness, we develop a data-driven algorithm that achieves near-optimal statistical guarantees (up to a logarithmic factor) while automatically adapting to the unknown parameters over a large collection of parameter spaces under an additional self-similarity assumption. A simulation study is carried out to illustrate the benefits of utilizing the data from the auxiliary source domains for learning in the target domain.

Submission history

From: Changxiao Cai
Tue, 22 Nov 2022 22:24:28 UTC (1,192 KB)
Thu, 25 Jan 2024 02:31:43 UTC (1,022 KB)
