Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation


arXiv:2402.10210v1 Announce Type: cross
Abstract: Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.
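
The self-play idea in the abstract can be illustrated with a toy sketch. This is not the paper's diffusion objective, which the abstract does not state. It is a hypothetical one-dimensional analogue: the "model" is a Gaussian N(mu, 1), the "earlier version" is a frozen copy of mu, and a logistic loss on the log-likelihood-ratio margin rewards the current model for preferring a reference sample over the frozen copy's own sample. The function name `spin_toy` and all of its parameters are assumptions made for illustration. Note that each step uses a single reference sample, with no "loser" image supplied by a human.

```python
import math
import random


def spin_toy(target_mean=3.0, init_mean=0.0, iterations=3,
             steps=200, lr=0.05, lam=1.0, seed=0):
    """Toy self-play loop on a 1-D Gaussian 'generator' N(mu, 1).

    Hypothetical illustration only; not the SPIN-Diffusion loss.
    """
    rng = random.Random(seed)
    mu = init_mean
    history = [mu]
    for _ in range(iterations):
        mu_prev = mu  # frozen copy: the "earlier version" used as opponent
        for _ in range(steps):
            x_real = rng.gauss(target_mean, 1.0)  # one reference sample
            x_self = rng.gauss(mu_prev, 1.0)      # sample from the frozen copy
            # Log-likelihood-ratio margin of current vs. frozen model,
            # real sample minus self-generated sample. For unit-variance
            # Gaussians it reduces to this closed form.
            margin = lam * (x_real - x_self) * (mu - mu_prev)
            # Gradient of the logistic loss log(1 + exp(-margin)) w.r.t. mu.
            grad = -lam * (x_real - x_self) / (1.0 + math.exp(margin))
            mu -= lr * grad
        history.append(mu)  # the updated model becomes the next opponent
    return history


if __name__ == "__main__":
    for i, m in enumerate(spin_toy()):
        print(f"iteration {i}: mu = {m:.3f}")
```

In this toy setting the expected update vanishes once the frozen copy already matches the reference distribution, so the loop stops improving exactly when the model can no longer be told apart from the data. Whether and how this carries over to diffusion models is the subject of the paper itself.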


