DreamReward: Text-to-3D Generation with Human Preference

March 21, 2024
作者: Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, Jun Zhu
cs.AI

Abstract

3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline including rating and ranking. Then, we build Reward3D -- the first general-purpose text-to-3D human preference reward model to effectively encode human preferences. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL), a direct tuning algorithm to optimize the multi-view diffusion models with a redefined scorer. Grounded by theoretical proof and extensive experiment comparisons, our DreamReward successfully generates high-fidelity and 3D consistent results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve text-to-3D models.
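The feedback-learning recipe described above (a frozen, learned reward model scores generations against the prompt, and its signal is folded into the diffusion training objective) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: `Reward3D`, `MultiViewDiffusion`, and `dreamfl_step` are simplified stand-ins operating on flat feature vectors, not the authors' released code or the exact DreamFL objective.

```python
# Minimal sketch of reward-guided fine-tuning of a text-to-multi-view diffusion
# model, in the spirit of the paper's Reward3D + DreamFL pipeline.
# All modules here are simplified stand-ins (assumptions), not the authors' implementation.
import torch
import torch.nn as nn

class Reward3D(nn.Module):
    """Stand-in reward model: scores rendered view features against a prompt embedding."""
    def __init__(self, img_dim=512, txt_dim=512):
        super().__init__()
        self.score_head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, img_feat, txt_feat):
        return self.score_head(torch.cat([img_feat, txt_feat], dim=-1)).squeeze(-1)

class MultiViewDiffusion(nn.Module):
    """Stand-in multi-view denoiser: predicts the noise added to a batch of view features."""
    def __init__(self, dim=512):
        super().__init__()
        self.denoiser = nn.Sequential(nn.Linear(dim * 2, 512), nn.ReLU(), nn.Linear(512, dim))

    def forward(self, noisy_views, txt_feat):
        return self.denoiser(torch.cat([noisy_views, txt_feat.expand_as(noisy_views)], dim=-1))

def dreamfl_step(diffusion, reward_model, views, txt_feat, optimizer, reward_weight=0.1):
    """One tuning step: standard denoising loss plus a reward term from the frozen reward model."""
    noise = torch.randn_like(views)
    noisy_views = views + noise
    pred_noise = diffusion(noisy_views, txt_feat)
    denoise_loss = nn.functional.mse_loss(pred_noise, noise)

    # Reward term: score the one-step denoised views; maximizing reward = minimizing its negative.
    denoised = noisy_views - pred_noise
    reward = reward_model(denoised, txt_feat.expand_as(denoised)).mean()
    loss = denoise_loss - reward_weight * reward

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), reward.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    diffusion, reward_model = MultiViewDiffusion(), Reward3D()
    for p in reward_model.parameters():      # the reward model stays frozen during tuning
        p.requires_grad_(False)
    opt = torch.optim.Adam(diffusion.parameters(), lr=1e-4)
    views = torch.randn(4, 512)              # features of 4 rendered views of one object
    txt = torch.randn(1, 512)                # prompt embedding
    print(dreamfl_step(diffusion, reward_model, views, txt, opt))
```

The key design choice mirrored here is that only the diffusion model is updated: the reward model acts as a fixed scorer, and `reward_weight` trades off fidelity to the original denoising objective against alignment with the learned human preference signal.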
