

DreamReward: Text-to-3D Generation with Human Preference

March 21, 2024
作者: Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, Jun Zhu
cs.AI

Abstract

3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline including rating and ranking. Then, we build Reward3D -- the first general-purpose text-to-3D human preference reward model to effectively encode human preferences. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL), a direct tuning algorithm to optimize the multi-view diffusion models with a redefined scorer. Grounded by theoretical proof and extensive experiment comparisons, our DreamReward successfully generates high-fidelity and 3D consistent results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve text-to-3D models.
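The abstract describes an RLHF-style pipeline: first train a reward model (Reward3D) on pairwise expert comparisons, then use it to tune the multi-view diffusion model (DreamFL). As a minimal, illustrative sketch of the first step only, the snippet below trains a pairwise preference reward model with a Bradley-Terry loss, the standard objective for learning from comparison data. The encoder layout, feature dimensions, and data fields are assumptions for illustration and are not the paper's actual Reward3D architecture or code.

```python
# Illustrative sketch (not the paper's code): a pairwise preference reward
# model trained with a Bradley-Terry loss. Reward3D would instead encode the
# text prompt together with rendered multi-view images of the 3D asset; the
# simple linear encoders below are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreferenceRewardModel(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # Placeholder projections for prompt and rendering features.
        self.text_proj = nn.Linear(feat_dim, feat_dim)
        self.image_proj = nn.Linear(feat_dim, feat_dim)
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # Fuse prompt and rendering features, then predict a scalar reward.
        fused = torch.relu(self.text_proj(text_feat) + self.image_proj(image_feat))
        return self.head(fused).squeeze(-1)


def pairwise_preference_loss(model, text, preferred, rejected):
    # Bradley-Terry objective: the human-preferred sample should score higher
    # than the rejected one for the same prompt.
    r_pos = model(text, preferred)
    r_neg = model(text, rejected)
    return -F.logsigmoid(r_pos - r_neg).mean()


if __name__ == "__main__":
    model = PreferenceRewardModel()
    text = torch.randn(4, 512)       # dummy prompt embeddings
    preferred = torch.randn(4, 512)  # features of the preferred 3D rendering
    rejected = torch.randn(4, 512)   # features of the less-preferred rendering
    loss = pairwise_preference_loss(model, text, preferred, rejected)
    loss.backward()
    print(f"pairwise loss: {loss.item():.4f}")
```

The trained reward model can then score candidate generations; in the second stage, such a scorer is used as feedback to fine-tune the multi-view diffusion model toward human-preferred outputs.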
