

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

December 21, 2023
作者: Desai Xie, Jiahao Li, Hao Tan, Xin Sun, Zhixin Shu, Yi Zhou, Sai Bi, Sören Pirk, Arie E. Kaufman
cs.AI

Abstract

Recent advancements in the text-to-3D task leverage finetuned text-to-image diffusion models to generate multi-view images, followed by NeRF reconstruction. Yet, existing supervised finetuned (SFT) diffusion models still suffer from multi-view inconsistency and the resulting NeRF artifacts. Although training longer with SFT improves consistency, it also causes distribution shift, which reduces diversity and realistic details. We argue that the SFT of multi-view diffusion models resembles the instruction finetuning stage of the LLM alignment pipeline and can benefit from RL finetuning (RLFT) methods. Essentially, RLFT methods optimize models beyond their SFT data distribution by using their own outputs, effectively mitigating distribution shift. To this end, we introduce Carve3D, a RLFT method coupled with the Multi-view Reconstruction Consistency (MRC) metric, to improve the consistency of multi-view diffusion models. To compute MRC on a set of multi-view images, we compare them with their corresponding renderings of the reconstructed NeRF at the same viewpoints. We validate the robustness of MRC with extensive experiments conducted under controlled inconsistency levels. We enhance the base RLFT algorithm to stabilize the training process, reduce distribution shift, and identify scaling laws. Through qualitative and quantitative experiments, along with a user study, we demonstrate Carve3D's improved multi-view consistency, the resulting superior NeRF reconstruction quality, and minimal distribution shift compared to longer SFT. Project webpage: https://desaixie.github.io/carve-3d.
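The MRC metric described above compares each generated multi-view image against the reconstructed NeRF's rendering from the same viewpoint. A minimal sketch of that comparison, treating images as flat lists of pixel values and using plain mean-squared error as a stand-in for whatever image distance the paper actually uses (the function names and the choice of MSE are assumptions, not the paper's implementation):

```python
def image_distance(img_a, img_b):
    """Per-image distance; plain MSE here as a stand-in -- the paper's
    actual metric may use a different image distance (assumption)."""
    assert len(img_a) == len(img_b)
    return sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)

def mrc(generated_views, nerf_renderings):
    """Multi-view Reconstruction Consistency (sketch): average the
    distance between each generated multi-view image and the NeRF's
    re-rendering at the same viewpoint. Lower means more consistent."""
    assert len(generated_views) == len(nerf_renderings)
    pairs = zip(generated_views, nerf_renderings)
    return sum(image_distance(g, r) for g, r in pairs) / len(generated_views)
```

Under this stand-in, a perfectly consistent set (NeRF re-renderings identical to the generated views) scores 0, and the negated score could serve as the reward signal for RL finetuning.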