

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

December 21, 2023
Authors: Desai Xie, Jiahao Li, Hao Tan, Xin Sun, Zhixin Shu, Yi Zhou, Sai Bi, Sören Pirk, Arie E. Kaufman
cs.AI

Abstract

Recent advancements in the text-to-3D task leverage finetuned text-to-image diffusion models to generate multi-view images, followed by NeRF reconstruction. Yet, existing supervised finetuned (SFT) diffusion models still suffer from multi-view inconsistency and the resulting NeRF artifacts. Although training longer with SFT improves consistency, it also causes distribution shift, which reduces diversity and realistic details. We argue that the SFT of multi-view diffusion models resembles the instruction finetuning stage of the LLM alignment pipeline and can benefit from RL finetuning (RLFT) methods. Essentially, RLFT methods optimize models beyond their SFT data distribution by using their own outputs, effectively mitigating distribution shift. To this end, we introduce Carve3D, an RLFT method coupled with the Multi-view Reconstruction Consistency (MRC) metric, to improve the consistency of multi-view diffusion models. To compute MRC on a set of multi-view images, we compare them with their corresponding renderings of the reconstructed NeRF at the same viewpoints. We validate the robustness of MRC with extensive experiments conducted under controlled inconsistency levels. We enhance the base RLFT algorithm to stabilize the training process, reduce distribution shift, and identify scaling laws. Through qualitative and quantitative experiments, along with a user study, we demonstrate Carve3D's improved multi-view consistency, the resulting superior NeRF reconstruction quality, and minimal distribution shift compared to longer SFT. Project webpage: https://desaixie.github.io/carve-3d.
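The abstract describes MRC as a comparison between the generated multi-view images and renderings of the NeRF reconstructed from them, taken at the same viewpoints. The snippet below is a minimal illustrative sketch of such a consistency reward, assuming the NeRF has already been reconstructed and re-rendered externally and that an LPIPS perceptual distance (the `lpips` package) is used as the image comparison; the function and argument names are hypothetical, and the paper's exact reconstruction backend and normalization are not reproduced here.

```python
# Hedged sketch of an MRC-style consistency reward, not the paper's exact implementation.
import torch
import lpips

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual distance; lower = more similar


def mrc_reward(generated: torch.Tensor, rerendered: torch.Tensor) -> torch.Tensor:
    """generated:  (N, 3, H, W) multi-view images from the diffusion model, in [-1, 1].
    rerendered: (N, 3, H, W) renderings of the NeRF reconstructed from `generated`,
                rendered at the same N viewpoints, in [-1, 1]."""
    # If the views are mutually consistent, the reconstructed NeRF reproduces them
    # faithfully, so the per-view perceptual distance to the re-renderings is small.
    per_view = lpips_fn(generated, rerendered)  # (N, 1, 1, 1) distances
    return -per_view.mean()  # higher reward = more consistent multi-view set
```

Under this sketch, a higher reward means the generated views agree with their NeRF re-renderings, i.e., better multi-view consistency; this scalar is the kind of signal the RLFT stage would maximize.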