MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing

Abstract

Novel View Synthesis (NVS) and 3D generation have recently made significant progress. However, existing works mainly focus on confined categories or synthetic 3D assets. As a result, they generalize poorly to challenging in-the-wild scenes and cannot be directly combined with 2D synthesis. Moreover, these methods rely heavily on camera poses, which limits their real-world applications. To overcome these issues, we propose MVInpainter, which reformulates 3D editing as a multi-view 2D inpainting task. Specifically, MVInpainter does not attempt the intractable task of generating an entirely novel view from scratch. Instead, it partially inpaints multi-view images under reference guidance. This greatly reduces the difficulty of in-the-wild NVS and lets the model rely on unmasked clues instead of explicit pose conditions. To ensure cross-view consistency, MVInpainter is enhanced with video priors from motion components and with appearance guidance from concatenated reference key&value attention. Furthermore, MVInpainter incorporates slot attention to aggregate high-level optical flow features from unmasked regions. These features control camera movement and enable pose-free training and inference. Extensive scene-level experiments on both object-centric and forward-facing datasets verify the effectiveness of MVInpainter across diverse tasks, including multi-view object removal, synthesis, insertion, and replacement. The project page is https://ewrfcas.github.io/MVInpainter/.
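The abstract names concatenated reference key&value attention as the source of appearance guidance but does not spell out its exact form. The following minimal, single-head sketch in plain Python assumes the common formulation. In that formulation, queries come only from the view being inpainted, while keys and values are computed from the token-wise concatenation of that view and the reference view. The helper names (`matmul`, `softmax_rows`, `attention`, `reference_kv_attention`) and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical. This is not MVInpainter's actual implementation. It also omits multi-head projections, the video motion modules, and the slot-attention camera control.

```python
import math
from typing import List

# Row-major matrix: one inner list per token, one column per feature.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(scores: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose keys to d x n
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def reference_kv_attention(target: Matrix, reference: Matrix,
                           w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical sketch: self-attention for one target view whose keys and
    values are extended with tokens from a reference view.

    Queries come only from the target tokens, so the output keeps one row per
    target token. Keys and values come from the token-wise concatenation
    [target; reference], so each target token can draw appearance from the
    reference view as well as from its own view.
    """
    kv_tokens = target + reference  # concatenate along the token axis
    q = matmul(target, w_q)
    k = matmul(kv_tokens, w_k)
    v = matmul(kv_tokens, w_v)
    return attention(q, k, v)
```

The reference tokens contribute keys and values but no queries. As a result, the output keeps one row per target token. A target token whose query closely matches a reference key takes most of its value from the reference. For example, with identity projections `I`, `reference_kv_attention([[1.0, 0.0]], [[10.0, 0.0]], I, I, I)` returns a single row whose first entry is about 9.98. With an empty reference list, the function reduces to ordinary self-attention over the target view.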
