MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing

Abstract

Novel View Synthesis (NVS) and 3D generation have recently achieved prominent improvements. However, these works mainly focus on confined categories or synthetic 3D assets, which generalize poorly to challenging in-the-wild scenes and cannot be used directly with 2D synthesis. Moreover, these methods depend heavily on camera poses, limiting their real-world applications. To overcome these issues, we propose MVInpainter, which reformulates 3D editing as a multi-view 2D inpainting task. Specifically, MVInpainter partially inpaints multi-view images under reference guidance rather than intractably generating an entirely novel view from scratch, which largely reduces the difficulty of in-the-wild NVS and leverages unmasked clues instead of explicit pose conditions. To ensure cross-view consistency, MVInpainter is enhanced by video priors from motion components and appearance guidance from concatenated reference key and value attention. Furthermore, MVInpainter incorporates slot attention to aggregate high-level optical flow features from unmasked regions, controlling the camera movement with pose-free training and inference. Extensive scene-level experiments on both object-centric and forward-facing datasets verify the effectiveness of MVInpainter across diverse tasks, such as multi-view object removal, synthesis, insertion, and replacement. The project page is https://ewrfcas.github.io/MVInpainter/.
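
As a rough illustration of the "concatenated reference key and value attention" mentioned in the abstract, the sketch below lets the tokens of a target view attend to their own tokens plus the tokens of a reference view by concatenating the two sets on the key and value side. The function names, the toy token values, and the omission of learned query/key/value projections are assumptions made for brevity. This is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def reference_guided_attention(target_tokens, reference_tokens):
    """Self-attention for the target view with reference tokens appended
    to the keys and values (assumption: identity projections)."""
    keys = target_tokens + reference_tokens
    values = target_tokens + reference_tokens
    return attention(target_tokens, keys, values)


if __name__ == "__main__":
    target = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical target-view tokens
    reference = [[5.0, 0.0]]            # hypothetical reference-view token
    print(reference_guided_attention(target, reference))
```

Only the keys and values grow, so the output keeps one vector per target token while each of them can draw appearance information from the reference view.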
