
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

December 14, 2023
Authors: Minghao Chen, Junyu Xie, Iro Laina, Andrea Vedaldi
cs.AI

Abstract

We propose a novel feed-forward 3D editing framework called Shap-Editor. Prior research on editing 3D objects primarily concentrated on editing individual objects by leveraging off-the-shelf 2D image editing networks. This is achieved via a process called distillation, which transfers knowledge from the 2D network to 3D assets. Distillation necessitates at least tens of minutes per asset to attain satisfactory editing results, and is thus not very practical. In contrast, we ask whether 3D editing can be carried out directly by a feed-forward network, eschewing test-time optimisation. In particular, we hypothesise that editing can be greatly simplified by first encoding 3D objects in a suitable latent space. We validate this hypothesis by building upon the latent space of Shap-E. We demonstrate that direct 3D editing in this space is possible and efficient by building a feed-forward editor network that only requires approximately one second per edit. Our experiments show that Shap-Editor generalises well to both in-distribution and out-of-distribution 3D assets with different prompts, exhibiting comparable performance with methods that carry out test-time optimisation for each edited instance.
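The pipeline the abstract describes — encode a 3D object into a latent space, then apply a single feed-forward network that maps (source latent, instruction embedding) to an edited latent, with no per-asset test-time optimisation — can be illustrated with a minimal toy sketch. Everything here is a hypothetical stand-in: the dimensions, the random weights, and the residual linear editor are illustrative assumptions, not the paper's actual architecture or Shap-E's real latent format.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 16  # hypothetical; real Shap-E latents are far larger
TEXT_DIM = 8     # hypothetical size of an instruction embedding

# Hypothetical weights of an already-trained one-step editor
# (random here purely for illustration).
W = rng.standard_normal((LATENT_DIM, LATENT_DIM + TEXT_DIM)) * 0.1
b = np.zeros(LATENT_DIM)

def edit_latent(z_src: np.ndarray, t_emb: np.ndarray) -> np.ndarray:
    """One feed-forward pass: map (source latent, instruction embedding)
    to an edited latent. The key point is that this is a single cheap
    forward pass, not an iterative distillation loop per asset."""
    return z_src + W @ np.concatenate([z_src, t_emb]) + b  # residual update

z_src = rng.standard_normal(LATENT_DIM)  # stand-in for an encoded 3D object
t_emb = rng.standard_normal(TEXT_DIM)    # stand-in for an embedded instruction
z_edit = edit_latent(z_src, t_emb)       # edited latent, ready to decode
assert z_edit.shape == (LATENT_DIM,)
```

The contrast with distillation-based editing is that the loop over optimisation steps per asset is replaced by this single function call, which is why each edit takes on the order of a second.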