Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
August 20, 2025
Authors: Canyu Zhao, Xiaoman Li, Tianjian Feng, Zhiyue Zhao, Hao Chen, Chunhua Shen
cs.AI
Abstract
We introduce Tinker, a versatile framework for high-fidelity 3D editing that
operates in both one-shot and few-shot regimes without any per-scene
finetuning. Unlike prior techniques that demand extensive per-scene
optimization to ensure multi-view consistency or to produce dozens of
consistent edited input views, Tinker delivers robust, multi-view consistent
edits from as few as one or two images. This capability stems from repurposing
pretrained diffusion models, which unlocks their latent 3D awareness. To drive
research in this space, we curate the first large-scale multi-view editing
dataset and data pipeline, spanning diverse scenes and styles. Building on this
dataset, we develop a framework that generates multi-view consistent edited
views without per-scene training. The framework consists of two novel
components: (1) Referring multi-view editor: Enables precise, reference-driven
edits that remain coherent across all viewpoints. (2) Any-view-to-video
synthesizer: Leverages spatial-temporal priors from video diffusion to perform
high-quality scene completion and novel-view generation even from sparse
inputs. Through extensive experiments, Tinker significantly reduces the barrier
to generalizable 3D content creation, achieving state-of-the-art performance on
editing, novel-view synthesis, and rendering enhancement tasks. We believe that
Tinker represents a key step towards truly scalable, zero-shot 3D editing.
Project webpage: https://aim-uofa.github.io/Tinker