GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer
October 17, 2025
Authors: Sayan Deb Sarkar, Sinisa Stekovic, Vincent Lepetit, Iro Armeni
cs.AI
Abstract
Transferring appearance to 3D assets using different representations of the
appearance object - such as images or text - has garnered interest due to its
wide range of applications in industries like gaming, augmented reality, and
digital content creation. However, state-of-the-art methods still fail when
the geometry of the input and appearance objects differs significantly. A
straightforward approach is to directly apply a 3D generative model, but we
show that this ultimately fails to produce appealing results. Instead, we
propose a principled approach inspired by universal guidance. Given a
pretrained rectified flow model conditioned on image or text, our training-free
method interacts with the sampling process by periodically adding guidance.
This guidance can be modeled as a differentiable loss function, and we
experiment with two types of guidance: a part-aware appearance loss and a
self-similarity loss. Our experiments show that our approach
successfully transfers texture and geometric details to the input 3D asset,
outperforming baselines both qualitatively and quantitatively. We also show
that traditional metrics are not suitable for evaluating this task: they
cannot focus on local details or compare dissimilar inputs in the absence of
ground-truth data. We thus evaluate appearance transfer quality with a
GPT-based system that objectively ranks outputs, ensuring a robust and
human-like assessment, as further confirmed by our user study. Beyond the
showcased scenarios,
our method is general and could be extended to different types of diffusion
models and guidance functions.
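
To make the idea concrete, the sketch below shows one way optimization-based guidance could be injected into a rectified flow sampler, in the spirit of universal guidance as described in the abstract. This is a minimal reading of the abstract, not the paper's implementation: `velocity_model`, `guidance_loss`, the Euler schedule, the hyperparameters, and the self-similarity loss shown are all assumptions.

```python
import torch
import torch.nn.functional as F

def guided_rectified_flow_sample(
    velocity_model,   # hypothetical pretrained flow: (x, t, cond) -> velocity
    cond,             # image or text conditioning (assumed embedding)
    guidance_loss,    # differentiable loss, e.g. part-aware or self-similarity
    shape,            # latent shape of the 3D asset representation (assumed)
    num_steps=50,
    guide_every=5,    # apply guidance periodically, not at every step
    guide_lr=0.1,
    device="cuda",
):
    """Training-free, optimization-guided sampling (sketch, not the paper's code)."""
    x = torch.randn(shape, device=device)  # start from Gaussian noise at t=0
    ts = torch.linspace(0.0, 1.0, num_steps + 1, device=device)

    for i in range(num_steps):
        t, dt = ts[i], ts[i + 1] - ts[i]

        if i % guide_every == 0:
            # Universal-guidance-style correction: differentiate the loss on
            # the one-step prediction of the clean sample and nudge x_t.
            x_g = x.detach().requires_grad_(True)
            v_g = velocity_model(x_g, t, cond)
            x1_pred = x_g + (1.0 - t) * v_g        # predicted clean sample
            grad = torch.autograd.grad(guidance_loss(x1_pred), x_g)[0]
            x = (x_g - guide_lr * grad).detach()

        with torch.no_grad():
            v = velocity_model(x, t, cond)
        x = x + v * dt                              # Euler step of the flow ODE

    return x

def self_similarity_loss(feats_out, feats_ref):
    """One plausible self-similarity guidance loss (an assumption on our part):
    match the pairwise cosine-similarity structure of features between the
    generated asset and the appearance exemplar, each of shape (N, D)."""
    def gram(f):
        f = F.normalize(f, dim=-1)
        return f @ f.T                              # (N, N) cosine self-similarity
    return F.mse_loss(gram(feats_out), gram(feats_ref))
```

Applying guidance only every `guide_every` steps keeps the extra backward passes cheap while still steering the trajectory, which is consistent with the abstract's description of "periodically adding guidance" during sampling.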