JanusMesh: Fast Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

Abstract

Creating 3D visual illusions, in which a single 3D mesh reveals entirely different semantics from different viewing angles, is a fascinating yet challenging task. Existing optimization-based methods are slow and prone to oversaturated colors. In contrast, naive stitching approaches fail to produce geometrically coherent objects, resulting in visible, unnatural seams and semantic leakage. In this paper, we present a fast, training-free framework for generating text-driven 3D visual illusions. Our approach decouples generation into two stages. First, we propose a cross-space dual-branch denoising process that dynamically decodes 3D latents into voxel space for CLIP-guided orientation alignment and Signed Distance Field (SDF) blending, ensuring seamless geometric fusion. Second, we introduce a view-conditioned texture synthesis module that projects view-specific 2D diffusion priors onto the fused geometry and aggregates them. Extensive experiments demonstrate that our method generates highly realistic, dual-semantic 3D illusions in just 3-5 minutes, significantly outperforming existing methods in geometric integrity, semantic recognizability, and efficiency. Project page: https://siang1105.github.io/JanusMesh.github.io/
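The abstract does not specify the operator used for SDF blending. As a minimal sketch, assuming the blend is a smooth union (the quadratic polynomial smooth minimum, a standard SDF operator swapped in here purely for illustration), the pure-Python code below samples two analytic stand-in shapes on a small voxel grid and fuses them voxel by voxel. All names (`sphere_sdf`, `box_sdf`, `smooth_union`, `blend_grids`) and shape parameters are hypothetical; in the described pipeline the two inputs would instead be decoded from 3D latents and orientation-aligned before blending.

```python
import math


# Hypothetical stand-ins for the two decoded voxel grids: analytic SDFs
# (negative inside, positive outside) sampled at arbitrary points.
def sphere_sdf(p, center, radius):
    """Signed distance from point p to a sphere."""
    return math.dist(p, center) - radius


def box_sdf(p, center, half):
    """Signed distance from point p to an axis-aligned box with half-extents `half`."""
    q = [abs(pi - ci) - h for pi, ci, h in zip(p, center, half)]
    outside = math.sqrt(sum(max(qi, 0.0) ** 2 for qi in q))
    inside = min(max(q), 0.0)
    return outside + inside


def smooth_union(a, b, k):
    """Quadratic polynomial smooth minimum (assumed blend operator).

    Returns exactly min(a, b) when |a - b| >= k; inside that band the result
    dips below min(a, b), rounding off the junction between the two shapes.
    """
    h = max(k - abs(a - b), 0.0) / k
    return min(a, b) - h * h * k * 0.25


def blend_grids(res, k):
    """Sample both SDFs on a res^3 grid over [-1, 1]^3 and blend voxel-wise."""
    step = 2.0 / (res - 1)
    fused = []
    for i in range(res):
        for j in range(res):
            for l in range(res):
                p = (-1.0 + i * step, -1.0 + j * step, -1.0 + l * step)
                a = sphere_sdf(p, (-0.3, 0.0, 0.0), 0.5)
                b = box_sdf(p, (0.3, 0.0, 0.0), (0.4, 0.4, 0.4))
                fused.append(smooth_union(a, b, k))
    return fused


if __name__ == "__main__":
    fused = blend_grids(res=17, k=0.2)
    # Voxels with a negative blended distance lie inside the fused shape.
    inside = sum(1 for d in fused if d < 0.0)
    print(f"{inside} of {len(fused)} voxels lie inside the fused shape")
```

The smoothing width `k` limits the blend to a band around the junction: wherever the two distances differ by at least `k`, the result is the plain union, so each shape keeps its own silhouette away from the seam. A mesh would then be extracted from the fused grid with an iso-surface method such as marching cubes (not shown).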