JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

Abstract

Creating a 3D visual illusion, a single 3D mesh that reveals entirely different semantics from different viewing angles, is a fascinating but difficult problem. Existing optimization-based methods are slow and can produce oversaturated colors, while naive stitching approaches fail to produce geometrically coherent objects, leaving visible seams and semantic leakage between views. In this paper, we present a fast, training-free framework for generating text-driven 3D visual illusions. Our approach decouples generation into two stages. First, we propose a cross-space dual-branch denoising process that dynamically decodes 3D latents into voxel space, where it performs CLIP-guided orientation alignment and Signed Distance Field (SDF) blending to ensure seamless geometric fusion. Second, we introduce a view-conditioned texture synthesis module that projects view-specific 2D diffusion priors onto the fused geometry and aggregates them. Extensive experiments demonstrate that our method generates highly realistic, dual-semantic 3D illusions in just 3 to 5 minutes, and that it significantly outperforms existing methods in geometric integrity, semantic recognizability, and efficiency. Project page: https://siang1105.github.io/JanusMesh.github.io/
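
To make the idea of SDF blending in voxel space concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say which blending operator is used, so the polynomial smooth-minimum union here is an assumption, as are the two analytic spheres standing in for the decoded shapes of the two branches. The function names `sphere_sdf`, `smooth_union`, and `voxel_grid` are hypothetical.

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Signed distance from each point to a sphere (negative inside)."""
    return np.linalg.norm(points - np.asarray(center), axis=-1) - radius


def smooth_union(d1, d2, k=0.1):
    """Polynomial smooth minimum of two SDFs.

    Assumed blending operator (not taken from the paper); k sets the
    width of the transition region between the two surfaces.
    """
    h = np.clip(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
    return d2 * (1.0 - h) + d1 * h - k * h * (1.0 - h)


def voxel_grid(resolution=32, extent=1.0):
    """Return a (R, R, R, 3) array of voxel-center coordinates."""
    axis = np.linspace(-extent, extent, resolution)
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)


# Two placeholder shapes standing in for the two semantic branches.
points = voxel_grid(resolution=32)
sdf_a = sphere_sdf(points, center=(-0.3, 0.0, 0.0), radius=0.5)
sdf_b = sphere_sdf(points, center=(0.3, 0.0, 0.0), radius=0.5)

# Blend in SDF space, then threshold at zero to get an occupancy volume.
blended = smooth_union(sdf_a, sdf_b, k=0.1)
occupancy = blended < 0.0
```

Blending distances instead of concatenating occupancy grids is what avoids a hard seam in this sketch: the smooth minimum never exceeds the true minimum of the two fields, so the fused surface contains both shapes and rounds off the junction between them. The resulting volume could then be passed to a surface extractor such as marching cubes.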