From One to More: Contextual Part Latents for 3D Generation

Abstract

Recent advances in 3D generation have shifted from multi-view 2D rendering approaches to 3D-native latent diffusion frameworks that exploit geometric priors in ground-truth data. Despite this progress, three key limitations persist: (1) single-latent representations fail to capture complex multi-part geometries, causing detail degradation; (2) holistic latent coding neglects the part independence and interrelationships that are critical for compositional design; (3) global conditioning mechanisms lack fine-grained controllability. Inspired by human 3D design workflows, we propose CoPart, a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation. This paradigm offers three advantages: (i) it reduces encoding complexity through part decomposition; (ii) it enables explicit modeling of part relationships; (iii) it supports part-level conditioning. We further develop a mutual guidance strategy to fine-tune pre-trained diffusion models for joint part latent denoising, ensuring geometric coherence while retaining foundation model priors. To enable large-scale training, we construct Partverse, a novel 3D part dataset derived from Objaverse through automated mesh segmentation and human-verified annotations. Extensive experiments demonstrate CoPart's superior capabilities in part-level editing, articulated object generation, and scene composition with unprecedented controllability.