Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

Abstract

Recent advances in 3D-native generative models have accelerated asset creation for games, film, and design. However, most methods still rely primarily on image or text conditioning and lack fine-grained, cross-modal controls, which limits controllability and practical adoption. To address this gap, we present Hunyuan3D-Omni, a unified framework for fine-grained, controllable 3D asset generation built on Hunyuan3D 2.1. In addition to images, Hunyuan3D-Omni accepts point clouds, voxels, bounding boxes, and skeletal pose priors as conditioning signals, enabling precise control over geometry, topology, and pose. Instead of using a separate head for each modality, our model unifies all signals in a single cross-modal architecture. We train with a progressive, difficulty-aware sampling strategy that selects one control modality per example and biases sampling toward harder signals (e.g., skeletal pose) while downweighting easier ones (e.g., point clouds), encouraging robust multi-modal fusion and graceful handling of missing inputs. Experiments show that these additional controls improve generation accuracy, enable geometry-aware transformations, and increase robustness for production workflows.
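
As a rough illustration of the difficulty-aware sampling idea described in the abstract, the Python sketch below picks exactly one control modality per training example and gives harder signals a larger sampling probability. The weight values, the relative ordering of voxels and bounding boxes, the dictionary layout of an example, and the function names are all hypothetical and chosen for illustration only. The abstract does not specify them, and the progressive schedule (how the weights change over training) is not modelled here.

```python
import random

# Hypothetical sampling weights (assumption, not from the paper).
# The abstract only states that skeletal pose is favored as a harder
# signal and point clouds are downweighted as an easier one.
MODALITY_WEIGHTS = {
    "skeleton": 0.4,
    "bbox": 0.3,
    "voxel": 0.2,
    "point_cloud": 0.1,
}


def sample_control_modality(rng, weights=MODALITY_WEIGHTS):
    """Pick exactly one control modality, biased toward larger weights."""
    names = list(weights)
    probs = [weights[name] for name in names]
    return rng.choices(names, weights=probs, k=1)[0]


def build_conditions(example, rng, weights=MODALITY_WEIGHTS):
    """Keep the image plus one sampled control signal; drop the others.

    `example` is assumed to be a dict holding an "image" entry and one
    entry per control modality (hypothetical data layout).
    """
    chosen = sample_control_modality(rng, weights)
    return {
        "image": example["image"],
        "control_type": chosen,
        "control": example[chosen],
    }
```

Under these assumptions, calling `build_conditions(example, random.Random(0))` returns the image together with a single control signal, so over many training steps the model sees every modality but meets skeletal pose more often than point clouds.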