ArtLLM: Generating Articulated Assets via 3D LLM

Abstract

Creating interactive digital environments for gaming, robotics, and simulation relies on articulated 3D objects whose functionality emerges from their part geometry and kinematic structure. However, existing approaches remain fundamentally limited: optimization-based reconstruction methods require slow, per-object joint fitting and typically handle only simple, single-joint objects, while retrieval-based methods assemble parts from a fixed library, leading to repetitive geometry and poor generalization. To address these challenges, we introduce ArtLLM, a novel framework for generating high-quality articulated assets directly from complete 3D meshes. At its core is a 3D multimodal large language model trained on a large-scale articulation dataset curated from both existing articulation datasets and procedurally generated objects. Unlike prior work, ArtLLM autoregressively predicts a variable number of parts and joints, inferring their kinematic structure in a unified manner from the object's point cloud. This articulation-aware layout then conditions a 3D generative model to synthesize high-fidelity part geometries. Experiments on the PartNet-Mobility dataset show that ArtLLM significantly outperforms state-of-the-art methods in both part layout accuracy and joint prediction, while generalizing robustly to real-world objects. Finally, we demonstrate its utility in constructing digital twins, highlighting its potential for scalable robot learning.
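To make the idea of an articulation-aware layout with a variable number of parts and joints more concrete, the following is a minimal illustrative sketch. The abstract does not specify ArtLLM's actual layout format or tokenization, so everything here is an assumption made for illustration: the `Part` and `Joint` fields, the `<part>`, `<joint>`, and `<eos>` markers, and the helper names `is_kinematic_tree` and `serialize` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Part:
    # Hypothetical part record: a name plus an axis-aligned bounding box.
    name: str
    bbox_min: Vec3
    bbox_max: Vec3


@dataclass
class Joint:
    # Hypothetical joint record; parent/child index into the parts list.
    kind: str  # e.g. "revolute", "prismatic", or "fixed"
    parent: int
    child: int
    axis: Vec3
    origin: Vec3


def is_kinematic_tree(parts: List[Part], joints: List[Joint]) -> bool:
    """Check that the joints connect all parts into a single rooted tree."""
    n = len(parts)
    if n == 0 or len(joints) != n - 1:
        return False
    parent_of: Dict[int, int] = {}
    for j in joints:
        in_range = 0 <= j.parent < n and 0 <= j.child < n
        if not in_range or j.child in parent_of:
            return False  # bad index, or a part with two parents
        parent_of[j.child] = j.parent
    # Walk from every part toward the root; revisiting a part means a cycle.
    for start in range(n):
        seen = set()
        i = start
        while i in parent_of:
            if i in seen:
                return False
            seen.add(i)
            i = parent_of[i]
    return True


def serialize(parts: List[Part], joints: List[Joint]) -> List[str]:
    """Flatten a layout into a variable-length token list (assumed format)."""
    tokens: List[str] = []
    for p in parts:
        coords = [f"{v:.2f}" for v in p.bbox_min + p.bbox_max]
        tokens += ["<part>", p.name, *coords]
    for j in joints:
        coords = [f"{v:.2f}" for v in j.axis + j.origin]
        tokens += ["<joint>", j.kind, str(j.parent), str(j.child), *coords]
    tokens.append("<eos>")
    return tokens


# Example: a cabinet body with one hinged door.
parts = [
    Part("body", (0.0, 0.0, 0.0), (1.0, 1.0, 2.0)),
    Part("door", (0.0, 1.0, 0.0), (1.0, 1.05, 2.0)),
]
joints = [Joint("revolute", 0, 1, (0.0, 0.0, 1.0), (0.0, 1.0, 0.0))]
tokens = serialize(parts, joints)
```

Because the sequence ends with an explicit end marker rather than a fixed number of slots, the same format can describe a single-door cabinet or an object with many parts and joints, which is the property that autoregressive prediction makes possible.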
