DragMesh: Interactive 3D Generation Made Easy

Abstract

While generative models have excelled at creating static 3D content, building systems that understand how objects move and respond to interactions remains a fundamental challenge. Current methods for articulated motion face a trade-off: they are either physically consistent but too slow for real-time use, or generative but prone to violating basic kinematic constraints. We present DragMesh, a robust framework for real-time interactive 3D articulation built around a lightweight motion generation core. Our core contribution is a novel framework that decouples kinematic reasoning from motion generation. First, we infer the latent joint parameters by separating semantic intent reasoning, which determines the joint type, from geometric regression, which determines the joint axis and origin using our Kinematics Prediction Network (KPP-Net). Second, to leverage the compact, continuous, and singularity-free properties of dual quaternions for representing rigid body motion, we develop a novel Dual Quaternion VAE (DQ-VAE). The DQ-VAE takes these predicted priors, together with the original user drag, and generates a complete, plausible motion trajectory. To ensure strict adherence to kinematics, we inject the joint priors at every layer of the DQ-VAE's non-autoregressive Transformer decoder using FiLM (Feature-wise Linear Modulation) conditioning. This persistent, multi-scale guidance is complemented by a numerically stable cross-product loss to guarantee axis alignment. This decoupled design allows DragMesh to achieve real-time performance and enables plausible, generative articulation on novel objects without retraining, offering a practical step toward generative 3D intelligence. Code: https://github.com/AIGeeksGroup/DragMesh. Website: https://aigeeksgroup.github.io/DragMesh.
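To make the three ingredients named in the abstract more concrete, the following is a minimal pure-Python sketch, not the DragMesh implementation. It shows (a) how a revolute joint defined by an axis and an origin can be encoded as a unit dual quaternion, (b) one plausible form of a cross-product axis-alignment loss, and (c) the standard FiLM operation. All function names (`revolute_dq`, `dq_apply`, `axis_alignment_loss`, `film`) are hypothetical, and the loss form (squared norm of the cross product of the normalized axes) is an assumption; the abstract does not specify the exact formulation.

```python
import math

def normalize(v, eps=1e-8):
    # Clamp the norm from below so a near-zero vector cannot cause division by zero.
    n = max(math.sqrt(sum(c * c for c in v)), eps)
    return tuple(c / n for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def qmul(p, q):
    # Hamilton product of two quaternions stored as (w, x, y, z).
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, v):
    # Rotate a 3-vector v by the unit quaternion q via q * v * conj(q).
    return qmul(qmul(q, (0.0, *v)), qconj(q))[1:]

def revolute_dq(axis, origin, angle):
    # Unit dual quaternion (real, dual) for a rotation by `angle` about the
    # line through `origin` with direction `axis` (a hinge joint).
    a = normalize(axis)
    half = 0.5 * angle
    s = math.sin(half)
    real = (math.cos(half), a[0] * s, a[1] * s, a[2] * s)
    # Rotating about an off-center line equals rotating about the world origin
    # and then translating by t = origin - R(origin).
    rotated_origin = rotate(real, origin)
    t = tuple(o - r for o, r in zip(origin, rotated_origin))
    dual = tuple(0.5 * c for c in qmul((0.0, *t), real))
    return real, dual

def dq_apply(dq, point):
    # Apply the rigid transform encoded by a unit dual quaternion to a point.
    real, dual = dq
    t = tuple(2.0 * c for c in qmul(dual, qconj(real))[1:])
    return tuple(r + ti for r, ti in zip(rotate(real, point), t))

def axis_alignment_loss(pred_axis, target_axis):
    # Assumed form: ||a_hat x b_hat||^2 = sin^2(theta). It is 0 when the axes
    # are parallel (or anti-parallel) and 1 when they are perpendicular.
    c = cross(normalize(pred_axis), normalize(target_axis))
    return sum(x * x for x in c)

def film(features, gamma, beta):
    # FiLM: per-feature affine modulation gamma * x + beta. In a network,
    # gamma and beta would be predicted from the conditioning signal (here,
    # the joint prior) by a learned layer; they are passed in directly here.
    return [g * x + b for x, g, b in zip(features, gamma, beta)]

if __name__ == "__main__":
    # A door hinge along z passing through (1, 0, 0), opened by 90 degrees.
    hinge = revolute_dq(axis=(0, 0, 1), origin=(1, 0, 0), angle=math.pi / 2)
    print(dq_apply(hinge, (2.0, 0.0, 0.0)))
    print(axis_alignment_loss((0, 0, 2), (0, 0, 1)))
    print(film([1.0, 2.0], gamma=[0.5, 2.0], beta=[0.0, -1.0]))
```

In this sketch the eight numbers of the dual quaternion encode rotation and translation together, which is what makes the representation compact and free of the singularities of Euler angles. Note that the assumed loss cannot distinguish an axis from its negation; whether DragMesh handles axis sign separately is not stated in the abstract.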