DragMesh: Interactive 3D Generation Made Easy

Abstract

Generative models have excelled at creating static 3D content, but building systems that understand how objects move and respond to interaction remains a fundamental challenge. Current methods for articulated motion fall into two camps: they are either physically consistent but too slow for real-time use, or generative but prone to violating basic kinematic constraints. We present DragMesh, a robust framework for real-time interactive 3D articulation built around a lightweight motion generation core. Our core contribution is a novel framework that decouples kinematic reasoning from motion generation. First, we infer the latent joint parameters by separating semantic intent reasoning, which determines the joint type, from geometric regression, which determines the joint axis and origin using our Kinematics Prediction Network (KPP-Net). Second, to leverage the compact, continuous, and singularity-free properties of dual quaternions for representing rigid body motion, we develop a novel Dual Quaternion VAE (DQ-VAE). The DQ-VAE receives these predicted priors, along with the original user drag, and generates a complete, plausible motion trajectory. To ensure strict adherence to kinematics, we inject the joint priors at every layer of the DQ-VAE's non-autoregressive Transformer decoder using FiLM (Feature-wise Linear Modulation) conditioning. This persistent, multi-scale guidance is complemented by a numerically stable cross-product loss that enforces axis alignment. The decoupled design allows DragMesh to achieve real-time performance and enables plausible, generative articulation of novel objects without retraining, offering a practical step toward generative 3D intelligence.

Code: https://github.com/AIGeeksGroup/DragMesh
Website: https://aigeeksgroup.github.io/DragMesh
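To make two of the ingredients named in the abstract concrete, the following is a minimal, self-contained sketch of (a) a dual quaternion encoding the rigid motion of a revolute joint defined by an axis and an origin, and (b) one plausible form of a cross-product axis-alignment loss. It is not taken from the DragMesh code: every function name here (`revolute_dq`, `apply_dq`, `cross_axis_loss`) is hypothetical, and the loss shown (squared norm of the cross product of normalized axes) is an assumption about what such a loss could look like, not the paper's definition.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def qconj(q):
    """Quaternion conjugate."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def normalize(v, eps=1e-8):
    """Scale v to unit length; eps guards against division by zero."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / max(n, eps) for c in v)

def rotate(q_r, p):
    """Rotate 3D point p by the unit quaternion q_r."""
    return qmul(qmul(q_r, (0.0, *p)), qconj(q_r))[1:]

def revolute_dq(axis, origin, angle):
    """Dual quaternion (q_r, q_d) for a rotation by `angle` (radians)
    about the line through `origin` with direction `axis`."""
    u = normalize(axis)
    s = math.sin(angle / 2.0)
    q_r = (math.cos(angle / 2.0), u[0] * s, u[1] * s, u[2] * s)
    # Rotating about a line through `origin` equals rotating about the
    # world origin and then translating by t = origin - R(origin).
    t = tuple(o - r for o, r in zip(origin, rotate(q_r, origin)))
    # Dual part: q_d = 0.5 * (0, t) * q_r
    q_d = tuple(0.5 * c for c in qmul((0.0, *t), q_r))
    return q_r, q_d

def apply_dq(dq, p):
    """Apply the rigid motion encoded by a unit dual quaternion to point p."""
    q_r, q_d = dq
    # Recover the translation: t = vector part of 2 * q_d * conj(q_r)
    t = tuple(2.0 * c for c in qmul(q_d, qconj(q_r))[1:])
    return tuple(a + b for a, b in zip(rotate(q_r, p), t))

def cross_axis_loss(pred_axis, target_axis):
    """Assumed loss form: ||a x g||^2 for normalized axes a and g.
    It is 0 when the axes are parallel or anti-parallel and 1 when they
    are perpendicular, and it avoids acos and its unstable gradient."""
    a, g = normalize(pred_axis), normalize(target_axis)
    c = (
        a[1] * g[2] - a[2] * g[1],
        a[2] * g[0] - a[0] * g[2],
        a[0] * g[1] - a[1] * g[0],
    )
    return sum(x * x for x in c)

# Hinge along z through (1, 0, 0), opened by 90 degrees.
hinge = revolute_dq(axis=(0, 0, 1), origin=(1, 0, 0), angle=math.pi / 2)
moved = apply_dq(hinge, (2, 0, 0))  # approximately (1, 1, 0)
```

In this sketch a single pair of quaternions (eight numbers) carries both the rotation and the translation of the moving part, which is the compactness the abstract refers to. The loss as written is sign-invariant, so an axis and its negation incur the same penalty; whether DragMesh treats axis direction this way is not stated in the abstract.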
