CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects
May 27, 2025
Authors: Huaijin Pi, Zhi Cen, Zhiyang Dou, Taku Komura
cs.AI
Abstract
Synthesizing whole-body manipulation of articulated objects, including body
motion, hand motion, and object motion, is a critical yet challenging task with
broad applications in virtual humans and robotics. The core challenges are
twofold. First, achieving realistic whole-body motion requires tight
coordination between the hands and the rest of the body, as their movements are
interdependent during manipulation. Second, articulated object manipulation
typically involves high degrees of freedom and demands higher precision, often
requiring the fingers to be placed at specific regions to actuate movable
parts. To address these challenges, we propose a novel coordinated diffusion
noise optimization framework. Specifically, we perform noise-space optimization
over three specialized diffusion models for the body, left hand, and right
hand, each trained on its own motion dataset to improve generalization.
Coordination naturally emerges through gradient flow along the human kinematic
chain, allowing the global body posture to adapt in response to hand motion
objectives with high fidelity. To further enhance precision in hand-object
interaction, we adopt a unified representation based on basis point sets (BPS),
where end-effector positions are encoded as distances to the same BPS used for
object geometry. This unified representation captures fine-grained spatial
relationships between the hand and articulated object parts, and the resulting
trajectories serve as targets to guide the optimization of diffusion noise,
producing highly accurate interaction motion. We conduct extensive experiments
demonstrating that our method outperforms existing approaches in motion quality
and physical plausibility, and enables various capabilities such as object pose
control, simultaneous walking and manipulation, and whole-body generation from
hand-only data.
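
The unified BPS representation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the basis sampling scheme, the number of basis points, and the names `sample_basis_points` and `bps_encode` are assumptions chosen for illustration. The sketch only shows the general idea that one fixed basis point set can encode both an object point cloud (distance from each basis point to its nearest object point) and an end-effector position (distance from each basis point to that position), so the two encodings live in the same space.

```python
import numpy as np


def sample_basis_points(num_points, radius=1.0, seed=0):
    """Sample a fixed basis point set uniformly inside a ball.

    The sampling scheme is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(num_points, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Cube-root scaling gives a uniform density inside the ball.
    radii = radius * rng.uniform(size=(num_points, 1)) ** (1.0 / 3.0)
    return directions * radii


def bps_encode(basis, points):
    """Return, for each basis point, the distance to its nearest point in `points`."""
    points = np.atleast_2d(points)                    # (M, 3), also accepts one point
    diffs = basis[:, None, :] - points[None, :, :]    # (N, M, 3)
    return np.linalg.norm(diffs, axis=-1).min(axis=1)  # (N,)


# Hypothetical data: a random object point cloud and one fingertip position.
basis = sample_basis_points(256)
object_points = np.random.default_rng(1).uniform(-0.5, 0.5, size=(1000, 3))
fingertip = np.array([0.1, 0.0, 0.2])

object_code = bps_encode(basis, object_points)  # encodes object geometry
fingertip_code = bps_encode(basis, fingertip)   # encodes the end-effector position
```

Because both codes are indexed by the same basis points, they can be compared or concatenated entry by entry, which is one plausible reading of how a shared BPS captures spatial relationships between the hand and object parts.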