CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects

May 27, 2025
Authors: Huaijin Pi, Zhi Cen, Zhiyang Dou, Taku Komura
cs.AI

Abstract

Synthesizing whole-body manipulation of articulated objects, including body motion, hand motion, and object motion, is a critical yet challenging task with broad applications in virtual humans and robotics. The core challenges are twofold. First, achieving realistic whole-body motion requires tight coordination between the hands and the rest of the body, as their movements are interdependent during manipulation. Second, articulated object manipulation typically involves high degrees of freedom and demands higher precision, often requiring the fingers to be placed at specific regions to actuate movable parts. To address these challenges, we propose a novel coordinated diffusion noise optimization framework. Specifically, we perform noise-space optimization over three specialized diffusion models for the body, left hand, and right hand, each trained on its own motion dataset to improve generalization. Coordination naturally emerges through gradient flow along the human kinematic chain, allowing the global body posture to adapt in response to hand motion objectives with high fidelity. To further enhance precision in hand-object interaction, we adopt a unified representation based on basis point sets (BPS), where end-effector positions are encoded as distances to the same BPS used for object geometry. This unified representation captures fine-grained spatial relationships between the hand and articulated object parts, and the resulting trajectories serve as targets to guide the optimization of diffusion noise, producing highly accurate interaction motion. We conduct extensive experiments demonstrating that our method outperforms existing approaches in motion quality and physical plausibility, and enables various capabilities such as object pose control, simultaneous walking and manipulation, and whole-body generation from hand-only data.
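The unified BPS representation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the basis sampling scheme, the number of basis points, the use of plain unsigned Euclidean distances, and every function and variable name below are assumptions made for illustration. The idea shown is only that object geometry and an end-effector position are both encoded against one shared set of basis points, so the two feature vectors are directly comparable entry by entry.

```python
import math
import random


def sample_basis_points(num_points, radius=1.0, seed=0):
    """Draw basis points uniformly inside a ball (rejection sampling from a cube).

    Hypothetical helper: the sampling scheme is an assumption of this sketch.
    """
    rng = random.Random(seed)
    basis = []
    while len(basis) < num_points:
        p = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if math.dist(p, (0.0, 0.0, 0.0)) <= radius:
            basis.append(p)
    return basis


def encode_point_cloud(basis, cloud):
    """For each basis point, return the distance to its nearest cloud point."""
    return [min(math.dist(b, q) for q in cloud) for b in basis]


def encode_end_effector(basis, position):
    """For each basis point, return the distance to one end-effector position."""
    return [math.dist(b, position) for b in basis]


# Toy data (hypothetical): three surface points of a movable part and one
# fingertip that touches the middle point.
basis = sample_basis_points(64)
object_part = [(0.2, 0.0, 0.0), (0.2, 0.1, 0.0), (0.2, 0.2, 0.0)]
fingertip = (0.2, 0.1, 0.0)

object_feat = encode_point_cloud(basis, object_part)
finger_feat = encode_end_effector(basis, fingertip)

# Both encodings are indexed by the same basis points, so their per-entry
# difference is a simple measure of how the fingertip sits relative to the part.
gap = [f - o for f, o in zip(finger_feat, object_feat)]
```

Because the fingertip coincides with one of the object points in this toy setup, every entry of `gap` is non-negative, and entries close to zero mark basis points whose nearest object point is the one being touched. In a pipeline like the one the abstract outlines, trajectories expressed in such a shared encoding would serve as targets for the noise optimization; that step is not shown here.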
June 2, 2025