Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
October 16, 2025
Authors: Shaowei Liu, Chuan Guo, Bing Zhou, Jian Wang
cs.AI
Abstract
Close-proximity human-human interactive poses convey rich contextual
information about interaction dynamics. Given such poses, humans can
intuitively infer the context and anticipate possible past and future dynamics,
drawing on strong priors of human behavior. Inspired by this observation, we
propose Ponimator, a simple framework anchored on proximal interactive poses
for versatile interaction animation. Our training data consists of
close-contact two-person poses and their surrounding temporal context from
motion-capture interaction datasets. Leveraging interactive pose priors,
Ponimator employs two conditional diffusion models: (1) a pose animator that
uses the temporal prior to generate dynamic motion sequences from interactive
poses, and (2) a pose generator that applies the spatial prior to synthesize
interactive poses from a single pose, text, or both when interactive poses are
unavailable. Collectively, Ponimator supports diverse tasks, including
image-based interaction animation, reaction animation, and text-to-interaction
synthesis, facilitating the transfer of interaction knowledge from high-quality
mocap data to open-world scenarios. Empirical experiments across diverse
datasets and applications demonstrate the universality of the pose prior and
the effectiveness and robustness of our framework.
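
The abstract describes a two-stage design: a pose generator with a spatial prior synthesizes an interactive two-person pose from a single pose and/or text, and a pose animator with a temporal prior unfolds that pose into a motion sequence. The sketch below shows how the two stages could compose; all class names, tensor shapes, and the placeholder sampling logic are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class InteractivePose:
    # One close-contact pose per person, e.g. (J, 3) joint positions.
    person_a: torch.Tensor
    person_b: torch.Tensor


class PoseGenerator(torch.nn.Module):
    """Spatial prior: synthesize an interactive pose from a single pose,
    text, or both (stand-in for the conditional diffusion model)."""

    def sample(self, single_pose: Optional[torch.Tensor] = None,
               text: Optional[str] = None,
               num_joints: int = 22) -> InteractivePose:
        # Placeholder: a real model would run reverse diffusion conditioned
        # on whichever signals (single_pose, text) are available.
        a = single_pose if single_pose is not None else torch.zeros(num_joints, 3)
        b = torch.zeros_like(a)
        return InteractivePose(person_a=a, person_b=b)


class PoseAnimator(torch.nn.Module):
    """Temporal prior: unfold an interactive pose into a two-person motion
    sequence covering plausible past and future frames."""

    def sample(self, pose: InteractivePose,
               num_frames: int = 120) -> torch.Tensor:
        # Placeholder: a real model would denoise a (num_frames, 2, J, 3)
        # motion window anchored on the given interactive pose.
        anchor = torch.stack([pose.person_a, pose.person_b])    # (2, J, 3)
        return anchor.unsqueeze(0).repeat(num_frames, 1, 1, 1)  # (T, 2, J, 3)


def animate(generator: PoseGenerator, animator: PoseAnimator,
            single_pose: Optional[torch.Tensor] = None,
            text: Optional[str] = None,
            interactive_pose: Optional[InteractivePose] = None) -> torch.Tensor:
    # If an interactive pose is already available (e.g. estimated from an
    # image), animate it directly; otherwise synthesize one first.
    if interactive_pose is None:
        interactive_pose = generator.sample(single_pose=single_pose, text=text)
    return animator.sample(interactive_pose)


motion = animate(PoseGenerator(), PoseAnimator(), text="two people hug")
print(motion.shape)  # torch.Size([120, 2, 22, 3])
```

This factoring mirrors the tasks the abstract lists: image-based animation supplies `interactive_pose` directly, reaction animation supplies `single_pose`, and text-to-interaction synthesis supplies only `text`.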