FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
May 19, 2025
Authors: Dian Shao, Mingfei Shi, Shengda Xu, Haodong Chen, Yongle Huang, Binglu Wang
cs.AI
Abstract
Despite significant advances in video generation, synthesizing physically
plausible human actions remains a persistent challenge, particularly in
modeling fine-grained semantics and complex temporal dynamics. For instance,
generating gymnastics routines such as "switch leap with 0.5 turn" poses
substantial difficulties for current methods, often yielding unsatisfactory
results. To bridge this gap, we propose FinePhys, a Fine-grained human action
generation framework that incorporates Physics to obtain effective skeletal
guidance. Specifically, FinePhys first estimates 2D poses in an online manner
and then performs 2D-to-3D dimension lifting via in-context learning. To
mitigate the instability and limited interpretability of purely data-driven 3D
poses, we further introduce a physics-based motion re-estimation module
governed by Euler-Lagrange equations, calculating joint accelerations via
bidirectional temporal updating. The physically predicted 3D poses are then
fused with data-driven ones, offering multi-scale 2D heatmap guidance for the
diffusion process. Evaluated on three fine-grained action subsets from FineGym
(FX-JUMP, FX-TURN, and FX-SALTO), FinePhys significantly outperforms
competitive baselines. Comprehensive qualitative results further demonstrate
FinePhys's ability to generate more natural and plausible fine-grained human
actions.
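
The pipeline described in the abstract (physics-based re-estimation, fusion with data-driven poses, multi-scale 2D heatmap guidance) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the update rule, the fusion scheme, or the heatmap sizes. The sketch assumes a Verlet-style second-order update applied from both temporal directions, a fixed scalar fusion weight, an orthographic projection with coordinates already normalized to [0, 1], and Gaussian heatmaps. All function names and parameters are hypothetical.

```python
import numpy as np


def bidirectional_reestimate(poses, accel, dt=1.0):
    """Re-estimate each frame from its past and future neighbours.

    poses, accel: arrays of shape (T, J, 3). Assumed update rule
    (not taken from the paper): a Verlet-style second-order step,
    run forward and backward in time, then averaged.
    """
    fwd = poses.copy()
    bwd = poses.copy()
    # Forward: x_t = 2 x_{t-1} - x_{t-2} + a_{t-1} dt^2, for t >= 2
    fwd[2:] = 2 * poses[1:-1] - poses[:-2] + accel[1:-1] * dt**2
    # Backward: x_t = 2 x_{t+1} - x_{t+2} + a_{t+1} dt^2, for t <= T-3
    bwd[:-2] = 2 * poses[1:-1] - poses[2:] + accel[1:-1] * dt**2
    return 0.5 * (fwd + bwd)


def fuse(data_pose, phys_pose, weight=0.5):
    """Blend data-driven and physics-predicted poses (assumed: fixed weight)."""
    return weight * phys_pose + (1.0 - weight) * data_pose


def render_heatmaps(joints_2d, size, sigma):
    """Render one Gaussian heatmap per joint.

    joints_2d: (J, 2) array of (x, y) in normalized [0, 1] coordinates.
    Returns an array of shape (J, size, size).
    """
    ys, xs = np.mgrid[0:size, 0:size]
    cx = joints_2d[:, 0, None, None] * (size - 1)
    cy = joints_2d[:, 1, None, None] * (size - 1)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma**2))


def multiscale_guidance(pose_3d, sizes=(16, 32, 64)):
    """Project a (J, 3) pose to 2D and render heatmaps at several scales.

    Assumed: orthographic projection (drop z) and hypothetical map sizes.
    """
    joints_2d = np.clip(pose_3d[:, :2], 0.0, 1.0)
    return [render_heatmaps(joints_2d, s, sigma=s / 16.0) for s in sizes]
```

In this sketch, a sequence with constant acceleration is left unchanged by `bidirectional_reestimate`, because the second-order update is exact for such trajectories. The list returned by `multiscale_guidance` stands in for the per-scale conditioning maps that would be passed to a diffusion model.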