FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
May 19, 2025
Authors: Dian Shao, Mingfei Shi, Shengda Xu, Haodong Chen, Yongle Huang, Binglu Wang
cs.AI
Abstract
Despite significant advances in video generation, synthesizing physically
plausible human actions remains a persistent challenge, particularly in
modeling fine-grained semantics and complex temporal dynamics. For instance,
generating gymnastics routines such as "switch leap with 0.5 turn" poses
substantial difficulties for current methods, often yielding unsatisfactory
results. To bridge this gap, we propose FinePhys, a Fine-grained human action
generation framework that incorporates Physics to obtain effective skeletal
guidance. Specifically, FinePhys first estimates 2D poses in an online manner
and then performs 2D-to-3D dimension lifting via in-context learning. To
mitigate the instability and limited interpretability of purely data-driven 3D
poses, we further introduce a physics-based motion re-estimation module
governed by Euler-Lagrange equations, calculating joint accelerations via
bidirectional temporal updating. The physically predicted 3D poses are then
fused with data-driven ones, offering multi-scale 2D heatmap guidance for the
diffusion process. Evaluated on three fine-grained action subsets from FineGym
(FX-JUMP, FX-TURN, and FX-SALTO), FinePhys significantly outperforms
competitive baselines. Comprehensive qualitative results further demonstrate
FinePhys's ability to generate more natural and plausible fine-grained human
actions.
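The abstract says the motion re-estimation module is governed by Euler-Lagrange equations and produces joint accelerations, but it does not give the exact formulation. As a hedged illustration, the standard rigid-body form below shows how accelerations follow from the Euler-Lagrange equations for generalized joint coordinates q. The symbols (mass matrix M, Coriolis/centrifugal term C, gravity term g, joint torques τ, contact Jacobian J, external force f_ext) are generic textbook notation assumed here. They are not taken from the paper.

```latex
\begin{align*}
\mathcal{L}(\mathbf{q},\dot{\mathbf{q}}) &= \tfrac{1}{2}\,\dot{\mathbf{q}}^{\top}\mathbf{M}(\mathbf{q})\,\dot{\mathbf{q}} - V(\mathbf{q}),
\qquad
\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{q}}} - \frac{\partial \mathcal{L}}{\partial \mathbf{q}}
  = \boldsymbol{\tau} + \mathbf{J}(\mathbf{q})^{\top}\mathbf{f}_{\mathrm{ext}} \\
\mathbf{M}(\mathbf{q})\,\ddot{\mathbf{q}} + \mathbf{C}(\mathbf{q},\dot{\mathbf{q}})\,\dot{\mathbf{q}} + \mathbf{g}(\mathbf{q})
  &= \boldsymbol{\tau} + \mathbf{J}(\mathbf{q})^{\top}\mathbf{f}_{\mathrm{ext}},
\quad
\mathbf{C}\,\dot{\mathbf{q}} = \dot{\mathbf{M}}\,\dot{\mathbf{q}} - \tfrac{1}{2}\,\frac{\partial}{\partial \mathbf{q}}\!\left(\dot{\mathbf{q}}^{\top}\mathbf{M}\,\dot{\mathbf{q}}\right),
\quad
\mathbf{g} = \frac{\partial V}{\partial \mathbf{q}} \\
\ddot{\mathbf{q}} &= \mathbf{M}(\mathbf{q})^{-1}\left(\boldsymbol{\tau} + \mathbf{J}(\mathbf{q})^{\top}\mathbf{f}_{\mathrm{ext}}
  - \mathbf{C}(\mathbf{q},\dot{\mathbf{q}})\,\dot{\mathbf{q}} - \mathbf{g}(\mathbf{q})\right)
\end{align*}
```

Evaluating the last line at each frame gives accelerations that a temporal update step can integrate. The abstract does not say how FinePhys parameterizes or predicts τ and f_ext.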
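One hypothetical reading of "bidirectional temporal updating" is to predict each frame twice and average the two predictions. The first prediction uses the two preceding frames and the second uses the two following frames, both with given joint accelerations. The NumPy sketch below implements this with one-step position Verlet updates and then blends the result with the data-driven poses. The function names, the Verlet scheme, and the blending weight `alpha` are illustrative assumptions, not the paper's implementation. `acc` stands in for accelerations supplied by a physics model.

```python
import numpy as np


def bidirectional_update(poses: np.ndarray, acc: np.ndarray, dt: float) -> np.ndarray:
    """Average one-step Verlet predictions made from the past and from the future.

    poses: (T, J, 3) data-driven joint positions, T >= 3.
    acc:   (T, J, 3) joint accelerations, assumed to come from a physics model.
    Forward  (t >= 2):     x_t = 2 x_{t-1} - x_{t-2} + dt^2 a_{t-1}
    Backward (t <= T - 3): x_t = 2 x_{t+1} - x_{t+2} + dt^2 a_{t+1}
    Frames lacking a prediction in one direction use the input pose for it.
    """
    fwd = poses.copy()
    bwd = poses.copy()
    fwd[2:] = 2.0 * poses[1:-1] - poses[:-2] + dt**2 * acc[1:-1]
    bwd[:-2] = 2.0 * poses[1:-1] - poses[2:] + dt**2 * acc[1:-1]
    return 0.5 * (fwd + bwd)


def fuse_poses(phys: np.ndarray, data: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Convex blend of physics-predicted and data-driven poses (alpha is hypothetical)."""
    return alpha * phys + (1.0 - alpha) * data


# Usage: a toy ballistic trajectory (one joint moving under gravity only).
dt = 1.0 / 30.0
t = np.arange(10) * dt
g = np.array([0.0, -9.81, 0.0])
v0 = np.array([1.0, 4.0, 0.0])
poses = (t[:, None] * v0 + 0.5 * t[:, None] ** 2 * g)[:, None, :]  # (T, 1, 3)
acc = np.broadcast_to(g, poses.shape)
phys = bidirectional_update(poses, acc, dt)
fused = fuse_poses(phys, poses)
```

On this toy path, supplying the true gravitational acceleration reproduces the input poses up to rounding. A wrong acceleration shifts the predicted frames instead. In this sketch, that dependence is the mechanism by which physics-derived accelerations can move the fused poses away from the data-driven ones.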
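The abstract says the fused 3D poses provide multi-scale 2D heatmap guidance for the diffusion process, but it does not describe how the heatmaps are rendered. A common way to build such guidance is to project the 3D joints with a pinhole camera and draw one Gaussian per joint at several resolutions. The sketch below follows that approach under assumptions. The camera parameters, the strides (1, 2, 4), `sigma`, and all function names are hypothetical choices for illustration.

```python
import numpy as np


def project_pinhole(joints_3d: np.ndarray, focal: float, center) -> np.ndarray:
    """Project (J, 3) camera-space joints to (J, 2) pixel coordinates (u, v).

    Assumes an ideal pinhole camera with every joint in front of it (z > 0).
    """
    z = joints_3d[:, 2:3]
    return focal * joints_3d[:, :2] / z + np.asarray(center, dtype=float)


def gaussian_heatmaps(joints_2d: np.ndarray, size, sigma: float) -> np.ndarray:
    """Render one unnormalized Gaussian per joint on an (H, W) grid -> (J, H, W)."""
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]
    dx = xs[None] - joints_2d[:, 0, None, None]
    dy = ys[None] - joints_2d[:, 1, None, None]
    return np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))


def multiscale_heatmaps(joints_2d, base_size, strides=(1, 2, 4), sigma=2.0):
    """Heatmaps at several resolutions; strides and sigma are illustrative choices.

    Coordinates and sigma are divided by each stride so that every level marks
    the same image region at a coarser resolution.
    """
    levels = []
    for s in strides:
        size = (base_size[0] // s, base_size[1] // s)
        levels.append(gaussian_heatmaps(joints_2d / s, size, sigma / s))
    return levels


# Usage: project one joint into a 64x64 image, then render it at three scales.
uv = project_pinhole(np.array([[-0.25, 0.0, 2.0]]), focal=128.0, center=(32.0, 32.0))
maps = multiscale_heatmaps(uv, base_size=(64, 64))
```

Dividing both the coordinates and `sigma` by the stride keeps each level's peak over the same image region. Under this sketch's assumptions, each level could then be paired with a feature map of the same resolution.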