InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
February 5, 2026
Authors: Sirui Xu, Samuel Schulter, Morteza Ziyadi, Xialin He, Xiaohan Fei, Yu-Xiong Wang, Liangyan Gui
cs.AI
Abstract
Humans rarely plan whole-body interactions with objects at the level of explicit whole-body movements. High-level intentions, such as affordance, define the goal, while coordinated balance, contact, and manipulation can emerge naturally from underlying physical and motor priors. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills across diverse contexts while maintaining physically coherent whole-body coordination. To this end, we introduce InterPrior, a scalable framework that learns a unified generative controller through large-scale imitation pretraining and reinforcement learning post-training. InterPrior first distills a full-reference imitation expert into a versatile, goal-conditioned variational policy that reconstructs motion from multimodal observations and high-level intent. While the distilled policy reconstructs training behaviors, it does not generalize reliably due to the vast configuration space of large-scale human-object interactions. To address this, we apply data augmentation with physical perturbations and then perform reinforcement learning finetuning to improve competence on unseen goals and initializations. Together, these steps consolidate the reconstructed latent skills into a valid manifold, yielding a motion prior that generalizes beyond the training data; for example, it can incorporate new behaviors such as interactions with unseen objects. We further demonstrate its effectiveness for user-interactive control and its potential for real-robot deployment.
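To make the distillation stage concrete, below is a minimal sketch of what a goal-conditioned variational (CVAE-style) student policy could look like in PyTorch. The abstract does not give architectural details, so every class, dimension, and hyperparameter here is an illustrative assumption: an encoder compresses the privileged reference motion into a latent skill, a learned conditional prior lets the policy sample skills from state and goal alone at test time, and the decoder is regressed onto the full-reference expert's actions.

```python
import torch
import torch.nn as nn

class GoalConditionedVariationalPolicy(nn.Module):
    """Illustrative CVAE-style student: encodes reference motion into a
    latent skill z, then decodes actions from proprioception, goal, and z.
    All sizes and layer choices are assumptions, not from the paper."""

    def __init__(self, obs_dim, goal_dim, ref_dim, act_dim,
                 latent_dim=64, hidden=512):
        super().__init__()
        # Encoder: compresses the privileged reference motion into z.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + ref_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )
        # Learned conditional prior over z given state and goal, so skills
        # can be sampled at deployment without reference motion.
        self.prior = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * latent_dim),
        )
        # Decoder: the deployable controller.
        self.decoder = nn.Sequential(
            nn.Linear(obs_dim + goal_dim + latent_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, goal, ref):
        mu_q, logvar_q = self.encoder(torch.cat([obs, ref], -1)).chunk(2, -1)
        mu_p, logvar_p = self.prior(torch.cat([obs, goal], -1)).chunk(2, -1)
        # Reparameterized sample from the posterior q(z | obs, ref).
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
        action = self.decoder(torch.cat([obs, goal, z], -1))
        # KL(q || p) between two diagonal Gaussians, per sample.
        kl = 0.5 * (logvar_p - logvar_q
                    + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                    - 1.0).sum(-1)
        return action, kl


def distillation_loss(policy, expert_action, obs, goal, ref, beta=1e-3):
    """Regression onto the full-reference expert's actions, plus a KL term
    that anchors the latent skills to the goal-conditioned prior."""
    action, kl = policy(obs, goal, ref)
    return ((action - expert_action) ** 2).sum(-1).mean() + beta * kl.mean()
```

The KL weight `beta` trades reconstruction fidelity against how well the prior covers the posterior; a prior that covers the posterior is what lets the decoder be driven from high-level intent alone after distillation.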
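The post-training stage combines physical-perturbation augmentation with reinforcement learning finetuning. The abstract does not name the RL algorithm or the perturbation scheme, so the sketch below assumes a PPO-style clipped objective and a hypothetical simulator interface (`env.reset`, `env.set_state`, and `env.observe` are placeholders, as are the batch keys).

```python
import torch
from torch.distributions import Normal

def perturbed_reset(env, pos_noise=0.02, vel_noise=0.5):
    """Physical-perturbation augmentation (illustrative): jitter joint
    positions and velocities at reset so finetuning visits off-reference
    states. The env API here is a hypothetical placeholder."""
    qpos, qvel = env.reset()
    qpos = qpos + pos_noise * torch.randn_like(qpos)
    qvel = qvel + vel_noise * torch.randn_like(qvel)
    env.set_state(qpos, qvel)
    return env.observe()


def ppo_finetune_step(actor_mean, log_std, critic, optimizer, batch,
                      clip=0.2, vf_coef=0.5):
    """One PPO-style clipped update on perturbed-rollout data. `batch` is
    assumed to hold observations already concatenated with goal and latent
    skill, plus actions, old log-probs, advantages, and returns."""
    dist = Normal(actor_mean(batch["obs"]), log_std.exp())
    logp = dist.log_prob(batch["action"]).sum(-1)
    ratio = (logp - batch["logp_old"]).exp()
    adv = batch["adv"]
    # The clipped surrogate keeps the finetuned policy close to the
    # distilled prior, preserving the reconstructed latent skills while
    # improving competence on unseen goals and initializations.
    pg_loss = -torch.min(ratio * adv,
                         ratio.clamp(1.0 - clip, 1.0 + clip) * adv).mean()
    v_loss = ((critic(batch["obs"]).squeeze(-1) - batch["ret"]) ** 2).mean()
    loss = pg_loss + vf_coef * v_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this reading, perturbations widen the state distribution seen during finetuning while the clipped RL objective consolidates the latent skills into a valid manifold, matching the abstract's claim that the resulting motion prior generalizes beyond the training data.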