Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots

Abstract

Humanoid soccer poses a representative challenge for embodied intelligence, requiring robots to operate within a tightly coupled perception-action loop. However, existing systems typically rely on decoupled modules, resulting in delayed responses and incoherent behaviors in dynamic environments, while real-world perceptual limitations further exacerbate these issues. In this work, we present a unified reinforcement learning-based controller that enables humanoid robots to acquire reactive soccer skills through the direct integration of visual perception and motion control. Our approach extends Adversarial Motion Priors to perceptual settings in real-world dynamic environments, bridging motion imitation and visually grounded dynamic control. We introduce an encoder-decoder architecture combined with a virtual perception system that models real-world visual characteristics, allowing the policy to recover privileged states from imperfect observations and establish active coordination between perception and action. The resulting controller demonstrates strong reactivity, consistently executing coherent and robust soccer behaviors across various scenarios, including real RoboCup matches.
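
As a rough illustration of what a virtual perception system might look like during training, the following minimal sketch degrades a ground-truth (privileged) ball position into an imperfect observation. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `virtual_ball_observation`, the choice of modeled effects (limited field of view, limited range, Gaussian position noise, random missed detections), and all numeric defaults are hypothetical.

```python
import math
import random
from typing import Optional, Tuple


def virtual_ball_observation(
    ball_xy: Tuple[float, float],
    camera_yaw: float,
    rng: random.Random,
    fov_rad: float = math.radians(90.0),  # assumed horizontal field of view
    max_range_m: float = 6.0,             # assumed maximum detection range
    noise_std_m: float = 0.05,            # assumed position noise (std dev)
    drop_prob: float = 0.1,               # assumed missed-detection probability
) -> Optional[Tuple[float, float]]:
    """Hypothetical sketch: return a noisy ball position in the robot frame,
    or None when the ball would not be detected."""
    x, y = ball_xy
    distance = math.hypot(x, y)

    # Bearing of the ball relative to the camera's viewing direction,
    # wrapped into [-pi, pi).
    bearing = math.atan2(y, x) - camera_yaw
    bearing = (bearing + math.pi) % (2.0 * math.pi) - math.pi

    # Outside the field of view or beyond detection range: nothing is seen.
    if distance > max_range_m or abs(bearing) > fov_rad / 2.0:
        return None

    # Randomly drop detections to mimic an unreliable detector.
    if rng.random() < drop_prob:
        return None

    # Perturb the true position with zero-mean Gaussian noise.
    return (x + rng.gauss(0.0, noise_std_m), y + rng.gauss(0.0, noise_std_m))
```

Under these assumptions, a ball at `(2.0, 2.0)` is not observed while the camera faces straight ahead (`camera_yaw = 0.0`) with a narrow field of view, but becomes observable once the camera is turned toward it. This is one simple way in which where the robot looks can determine what the policy observes, which is the kind of perception-action coupling the abstract refers to.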