ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

April 30, 2026
Authors: Yanghao Zhou, Jingyu Ma, Yibo Peng, Zhenguo Sun, Yu Bai, Börje F. Karlsson
cs.AI

Abstract

Humanoid control systems have made significant progress in recent years, yet modeling fluent, interaction-rich behavior between a robot, its surrounding environment, and task-relevant objects remains a fundamental challenge. This difficulty arises from the need to jointly capture spatial context, temporal dynamics, robot actions, and task intent at scale, which is a poor match for conventional supervision. We propose ExoActor, a novel framework that leverages the generalization capabilities of large-scale video generation models to address this problem. The key insight in ExoActor is to use third-person video generation as a unified interface for modeling interaction dynamics. Given a task instruction and scene context, ExoActor synthesizes plausible execution processes that implicitly encode coordinated interactions between robot, environment, and objects. This video output is then transformed into executable humanoid behaviors through a pipeline that estimates human motion and executes it via a general motion controller, yielding a task-conditioned behavior sequence. To validate the proposed framework, we implement it as an end-to-end system and demonstrate its generalization to new scenarios without additional real-world data collection. We conclude by discussing limitations of the current implementation and outlining promising directions for future research, illustrating how ExoActor provides a scalable approach to modeling interaction-rich humanoid behaviors, potentially opening a new avenue for generative models to advance general-purpose humanoid intelligence.
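
The three-stage data flow described in the abstract (video generation, human motion estimation, motion control) can be pictured as a composition of functions. The sketch below is only an illustration of that flow under stated assumptions: every name in it (`Frame`, `Pose`, `Command`, `run_pipeline`, and the `stub_*` functions) is hypothetical and does not come from the paper, and the stubs return placeholder values instead of calling any real model or controller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    """One frame of a generated third-person video (hypothetical placeholder)."""
    index: int


@dataclass(frozen=True)
class Pose:
    """Estimated human pose for one frame (hypothetical placeholder joints)."""
    frame_index: int
    joints: Tuple[float, ...]


@dataclass(frozen=True)
class Command:
    """Command a motion controller might emit for one step (hypothetical)."""
    step: int
    targets: Tuple[float, ...]


# Hypothetical stage signatures; the paper does not specify these interfaces.
VideoGenerator = Callable[[str, str], List[Frame]]
MotionEstimator = Callable[[Sequence[Frame]], List[Pose]]
MotionController = Callable[[Sequence[Pose]], List[Command]]


def run_pipeline(
    instruction: str,
    scene: str,
    generate: VideoGenerator,
    estimate: MotionEstimator,
    control: MotionController,
) -> List[Command]:
    """Chain the three stages: instruction + scene -> video -> motion -> commands."""
    video = generate(instruction, scene)
    motion = estimate(video)
    return control(motion)


def stub_generate(instruction: str, scene: str, num_frames: int = 4) -> List[Frame]:
    """Stand-in for a video generation model: returns numbered empty frames."""
    return [Frame(index=i) for i in range(num_frames)]


def stub_estimate(video: Sequence[Frame]) -> List[Pose]:
    """Stand-in for human motion estimation: one zeroed pose per frame."""
    return [Pose(frame_index=f.index, joints=(0.0, 0.0, 0.0)) for f in video]


def stub_control(motion: Sequence[Pose]) -> List[Command]:
    """Stand-in for a motion controller: forwards joint values as targets."""
    return [Command(step=i, targets=p.joints) for i, p in enumerate(motion)]


if __name__ == "__main__":
    commands = run_pipeline(
        "pick up the box", "warehouse aisle",
        stub_generate, stub_estimate, stub_control,
    )
    print(len(commands))
```

Passing the stages in as plain callables keeps the sketch independent of any particular video model, pose estimator, or controller, which mirrors the abstract's framing of video as the interface between them.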