ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

April 30, 2026
Authors: Yanghao Zhou, Jingyu Ma, Yibo Peng, Zhenguo Sun, Yu Bai, Börje F. Karlsson
cs.AI

Abstract

Humanoid control systems have made significant progress in recent years, yet modeling fluent, interaction-rich behavior among a robot, its surrounding environment, and task-relevant objects remains a fundamental challenge. The difficulty arises from the need to jointly capture spatial context, temporal dynamics, robot actions, and task intent at scale, which conventional supervision handles poorly. We propose ExoActor, a novel framework that leverages the generalization capabilities of large-scale video generation models to address this problem. The key insight of ExoActor is to use third-person video generation as a unified interface for modeling interaction dynamics. Given a task instruction and scene context, ExoActor synthesizes plausible execution processes that implicitly encode coordinated interactions among the robot, environment, and objects. The generated video is then transformed into executable humanoid behaviors through a pipeline that estimates human motion and executes it via a general motion controller, yielding a task-conditioned behavior sequence. To validate the framework, we implement it as an end-to-end system and demonstrate that it generalizes to new scenarios without additional real-world data collection. Finally, we discuss limitations of the current implementation and outline promising directions for future research, illustrating how ExoActor offers a scalable approach to modeling interaction-rich humanoid behaviors and potentially opens a new avenue for generative models to advance general-purpose humanoid intelligence.
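The three-stage flow described in the abstract (video generation, human motion estimation, execution by a general motion controller) can be sketched as plain function composition. This is a minimal illustration under stated assumptions, not the authors' implementation: every type and function name here (`Frame`, `HumanPose`, `ControlTarget`, `run_exocentric_pipeline`, and the injected `generate_video`, `estimate_pose`, `retarget` callables) is hypothetical. The retargeting step from human pose to robot joint targets is also an assumption, since the abstract does not describe how estimated human motion is mapped onto the robot.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Frame:
    """One frame of a generated third-person video (hypothetical placeholder: only an index)."""
    index: int


@dataclass
class HumanPose:
    """Estimated human pose for one frame (hypothetical: a flat list of joint values)."""
    frame_index: int
    joints: List[float]


@dataclass
class ControlTarget:
    """Timestamped reference that a motion-tracking controller would follow (hypothetical)."""
    t: float
    joint_targets: List[float]


def run_exocentric_pipeline(
    instruction: str,
    scene_image: object,
    generate_video: Callable[[str, object], Sequence[Frame]],
    estimate_pose: Callable[[Frame], HumanPose],
    retarget: Callable[[HumanPose], List[float]],
    fps: float = 30.0,
) -> List[ControlTarget]:
    """Hypothetical sketch: instruction + scene -> video -> human motion -> controller references."""
    # Stage 1: synthesize a third-person video of the task being carried out.
    frames = generate_video(instruction, scene_image)
    # Stage 2: estimate human motion from every generated frame.
    poses = [estimate_pose(frame) for frame in frames]
    # Stage 3 (assumed): retarget each pose to robot joint targets and timestamp it
    # so a general motion controller can track the resulting sequence.
    return [
        ControlTarget(t=pose.frame_index / fps, joint_targets=retarget(pose))
        for pose in poses
    ]
```

In practice each injected callable would wrap a real model (a video generator, a human pose estimator, a retargeting routine); passing them in as arguments keeps the sketch runnable with stubs and makes each stage swappable, which matches the abstract's framing of video generation as an interface rather than a fixed component.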