HyCodePolicy：面向具身代理的多模态监控与决策的混合语言控制器

摘要

近期多模態大型語言模型（MLLMs）的進展，為具身代理中的代碼策略生成提供了更豐富的感知基礎。然而，現有系統大多缺乏有效機制來適應性地監控策略執行並在任務完成過程中修復代碼。在本研究中，我們提出了HyCodePolicy，這是一種基於混合語言的控製框架，它系統性地將代碼合成、幾何基礎、感知監控和迭代修復整合到具身代理的閉環編程循環中。技術上，給定一個自然語言指令，我們的系統首先將其分解為子目標，並生成一個基於對象中心幾何原語的初始可執行程序。該程序隨後在模擬中執行，同時視覺語言模型（VLM）觀察選定的檢查點以檢測和定位執行失敗並推斷失敗原因。通過融合捕捉程序級事件的結構化執行軌跡與基於VLM的感知反饋，HyCodePolicy推斷失敗原因並修復程序。這種混合雙重反饋機制實現了在最小人力監督下的自我校正程序合成。我們的結果表明，HyCodePolicy顯著提高了機器人操作策略的魯棒性和樣本效率，為將多模態推理整合到自主決策管道中提供了一種可擴展的策略。

English

Recent advances in multimodal large language models (MLLMs) have enabled richer perceptual grounding for code policy generation in embodied agents. However, most existing systems lack effective mechanisms to adaptively monitor policy execution and repair codes during task completion. In this work, we introduce HyCodePolicy, a hybrid language-based control framework that systematically integrates code synthesis, geometric grounding, perceptual monitoring, and iterative repair into a closed-loop programming cycle for embodied agents. Technically, given a natural language instruction, our system first decomposes it into subgoals and generates an initial executable program grounded in object-centric geometric primitives. The program is then executed in simulation, while a vision-language model (VLM) observes selected checkpoints to detect and localize execution failures and infer failure reasons. By fusing structured execution traces capturing program-level events with VLM-based perceptual feedback, HyCodePolicy infers failure causes and repairs programs. This hybrid dual feedback mechanism enables self-correcting program synthesis with minimal human supervision. Our results demonstrate that HyCodePolicy significantly improves the robustness and sample efficiency of robot manipulation policies, offering a scalable strategy for integrating multimodal reasoning into autonomous decision-making pipelines.

HyCodePolicy：面向具身代理的多模态监控与决策的混合语言控制器

HyCodePolicy: Hybrid Language Controllers for Multimodal Monitoring and Decision in Embodied Agents

摘要

Support