CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

Abstract

Large Language Models (LLMs) exhibit remarkable capabilities in hierarchically decomposing complex tasks through semantic reasoning. However, when applied to embodied systems, they struggle to execute subtask sequences reliably and to achieve one-shot success in long-term task completion. To address these limitations in dynamic environments, we propose the Closed-Loop Embodied Agent (CLEA), a novel architecture that integrates four specialized open-source LLMs with functional decoupling for closed-loop task management. The framework features two core innovations: (1) an interactive task planner that dynamically generates executable subtasks based on the environmental memory, and (2) a multimodal execution critic that uses an evaluation framework to assess action feasibility probabilistically and triggers hierarchical re-planning mechanisms when environmental perturbations exceed preset thresholds. To validate CLEA's effectiveness, we conduct experiments in a real environment with manipulable objects, using two heterogeneous robots for object search, manipulation, and search-manipulation integration tasks. Across 12 task trials, CLEA outperforms the baseline model, achieving a 67.3% improvement in success rate and a 52.8% increase in task completion rate. These results demonstrate that CLEA significantly enhances the robustness of task planning and execution in dynamic environments.
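The closed loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify any interfaces, so `Memory`, `run_closed_loop`, the `toy_*` functions, and the 0.5 threshold are all hypothetical, and the LLM-backed planner and critic are replaced by plain functions. The sketch also assumes that a low feasibility score from the critic is what triggers re-planning, and it models only two of the four LLM roles.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """Hypothetical environmental memory: a running log of observations."""
    observations: List[str] = field(default_factory=list)

    def update(self, observation: str) -> None:
        self.observations.append(observation)


# Hypothetical stand-ins for the LLM-backed modules.
Planner = Callable[[str, Memory], List[str]]  # goal, memory -> remaining subtasks
Critic = Callable[[str, Memory], float]       # subtask, memory -> feasibility in [0, 1]
Executor = Callable[[str], str]               # subtask -> resulting observation


def run_closed_loop(goal: str, planner: Planner, critic: Critic,
                    executor: Executor, memory: Memory,
                    threshold: float = 0.5,
                    max_steps: int = 20) -> Tuple[List[str], bool]:
    """Execute subtasks one at a time, re-planning when the critic objects.

    Returns the executed subtasks and whether the plan was completed.
    """
    executed: List[str] = []
    plan = list(planner(goal, memory))
    for _ in range(max_steps):
        if not plan:
            return executed, True
        subtask = plan[0]
        if critic(subtask, memory) < threshold:
            # Assumed trigger: feasibility below the preset threshold.
            memory.update(f"rejected: {subtask}")
            plan = list(planner(goal, memory))  # re-plan from updated memory
            continue
        memory.update(executor(subtask))
        executed.append(subtask)
        plan.pop(0)
    return executed, not plan


# Toy modules: the cup is in a closed drawer, so grasping fails at first.
def toy_planner(goal: str, memory: Memory) -> List[str]:
    steps = ["find cup", "grasp cup", "deliver cup"]
    if "rejected: grasp cup" in memory.observations:
        steps.insert(1, "open drawer")
    return [s for s in steps if f"done: {s}" not in memory.observations]


def toy_critic(subtask: str, memory: Memory) -> float:
    if subtask == "grasp cup" and "done: open drawer" not in memory.observations:
        return 0.1
    return 0.9


def toy_executor(subtask: str) -> str:
    return f"done: {subtask}"


executed, completed = run_closed_loop(
    "fetch cup", toy_planner, toy_critic, toy_executor, Memory())
print(executed, completed)
```

In this toy run the critic rejects "grasp cup" once, the planner inserts "open drawer" after consulting the updated memory, and the task then completes. The `max_steps` bound is a design choice of the sketch that prevents endless re-planning when the planner keeps proposing an infeasible subtask.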
