CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Abstract

Autonomous agents for Graphical User Interfaces (GUIs) face significant challenges in specialized domains such as scientific computing, where both long-horizon planning and precise execution are required. Existing approaches suffer from a trade-off: generalist agents excel at planning but perform poorly in execution, while specialized agents demonstrate the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a planner and an actor, but they are typically static and non-trainable, which prevents adaptation from experience. This is a critical limitation given the scarcity of high-quality data in scientific domains. To address these limitations, we introduce CODA, a novel and trainable compositional framework that integrates a generalist planner (Cerebrum) with a specialist executor (Cerebellum), trained via a dedicated two-stage pipeline. In the first stage, Specialization, we apply a decoupled GRPO approach to train an expert planner for each scientific application individually, bootstrapping from a small set of task trajectories. In the second stage, Generalization, we aggregate all successful trajectories from the specialized experts to build a consolidated dataset, which is then used for supervised fine-tuning of the final planner. This equips CODA with both robust execution and cross-domain generalization. Evaluated on four challenging applications from the ScienceBoard benchmark, CODA significantly outperforms baselines and establishes a new state of the art among open-source models.
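The two-stage pipeline described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes the standard GRPO convention of normalizing each rollout's reward against the mean and standard deviation of its group, a binary success reward, and a hypothetical `Trajectory` record. The abstract does not spell out what "decoupled" means in detail, so the sketch covers only the group-relative normalization and the pooling of successful trajectories for supervised fine-tuning. The application names are placeholders.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    app: str       # hypothetical application name
    plan: str      # planner output (placeholder text)
    reward: float  # assumed binary reward: 1.0 on success, 0.0 otherwise


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task.

    Uses the population standard deviation; eps avoids division by zero
    when every rollout in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def aggregate_successes(per_app_rollouts, threshold=1.0):
    """Pool successful trajectories from every per-application expert."""
    return [
        t
        for rollouts in per_app_rollouts.values()
        for t in rollouts
        if t.reward >= threshold
    ]


# Stage 1 (Specialization): one rollout group per application.
rollouts = {
    "app_a": [Trajectory("app_a", "plan 1", 1.0), Trajectory("app_a", "plan 2", 0.0)],
    "app_b": [Trajectory("app_b", "plan 3", 0.0), Trajectory("app_b", "plan 4", 1.0)],
}
advantages = group_relative_advantages([t.reward for t in rollouts["app_a"]])

# Stage 2 (Generalization): consolidated dataset for supervised fine-tuning.
sft_data = aggregate_successes(rollouts)
```

In this sketch the successful rollout in a group receives a positive advantage and the failed one a negative advantage, and only the successful trajectories reach `sft_data`. A real system would replace the placeholder strings with planner outputs and the binary reward with whatever success signal the environment provides.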
