Orchard: An Open-Source Agentic Modeling Framework

May 14, 2026
Authors: Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Xiao Yu, Rui Yang, Tao Ge, Alessandro Sordoni, Xingdi Yuan, Yelong Shen, Pengcheng He, Tong Zhang, Zhou Yu, Jianfeng Gao
cs.AI

Abstract

Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B, introduce credit-assignment SFT to learn from productive segments of unresolved trajectories, and apply Balanced Adaptive Rollout for RL. Starting from Qwen3-30B-A3B-Thinking, Orchard-SWE achieves 64.3% on SWE-bench Verified after SFT and 67.5% after SFT+RL, setting a new state of the art among open-source models of comparable size. Orchard-GUI trains a 4B vision-language computer-use agent using only 0.4K distilled trajectories and 2.2K open-ended tasks. It achieves 74.1%, 67.0%, and 64.0% success rates on WebVoyager, Online-Mind2Web, and DeepShop, respectively, making it the strongest open-source model while remaining competitive with proprietary systems. Orchard-Claw targets personal assistant agents. Trained with only 0.2K synthetic tasks, it achieves 59.6% pass@3 on Claw-Eval and 73.9% when paired with a stronger ZeroClaw harness. Collectively, these results show that a lightweight, open, harness-agnostic environment layer enables reusable agentic data, training recipes, and evaluations across domains.
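
The abstract reports Claw-Eval results as pass@3 without spelling out how the metric is computed. As a rough illustration only, the sketch below uses the standard unbiased pass@k estimator (the probability that at least one of k attempts drawn from n recorded attempts succeeds). Whether Claw-Eval uses this exact estimator is an assumption; the function names and the example outcomes are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n recorded attempts, c of them successful."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n attempts contains no success. math.comb returns 0 when
    # n - c < k, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over tasks; each entry is (attempts, successes)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical outcomes for four tasks with three attempts each
# (illustrative values, not data from the paper).
example = [(3, 0), (3, 1), (3, 3), (3, 0)]
score = benchmark_pass_at_k(example, k=3)
```

When exactly three attempts are recorded per task (n = k = 3), the estimator reduces to counting a task as solved if any of its three attempts succeeds, which is the simplest reading of a pass@3 score.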