Orchard: An Open-Source Framework for Agentic Modeling

Abstract

Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B, introduce credit-assignment SFT to learn from productive segments of unresolved trajectories, and apply Balanced Adaptive Rollout for RL. Starting from Qwen3-30B-A3B-Thinking, Orchard-SWE achieves 64.3% on SWE-bench Verified after SFT and 67.5% after SFT+RL, setting a new state of the art among open-source models of comparable size. Orchard-GUI trains a 4B vision-language computer-use agent using only 0.4K distilled trajectories and 2.2K open-ended tasks. It achieves 74.1%, 67.0%, and 64.0% success rates on WebVoyager, Online-Mind2Web, and DeepShop, respectively, making it the strongest open-source model while remaining competitive with proprietary systems. Orchard-Claw targets personal assistant agents. Trained with only 0.2K synthetic tasks, it achieves 59.6% pass@3 on Claw-Eval and 73.9% when paired with a stronger ZeroClaw harness. Collectively, these results show that a lightweight, open, harness-agnostic environment layer enables reusable agentic data, training recipes, and evaluations across domains.
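The abstract reports a pass@3 score on Claw-Eval but does not say how it is computed. As a minimal sketch, the following shows the widely used unbiased pass@k estimator, which gives the probability that at least one of k attempts drawn from n recorded attempts succeeds. Whether Claw-Eval uses exactly this estimator is an assumption, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n attempts, c of which succeeded.

    Uses 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random size-k subset of the n attempts contains no success.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures: every size-k subset contains a success.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[list[bool]], k: int) -> float:
    """Average pass@k over tasks; each inner list holds one task's outcomes."""
    if not results:
        raise ValueError("results must contain at least one task")
    return sum(pass_at_k(len(r), sum(r), k) for r in results) / len(results)


if __name__ == "__main__":
    # Illustrative outcomes only, not data from the paper.
    outcomes = [
        [True, False, False],
        [False, False, False],
        [False, True, True],
    ]
    print(f"pass@1 = {mean_pass_at_k(outcomes, 1):.3f}")
    print(f"pass@3 = {mean_pass_at_k(outcomes, 3):.3f}")
```

When n equals k, as in three attempts scored at pass@3, the estimator reduces to checking whether any attempt succeeded, so a task counts as solved if at least one of its three runs passes.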