

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

December 31, 2025
Authors: Weixun Wang, XiaoXiao Xu, Wanhe An, Fangwen Dai, Wei Gao, Yancheng He, Ju Huang, Qiang Ji, Hanqi Jin, Xiaoyang Li, Yang Li, Zhongwen Li, Shirong Lin, Jiashun Liu, Zenan Liu, Tao Luo, Dilxat Muhtar, Yuanbin Qu, Jiaqiang Shi, Qinghui Sun, Yingshui Tan, Hao Tang, Runze Wang, Yi Wang, Zhaoguo Wang, Yanan Wu, Shaopan Xiong, Binchen Xu, Xander Xu, Yuchi Xu, Qipeng Zhang, Xixia Zhang, Haizhou Zhao, Jie Zhao, Shuaibing Zhao, Baihui Zheng, Jianhui Zheng, Suhang Zheng, Yanni Zhu, Mengze Cai, Kerui Cao, Xitong Chen, Yue Dai, Lifan Du, Tao Feng, Tao He, Jin Hu, Yijie Hu, Ziyu Jiang, Cheng Li, Xiang Li, Jing Liang, Chonghuan Liu, ZhenDong Liu, Haodong Mi, Yanhu Mo, Junjia Ni, Shixin Pei, Jingyu Shen, XiaoShuai Song, Cecilia Wang, Chaofan Wang, Kangyu Wang, Pei Wang, Tao Wang, Wei Wang, Ke Xiao, Mingyu Xu, Tiange Xu, Nan Ya, Siran Yang, Jianan Ye, Yaxing Zang, Duo Zhang, Junbo Zhang, Boren Zheng, Wanxi Deng, Ling Pan, Lin Qu, Wenbo Su, Jiamang Wang, Wei Wang, Hu Wei, Minggang Wu, Cheng Yu, Bing Zhao, Zhicheng Zheng, Bo Zheng
cs.AI

Abstract

Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agent LLMs. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME (ROME is Obviously an Agentic Model), an open-source agent grounded by ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-based Policy Alignment (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME within a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME demonstrates strong performance across benchmarks like SWE-bench Verified and Terminal Bench, proving the effectiveness of the ALE infrastructure.
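The abstract describes IPA as assigning credit over semantic interaction chunks rather than individual tokens. The paper's actual formulation is not reproduced here; the sketch below is only an illustrative assumption of what chunk-level credit assignment could look like in PyTorch, where the function name `broadcast_chunk_advantages` and the broadcast rule are hypothetical.

```python
# Minimal sketch, assuming one scalar advantage per interaction chunk
# that is broadcast to every token in that chunk before a standard
# policy-gradient surrogate. Illustrative only; not the paper's IPA code.
import torch

def broadcast_chunk_advantages(token_logps, chunk_ids, chunk_advantages):
    """token_logps:      (T,) log-probs of sampled tokens under the policy
    chunk_ids:        (T,) integer id of the interaction chunk of each token
    chunk_advantages: (C,) one scalar advantage per chunk
    """
    per_token_adv = chunk_advantages[chunk_ids]  # broadcast chunk credit to tokens
    # REINFORCE-style surrogate with chunk-level (not per-token) credit.
    return -(per_token_adv.detach() * token_logps).mean()

# Toy usage: a 10-token turn split into 3 interaction chunks.
token_logps = torch.randn(10, requires_grad=True)
chunk_ids = torch.tensor([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
chunk_advantages = torch.tensor([0.5, -0.2, 1.0])
loss = broadcast_chunk_advantages(token_logps, chunk_ids, chunk_advantages)
loss.backward()
```

Under this reading, a chunk would correspond to a semantic unit such as one action-observation exchange, so all tokens in an exchange share the same credit signal, which is one plausible way coarser credit could stabilize long-horizon training.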