
**Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem** This paper presents ROME (ROME is Obviously an Agentic Model), an open-source agentic model built within the Agentic Learning Ecosystem (ALE), an open infrastructure whose components, ROLL and ROCK, give the title its pun. ALE couples the ROLL post-training framework and the ROCK sandbox environment manager with the iFlow CLI agent framework, covering the pipeline from trajectory generation to weight optimization. Trained on over one million trajectories with a data composition protocol for synthesizing complex behaviors and a chunk-level policy optimization algorithm (IPA), ROME achieves strong results on agentic benchmarks such as SWE-bench Verified and Terminal Bench, demonstrating that a principled open ecosystem can produce competitive agent LLMs.

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

December 31, 2025
作者: Weixun Wang, XiaoXiao Xu, Wanhe An, Fangwen Dai, Wei Gao, Yancheng He, Ju Huang, Qiang Ji, Hanqi Jin, Xiaoyang Li, Yang Li, Zhongwen Li, Shirong Lin, Jiashun Liu, Zenan Liu, Tao Luo, Dilxat Muhtar, Yuanbin Qu, Jiaqiang Shi, Qinghui Sun, Yingshui Tan, Hao Tang, Runze Wang, Yi Wang, Zhaoguo Wang, Yanan Wu, Shaopan Xiong, Binchen Xu, Xander Xu, Yuchi Xu, Qipeng Zhang, Xixia Zhang, Haizhou Zhao, Jie Zhao, Shuaibing Zhao, Baihui Zheng, Jianhui Zheng, Suhang Zheng, Yanni Zhu, Mengze Cai, Kerui Cao, Xitong Chen, Yue Dai, Lifan Du, Tao Feng, Tao He, Jin Hu, Yijie Hu, Ziyu Jiang, Cheng Li, Xiang Li, Jing Liang, Chonghuan Liu, ZhenDong Liu, Haodong Mi, Yanhu Mo, Junjia Ni, Shixin Pei, Jingyu Shen, XiaoShuai Song, Cecilia Wang, Chaofan Wang, Kangyu Wang, Pei Wang, Tao Wang, Wei Wang, Ke Xiao, Mingyu Xu, Tiange Xu, Nan Ya, Siran Yang, Jianan Ye, Yaxing Zang, Duo Zhang, Junbo Zhang, Boren Zheng, Wanxi Deng, Ling Pan, Lin Qu, Wenbo Su, Jiamang Wang, Wei Wang, Hu Wei, Minggang Wu, Cheng Yu, Bing Zhao, Zhicheng Zheng, Bo Zheng
cs.AI

Abstract

Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agent LLMs. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME (ROME is Obviously an Agentic Model), an open-source agent grounded by ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-based Policy Alignment (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME within a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME demonstrates strong performance across benchmarks like SWE-bench Verified and Terminal Bench, proving the effectiveness of the ALE infrastructure.
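The abstract describes IPA only at a high level: credit is assigned over semantic interaction chunks rather than individual tokens. As a rough illustration of what chunk-level credit assignment can look like, the sketch below broadcasts a single advantage to every token in each chunk (here taken to be one model action plus its environment observation). The chunk definition, the return and baseline choices, and all names (`Chunk`, `chunk_level_policy_loss`) are assumptions for illustration, not the paper's actual formulation.

```python
# Illustrative sketch only: the abstract says IPA "assigns credit over
# semantic interaction chunks rather than individual tokens"; the exact
# formulation is not given there, so everything below is an assumption.
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    token_logprobs: List[float]  # policy log-probs of the tokens in this chunk
    reward: float                # scalar feedback attributed to the chunk

def chunk_level_policy_loss(chunks: List[Chunk], gamma: float = 1.0) -> float:
    """REINFORCE-style loss with one credit signal per interaction chunk.

    Each chunk receives a single return-to-go; every token in the chunk
    shares that credit, rather than receiving its own per-token advantage.
    """
    # Discounted return-to-go per chunk, computed backwards over the trajectory.
    returns, g = [], 0.0
    for chunk in reversed(chunks):
        g = chunk.reward + gamma * g
        returns.append(g)
    returns.reverse()

    # Baseline: mean chunk return (a common variance-reduction choice).
    baseline = sum(returns) / len(returns)

    loss = 0.0
    for chunk, ret in zip(chunks, returns):
        advantage = ret - baseline
        # Broadcast the chunk-level advantage to all tokens in the chunk.
        loss -= advantage * sum(chunk.token_logprobs)
    return loss
```

One intuition for this design, consistent with the abstract's claim about long-horizon stability: per-token advantages over thousands of tokens produce noisy credit signals, whereas a handful of chunk-level signals per trajectory keeps gradient variance lower as horizons grow.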