LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

June 16, 2026
Authors: Jian Yang, Shawn Guo, Wei Zhang, Tianyu Zheng, Yaxin Du, Haau-Sing Li, Jiajun Wu, Yue Song, Yan Xing, Qingsong Cai, Zelong Huang, Chuan Hao, Ran Tao, Xianglong Liu, Wayne Xin Zhao, Mingjie Tang, Weifeng Lv, Ming Zhou, Bryan Dai
cs.AI

Abstract

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel Loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B-parameter PLT code models with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops perform worse, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed while the refinement gain of each additional loop shrinks, the offset cost increasingly dominates. This gain-cost trade-off explains why PLT saturates at two loops and provides diagnostics for loop-count selection.
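The gain-cost argument can be made concrete with a small sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. It runs a toy weight-shared block sequentially and does not model PLT's cross-loop position offsets or shared-KV gated sliding-window attention. It then logs two per-loop diagnostics that are one plausible reading of "diminishing, oscillatory updates": the update norm, and the cosine between consecutive updates, where a negative value means the update reversed direction. Finally, it picks a loop count by accumulating hypothetical per-loop gains against a fixed per-loop offset cost. All names (`make_toy_shared_block`, `run_looped`, `loop_diagnostics`, `pick_loop_count`) and all numbers are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def make_toy_shared_block(dim: int, seed: int = 0, frob: float = 0.5) -> Callable[[Vector], Vector]:
    """Hypothetical toy stand-in for a weight-shared block: h -> tanh(W h + b).

    W is rescaled to Frobenius norm `frob` < 1. tanh is 1-Lipschitz and the spectral
    norm is at most the Frobenius norm, so the map is a contraction: each update norm
    is at most `frob` times the previous one (a toy property, not a claim about PLT).
    """
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    scale = frob / math.sqrt(sum(x * x for row in w for x in row))
    w = [[x * scale for x in row] for row in w]
    b = [rng.gauss(0.0, 0.1) for _ in range(dim)]

    def block(h: Vector) -> Vector:
        # The same w and b are reused on every loop (weight tying).
        return [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + bi) for row, bi in zip(w, b)]

    return block


def run_looped(block: Callable[[Vector], Vector], h0: Sequence[float], num_loops: int) -> List[Vector]:
    """Apply the same block num_loops times; states[k] is the hidden state after loop k."""
    states = [list(h0)]
    for _ in range(num_loops):
        states.append(block(states[-1]))
    return states


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def loop_diagnostics(states: List[Vector]) -> Tuple[List[float], List[float]]:
    """Return (update_norms, update_cosines).

    update_norms[k-1]   = ||h_k - h_{k-1}|| for loops k = 1..K.
    update_cosines[k-2] = cos(h_k - h_{k-1}, h_{k-1} - h_{k-2}) for loops k = 2..K;
    a negative cosine means the update reversed direction (oscillation).
    """
    deltas = [[c - p for c, p in zip(cur, prev)] for prev, cur in zip(states, states[1:])]
    norms = [_norm(d) for d in deltas]
    cosines = []
    for d_prev, d_cur, n_prev, n_cur in zip(deltas, deltas[1:], norms, norms[1:]):
        denom = max(n_prev * n_cur, 1e-12)  # guard against a zero-length update
        cos = sum(x * y for x, y in zip(d_prev, d_cur)) / denom
        cosines.append(max(-1.0, min(1.0, cos)))  # clamp tiny float overshoot
    return norms, cosines


def pick_loop_count(extra_loop_gains: Sequence[float], offset_cost: float) -> int:
    """Choose a loop count under a toy gain-cost view.

    extra_loop_gains[i] is the hypothetical benefit of loop i + 2 over loop i + 1,
    and every extra loop pays the same offset_cost (standing in for the CLP mismatch).
    Loop count 1 is the non-looped baseline with net benefit 0.
    """
    best_loops, best_net, net = 1, 0.0, 0.0
    for i, gain in enumerate(extra_loop_gains):
        net += gain - offset_cost
        if net > best_net:
            best_loops, best_net = i + 2, net
    return best_loops


if __name__ == "__main__":
    block = make_toy_shared_block(dim=8, seed=0)
    rng = random.Random(1)
    h0 = [rng.gauss(0.0, 1.0) for _ in range(8)]
    norms, cosines = loop_diagnostics(run_looped(block, h0, num_loops=4))
    print("update norms:", [round(n, 4) for n in norms])
    print("update cosines:", [round(c, 3) for c in cosines])
    # Hypothetical gains that shrink against a fixed per-loop offset cost.
    print("chosen loops:", pick_loop_count([1.0, 0.3, 0.1, 0.05], offset_cost=0.4))  # → 2
```

With the hypothetical gains `[1.0, 0.3, 0.1, 0.05]` and a cost of `0.4`, only the first extra loop gains more than it costs. Cumulative net benefit therefore peaks at two loops. This mirrors the saturation pattern the abstract reports in toy form only; the numbers carry no empirical meaning.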