LoopCoder-v2: Loop Only Once for Efficient Test-Time Compute Scaling

Abstract

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel Loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed as refinement gains shrink, the offset cost increasingly dominates. This gain-cost trade-off explains PLT's saturation at two loops and provides diagnostics for loop-count selection.
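
The gain-cost argument can be illustrated with a toy calculation. The sketch below is not the paper's diagnostic: it assumes, purely for illustration, that each extra loop's refinement gain shrinks geometrically while every loop boundary pays the same fixed offset cost. The names `net_benefit` and `best_loop_count` and all numeric values are hypothetical, chosen only to show how a fixed cost can overtake a shrinking gain; the oscillatory behavior of later loops is not modeled.

```python
def net_benefit(loops, first_gain=1.0, decay=0.15, offset_cost=0.2):
    """Toy cumulative net benefit of running `loops` loops (loops >= 1).

    Assumption (hypothetical, not from the paper): loop k >= 2 adds a
    refinement gain of first_gain * decay**(k - 2) and pays one fixed
    offset cost at the boundary it crosses. A single loop is the baseline.
    """
    total = 0.0
    for k in range(2, loops + 1):
        gain = first_gain * decay ** (k - 2)  # shrinking refinement gain
        total += gain - offset_cost           # fixed per-boundary cost
    return total


def best_loop_count(max_loops=6, **kwargs):
    """Return the loop count in 1..max_loops with the largest toy net benefit."""
    return max(range(1, max_loops + 1), key=lambda n: net_benefit(n, **kwargs))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, round(net_benefit(n), 4))
    print("best loop count:", best_loop_count())
```

With these illustrative values the second loop is clearly worthwhile, and each later loop subtracts more in offset cost than it adds in gain. Setting `offset_cost=0.0` removes the penalty, in which case the toy model prefers as many loops as allowed.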