LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling
June 16, 2026
Authors: Jian Yang, Shawn Guo, Wei Zhang, Tianyu Zheng, Yaxin Du, Haau-Sing Li, Jiajun Wu, Yue Song, Yan Xing, Qingsong Cai, Zelong Huang, Chuan Hao, Ran Tao, Xianglong Liu, Wayne Xin Zhao, Mingjie Tang, Weifeng Lv, Ming Zhou, Bryan Dai
cs.AI
Abstract
Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel Loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed as refinement gains shrink, the offset cost increasingly dominates. This gain-cost trade-off explains PLT's saturation at two loops and provides diagnostics for loop-count selection.
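To make the phrase "diminishing, oscillatory updates" more concrete, the following is a minimal sketch of one way such per-loop diagnostics could be computed. It is not the authors' method: `toy_block`, `apply_loops`, and `loop_diagnostics` are hypothetical names, the block is a toy overshooting map standing in for a shared Transformer block, and the two statistics (relative update size and cosine between consecutive updates) are assumed proxies chosen for illustration.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def apply_loops(block: Callable[[Vector], Vector], h0: Vector, num_loops: int) -> List[Vector]:
    """Apply the same (shared) block num_loops times and keep every state."""
    states = [h0]
    for _ in range(num_loops):
        states.append(block(states[-1]))
    return states


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def loop_diagnostics(states: List[Vector]) -> Tuple[List[float], List[float]]:
    """Return (relative update size per loop, cosine between consecutive updates).

    A shrinking relative update suggests diminishing refinement; a negative
    cosine means a loop partly undoes the previous loop's update (oscillation).
    """
    updates = [[b - a for a, b in zip(prev, cur)]
               for prev, cur in zip(states, states[1:])]
    rel = [_norm(u) / _norm(s) for u, s in zip(updates, states)]
    cos = [_dot(u, v) / (_norm(u) * _norm(v))
           for u, v in zip(updates, updates[1:])]
    return rel, cos


def toy_block(h: Vector) -> Vector:
    """Hypothetical stand-in for a shared block: overshoots a fixed target.

    A step size of 1.5 makes each update point against the previous one while
    halving in size, which mimics a diminishing, oscillatory pattern.
    """
    target = [1.0, -2.0, 0.5]
    return [x + 1.5 * (t - x) for x, t in zip(h, target)]


if __name__ == "__main__":
    states = apply_loops(toy_block, [2.0, 0.0, 1.0], num_loops=4)
    rel, cos = loop_diagnostics(states)
    for k, r in enumerate(rel, start=1):
        print(f"loop {k}: relative update = {r:.3f}")
    for k, c in enumerate(cos, start=2):
        print(f"loop {k} vs loop {k - 1}: update cosine = {c:.3f}")
```

In a real model the states would be hidden representations collected after each loop, and any threshold for deciding that a further loop no longer pays for its positional-mismatch cost would have to be chosen empirically.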