LoopCoder-v2: Looping Only Once for Efficient Test-Time Compute Scaling

Abstract

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases both latency and KV-cache memory as the loop count grows. Parallel Loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, which makes the loop count a practical design choice. We therefore study PLT loop-count selection from a gain-cost perspective: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed while refinement gains shrink, the offset cost increasingly dominates. This gain-cost trade-off explains why PLT saturates at two loops and provides diagnostics for loop-count selection.
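The gain-cost argument can be illustrated with a toy calculation. The sketch below is not the paper's diagnostic: it assumes, purely for illustration, that the refinement gain of each additional loop decays geometrically while the positional-mismatch cost per loop boundary stays constant. The function names and all numeric values (`first_gain`, `decay`, `offset_cost`) are hypothetical, and the oscillatory behaviour of later loops is not modelled.

```python
def net_benefit(num_loops, first_gain, decay, offset_cost):
    """Cumulative net benefit of num_loops loops relative to a non-looped baseline.

    Assumption (illustrative only): loop k >= 2 contributes a refinement gain of
    first_gain * decay ** (k - 2) and pays a fixed offset_cost at its boundary.
    """
    total = 0.0
    for k in range(2, num_loops + 1):
        gain = first_gain * decay ** (k - 2)  # shrinking refinement gain
        total += gain - offset_cost           # roughly fixed mismatch cost
    return total


def best_loop_count(max_loops, first_gain, decay, offset_cost):
    """Return the loop count in 1..max_loops with the largest net benefit."""
    return max(
        range(1, max_loops + 1),
        key=lambda n: net_benefit(n, first_gain, decay, offset_cost),
    )


if __name__ == "__main__":
    # Hypothetical values: loop 2 gains more than it costs, later loops do not.
    for n in range(1, 5):
        print(n, net_benefit(n, first_gain=1.0, decay=0.25, offset_cost=0.4))
    print("best:", best_loop_count(4, first_gain=1.0, decay=0.25, offset_cost=0.4))
```

Under these assumed values, the net benefit peaks at two loops and declines afterwards, because each later loop adds less gain than the fixed cost it incurs. Different assumed values would move the peak, which is why the loop count has to be chosen from measured diagnostics rather than from a formula like this one.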