

Parallel Loop Transformer for Efficient Test-Time Computation Scaling

October 28, 2025
Authors: Bohong Wu, Mengzhao Chen, Xiang Luo, Shen Yan, Qifan Yu, Fan Xia, Tianqi Zhang, Hongrui Zhan, Zheng Zhong, Xun Zhou, Siyuan Qiao, Xingyan Bin
cs.AI

Abstract

Large Language Models (LLMs) are powerful but often too slow and costly for real-world inference. Looped transformers save parameters by reusing the same weights across multiple computational steps, or "loops." However, this approach has a major flaw: the loops run one after another, so inference latency and memory requirements grow with each added loop, making looped models impractical for latency-sensitive applications. To solve this problem, we introduce the Parallel Loop Transformer (PLT), a new architecture that delivers the performance benefits of a deep, looped model with the low latency of a standard, non-looped model. PLT relies on two key techniques. First, Cross-Loop Parallelism (CLP) breaks the sequential dependency by computing different loops for different tokens at the same time, all within a single forward pass. Second, to prevent memory costs from growing, an Efficient Representation Enhancement strategy shares the memory (KV cache) from the first loop with all later loops and uses Gated Sliding-Window Attention (G-SWA) to combine this shared global information with local context, maintaining high accuracy. Our experiments show that PLT matches the accuracy of a traditional looped model with almost no extra latency or memory cost compared to a standard transformer.
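The abstract describes G-SWA only at a high level, so the following is a minimal, illustrative sketch of one plausible reading: each query blends attention over a shared first-loop KV cache (the global branch) with attention over a sliding window of the current loop's KV (the local branch), mixed by a learned gate. The function and parameter names (gated_swa, window_size, gate) are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (NOT the paper's code): gated sliding-window attention
# that blends a shared global KV cache from loop 1 with a local window of the
# current loop's KV, as loosely described for G-SWA in the abstract.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_swa(q, k_local, v_local, k_global, v_global, gate, window_size=128):
    """For each position t: attend causally over the shared first-loop cache
    (global branch) and over a sliding window of the current loop's KV
    (local branch); `gate[t]` in [0, 1] blends the two outputs."""
    d = q.shape[-1]
    T = q.shape[0]
    out = np.zeros_like(q)
    for t in range(T):
        # Global branch: shared KV cache from the first loop (causal prefix).
        scores_g = q[t] @ k_global[: t + 1].T / np.sqrt(d)
        ctx_g = softmax(scores_g) @ v_global[: t + 1]
        # Local branch: sliding window over the current loop's own KV.
        lo = max(0, t + 1 - window_size)
        scores_l = q[t] @ k_local[lo : t + 1].T / np.sqrt(d)
        ctx_l = softmax(scores_l) @ v_local[lo : t + 1]
        # Gate (sketched as a scalar per position) mixes global and local context.
        out[t] = gate[t] * ctx_g + (1.0 - gate[t]) * ctx_l
    return out

# Toy usage with random tensors.
T, d = 16, 8
rng = np.random.default_rng(0)
q = rng.standard_normal((T, d))
k1, v1 = rng.standard_normal((T, d)), rng.standard_normal((T, d))  # shared loop-1 KV
k2, v2 = rng.standard_normal((T, d)), rng.standard_normal((T, d))  # current-loop KV
gate = rng.uniform(size=T)
y = gated_swa(q, k2, v2, k1, v1, gate)
print(y.shape)  # (16, 8)
```

In this reading, later loops never write new global KV entries: they reuse the first loop's cache and only keep a small local window, which is why the memory footprint stays close to that of a standard transformer.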