Reinforced Fast Weights with Next-Sequence Prediction

Abstract

Fast weight architectures offer a promising alternative to attention-based transformers for long-context modeling because their memory overhead stays constant regardless of context length. However, their potential is limited by the next-token prediction (NTP) training paradigm. NTP optimizes single-token predictions and ignores semantic coherence across the multiple tokens that follow a prefix. Consequently, fast weight models, which dynamically update their parameters to store contextual information, learn suboptimal representations that fail to capture long-range dependencies. We introduce REFINE (Reinforced Fast weIghts with Next sEquence prediction), a reinforcement learning framework that trains fast weight models under the next-sequence prediction (NSP) objective. REFINE selects informative token positions based on prediction entropy, generates multi-token rollouts, assigns self-supervised sequence-level rewards, and optimizes the model with group relative policy optimization (GRPO). REFINE is applicable throughout the training lifecycle of pre-trained language models: mid-training, post-training, and test-time training. Our experiments on LaCT-760M and DeltaNet-1.3B demonstrate that REFINE consistently outperforms supervised fine-tuning with NTP across needle-in-a-haystack retrieval, long-context question answering, and diverse tasks in LongBench. REFINE provides an effective and versatile framework for improving long-context modeling in fast weight architectures.
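The abstract names four steps (entropy-based position selection, multi-token rollouts, sequence-level rewards, GRPO) without giving formulas. The sketch below is only an illustration of how the selection and advantage steps could fit together, not the authors' implementation. Everything in it is an assumption: the function names, the top-k selection rule, the toy data, and in particular `overlap_reward`, a hypothetical stand-in that scores a rollout by positional agreement with the ground-truth continuation. The abstract does not specify the actual reward. The advantage step uses the usual group-relative normalization (reward minus group mean, divided by group standard deviation).

```python
import math
from statistics import mean, pstdev


def entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_positions(dists, k):
    """Return the indices of the k highest-entropy positions, ascending.

    Assumption: 'informative' is taken to mean 'highest prediction entropy'.
    """
    ranked = sorted(range(len(dists)), key=lambda i: entropy(dists[i]), reverse=True)
    return sorted(ranked[:k])


def overlap_reward(rollout, reference):
    """Hypothetical sequence-level reward, NOT the reward used by REFINE.

    Fraction of reference positions at which the rollout emits the same token.
    """
    if not reference:
        return 0.0
    return sum(a == b for a, b in zip(rollout, reference)) / len(reference)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy next-token distributions for four positions of a prefix (made-up numbers).
dists = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],
]
positions = select_positions(dists, k=2)

# Toy group of four rollouts from one selected position, scored against the
# ground-truth continuation (token ids are arbitrary).
reference = [5, 6, 7, 8]
rollouts = [[5, 6, 7, 8], [5, 6, 0, 0], [0, 0, 0, 0], [5, 0, 7, 0]]
rewards = [overlap_reward(r, reference) for r in rollouts]
advantages = group_relative_advantages(rewards)

print(positions, rewards, advantages)
```

In a full training loop these advantages would weight the policy-gradient update for each rollout's tokens. That part is omitted here because it depends on the model and on GRPO hyperparameters that the abstract does not give.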
