PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
January 9, 2026
Authors: Jingcheng Hu, Yinmin Zhang, Shijie Shang, Xiaobo Yang, Yue Peng, Zhewei Huang, Hebin Zhou, Xin Wu, Jie Cheng, Fanqi Wan, Xiangwen Kong, Chengyuan Yao, Kaiwen Yan, Ailin Huang, Hongyu Zhou, Qi Han, Zheng Ge, Daxin Jiang, Xiangyu Zhang, Heung-Yeung Shum
cs.AI
Abstract
We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale test-time compute (TTC) far beyond sequential reasoning under a fixed context window. PaCoRe departs from the traditional sequential paradigm by driving TTC through massively parallel exploration, coordinated over multiple rounds via a message-passing architecture. Each round launches many parallel reasoning trajectories, compacts their findings into context-bounded messages, and synthesizes these messages to guide the next round and ultimately produce the final answer. Trained end-to-end with large-scale, outcome-based reinforcement learning, the model masters the synthesis abilities required by PaCoRe and scales to multi-million-token effective TTC without exceeding context limits. The approach yields strong improvements across diverse domains, and notably pushes reasoning beyond frontier systems in mathematics: an 8B model reaches 94.5% on HMMT 2025, surpassing GPT-5's 93.2% by scaling effective TTC to roughly two million tokens. We open-source model checkpoints, training data, and the full inference pipeline to accelerate follow-up work.
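To make the round structure concrete, here is a minimal sketch of the inference loop the abstract describes. The `model` handle and the helpers `sample_trajectory`, `compress_to_message`, and `synthesize` are hypothetical placeholders standing in for calls to the underlying language model; they are assumptions for illustration and are not the names used in the released pipeline.

```python
# Minimal sketch of a PaCoRe-style inference loop (illustrative only).
# All helper functions below are hypothetical placeholders for model calls.

from typing import List


def sample_trajectory(model, problem: str, messages: List[str]) -> str:
    """Run one sequential reasoning trajectory, conditioned on the problem
    and the compact messages synthesized from earlier rounds."""
    raise NotImplementedError  # placeholder for a model call


def compress_to_message(model, trajectory: str, budget_tokens: int) -> str:
    """Compact a full trajectory into a short, context-bounded message."""
    raise NotImplementedError  # placeholder for a model call


def synthesize(model, problem: str, messages: List[str]) -> str:
    """Integrate the final round's messages into an answer."""
    raise NotImplementedError  # placeholder for a model call


def pacore_infer(model, problem: str, rounds: int = 3, width: int = 8,
                 message_budget: int = 1024) -> str:
    """Multi-round parallel coordinated reasoning.

    Each round launches `width` parallel trajectories, compresses each one
    into a message of at most `message_budget` tokens, and feeds those
    messages forward to guide the next round.
    """
    guidance: List[str] = []
    for _ in range(rounds):
        trajectories = [sample_trajectory(model, problem, guidance)
                        for _ in range(width)]           # parallel exploration
        guidance = [compress_to_message(model, t, message_budget)
                    for t in trajectories]               # context-bounded messages
    return synthesize(model, problem, guidance)          # final answer
```

The point of this structure is that effective TTC grows with `rounds × width`, while any single model call only ever sees the problem plus a bounded set of compressed messages, keeping it within the fixed context window.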