Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling

Abstract

Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain private to the branch that made them and cannot guide other branches in time. This isolation causes substantial redundant exploration, because branches repeatedly rediscover information already found elsewhere and need more search steps to collect the complete decision information required to reach a correct answer. To address this, we propose Collaborative Parallel Thinking (CPT), a training-free inference framework that enables search-time information sharing across parallel branches. CPT extracts compact intermediate information from ongoing branches, maintains a deduplicated query-level information pool, and broadcasts pool entries through the input context, so that in subsequent search steps each branch can reuse discoveries made by other branches rather than rediscover the same information. Experiments on the HMMT and AIME benchmarks show that CPT establishes a stronger accuracy-latency Pareto frontier than strong baselines across rollout budgets and model scales, highlighting search-time collaboration as an effective direction for efficient parallel TTS.
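The extract, pool, and broadcast loop described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `InfoPool`, `collaborative_search`, `step_fn`, and `extract_fn` are hypothetical, deduplication is assumed to be a simple case- and whitespace-insensitive string match, and a sequential loop stands in for branches that would run in parallel. The abstract does not specify how extraction or deduplication is actually done.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


def normalize(fact: str) -> str:
    """Canonical key for deduplication (assumption: ignore case and extra whitespace)."""
    return " ".join(fact.lower().split())


@dataclass
class InfoPool:
    """Hypothetical deduplicated, per-query pool of intermediate findings."""
    entries: List[str] = field(default_factory=list)
    _seen: Set[str] = field(default_factory=set)

    def add(self, fact: str) -> bool:
        """Store a finding unless an equivalent one is already pooled."""
        key = normalize(fact)
        if not key or key in self._seen:
            return False
        self._seen.add(key)
        self.entries.append(fact.strip())
        return True

    def render(self) -> str:
        """Format the pool as text to be placed in each branch's input context."""
        if not self.entries:
            return ""
        return "Shared findings:\n" + "\n".join(f"- {e}" for e in self.entries)


def collaborative_search(
    query: str,
    n_branches: int,
    n_steps: int,
    step_fn: Callable[[int, str], str],       # stand-in for one model search step
    extract_fn: Callable[[str], List[str]],   # stand-in for compact info extraction
) -> Tuple[InfoPool, List[List[str]]]:
    pool = InfoPool()
    traces: List[List[str]] = [[] for _ in range(n_branches)]
    for _ in range(n_steps):
        shared = pool.render()  # snapshot: every branch sees the same pool this step
        outputs = []
        for b in range(n_branches):
            parts = (query, shared, "\n".join(traces[b]))
            context = "\n\n".join(p for p in parts if p)
            out = step_fn(b, context)
            traces[b].append(out)
            outputs.append(out)
        # After the step, merge every branch's new findings into the pool.
        for out in outputs:
            for fact in extract_fn(out):
                pool.add(fact)
    return pool, traces


# Toy stand-ins for a model call and an extractor (purely illustrative).
def toy_step(branch_id: int, context: str) -> str:
    if "x is even" in context:
        return "FACT: x = 4"
    return "FACT: x is even" if branch_id == 0 else "FACT:  X is Even"


def toy_extract(text: str) -> List[str]:
    prefix = "FACT:"
    return [ln[len(prefix):] for ln in text.splitlines() if ln.startswith(prefix)]


pool, traces = collaborative_search("Find x.", 2, 2, toy_step, toy_extract)
print(pool.entries)  # ['x is even', 'x = 4']
```

In this toy run, both branches report the same finding in the first step and the pool keeps only one copy. In the second step, branch 1 sees the pooled entry in its context and moves on to the next finding instead of rediscovering the first one.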