PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency
February 18, 2026
Authors: Zhangyi Liu, Huaizhi Qu, Xiaowei Yin, He Sun, Yanjun Han, Tianlong Chen, Zhun Deng
cs.AI
Abstract
Test-time scaling can improve model performance by aggregating stochastic reasoning trajectories. However, achieving sample-efficient test-time self-consistency under a limited budget remains an open challenge. We introduce PETS (Principled and Efficient Test-Time Self-Consistency), which initiates a principled study of trajectory allocation through an optimization framework. Central to our approach is the self-consistency rate, a new measure defined as agreement with the infinite-budget majority vote. This formulation makes sample-efficient test-time allocation theoretically grounded and amenable to rigorous analysis. We study both offline and online settings. In the offline regime, where all questions are known in advance, we connect trajectory allocation to crowdsourcing, a classic and well-developed area, by modeling reasoning traces as workers. This perspective allows us to leverage rich existing theory, yielding theoretical guarantees and an efficient majority-voting-based allocation algorithm. In the online streaming regime, where questions arrive sequentially and allocations must be made on the fly, we propose a novel method inspired by the offline framework. Our approach adapts budgets to question difficulty while preserving strong theoretical guarantees and computational efficiency. Experiments show that PETS consistently outperforms uniform allocation. On GPQA, PETS achieves perfect self-consistency in both settings while reducing the sampling budget by up to 75% (offline) and 55% (online) relative to uniform allocation. Code is available at https://github.com/ZDCSlab/PETS.
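To make the self-consistency rate concrete, the toy sketch below (our own illustration; the function names and data are hypothetical and not taken from the PETS codebase) uses a majority vote over a finite set of sampled answers as a stand-in for the infinite-budget majority vote, and measures the fraction of trajectories that agree with it:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among sampled trajectories."""
    return Counter(answers).most_common(1)[0][0]

def empirical_self_consistency(answers, reference):
    """Fraction of trajectories agreeing with a reference answer
    (e.g., a large-budget majority vote). This is a finite-sample
    proxy for the paper's self-consistency rate."""
    return sum(a == reference for a in answers) / len(answers)

# Toy example: 10 sampled answers to one question.
samples = ["B", "B", "A", "B", "C", "B", "A", "B", "B", "D"]
ref = majority_vote(samples)                      # "B"
rate = empirical_self_consistency(samples, ref)   # 0.6
```

Under such a measure, an allocation scheme can stop sampling a question early once its empirical rate is high (easy question) and spend the saved budget on questions whose votes remain split.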