Two-Fidelity Best-Action Identification for Stochastic Minimax Trees

Abstract

We study fixed-confidence best-action identification (BAI) in stochastic minimax trees. This problem is increasingly relevant in modern AI planning, where deep minimax search and Monte Carlo Tree Search (MCTS) with long language-model rollouts face a fundamental tradeoff: heuristic evaluations are cheap but biased, while accurate rollouts are reliable but expensive. We propose 2FFS, a two-fidelity tree-search algorithm that brings ideas from multi-fidelity flat bandits to tree-structured problems. The algorithm combines minimax-style fast expansion with MCTS-style stochastic sampling, adaptively deciding when to exploit cheap biased evaluations and when to invoke expensive accurate evaluations for local certification. We prove fixed-confidence correctness, establish finite stopping for exact identification, and give a polynomial-depth cost upper bound for general-depth trees. In numerical experiments on stochastic trees, 2FFS uses substantially fewer samples and computational operations than existing BAI-MCTS baselines.
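To make the cheap-versus-accurate tradeoff concrete, the following is a minimal illustrative sketch of how two-fidelity confidence intervals could be formed at a leaf and backed up through a minimax tree. It is not the 2FFS algorithm itself. It assumes rewards in [0, 1], a known bound `bias_bound` on the bias of the cheap evaluator, and Hoeffding-style intervals; all function names and the tree encoding (leaves as `(lower, upper)` tuples, internal nodes as lists) are hypothetical, and the union bound over nodes and time is omitted.

```python
import math

def hoeffding_radius(n, delta):
    """Half-width of a Hoeffding interval for the mean of n samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def leaf_interval(lo_mean, lo_n, hi_mean, hi_n, bias_bound, delta):
    """Intersect the cheap interval (widened by the assumed bias bound)
    with the accurate interval. Either fidelity may have zero samples."""
    lower, upper = 0.0, 1.0
    if lo_n > 0:
        r = hoeffding_radius(lo_n, delta) + bias_bound
        lower, upper = max(lower, lo_mean - r), min(upper, lo_mean + r)
    if hi_n > 0:
        r = hoeffding_radius(hi_n, delta)
        lower, upper = max(lower, hi_mean - r), min(upper, hi_mean + r)
    return lower, upper

def backup(node, is_max):
    """Propagate intervals to the root: a MAX node takes the max of its
    children's bounds, a MIN node takes the min. Players alternate by depth."""
    if isinstance(node, tuple):  # leaf: already a (lower, upper) interval
        return node
    children = [backup(child, not is_max) for child in node]
    agg = max if is_max else min
    return agg(l for l, _ in children), agg(u for _, u in children)

def certified_best(root_children):
    """Return the index of a root action whose lower bound exceeds every
    other action's upper bound, or None if no action is certified yet."""
    ivs = [backup(child, is_max=False) for child in root_children]
    for i, (low_i, _) in enumerate(ivs):
        if all(low_i > up_j for j, (_, up_j) in enumerate(ivs) if j != i):
            return i
    return None
```

In this sketch the cheap interval can never shrink below `bias_bound`, so when two root actions are closer than that, only accurate samples can separate them. This is the kind of situation in which invoking the expensive evaluator for local certification becomes necessary.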