Two-Fidelity Best-Action Identification in Stochastic Minimax Trees

Abstract

We study fixed-confidence best-action identification (BAI) in stochastic minimax trees. The problem is increasingly relevant to modern AI planning, where deep minimax search and Monte Carlo Tree Search (MCTS) with long language-model rollouts face a fundamental tradeoff: heuristic evaluations are cheap but biased, whereas accurate rollouts are reliable but prohibitively expensive. We propose 2FFS, a two-fidelity tree-search algorithm that extends ideas from multi-fidelity flat bandits to trees. The algorithm combines minimax-style fast expansion with MCTS-style stochastic sampling, adaptively deciding when to exploit cheap biased evaluations and when to invoke expensive accurate evaluations for local certification. We prove fixed-confidence correctness, establish finite stopping for exact identification, and give a polynomial-depth cost upper bound for trees of general depth. In numerical experiments on stochastic trees, 2FFS uses substantially fewer samples and computational operations than an existing BAI-MCTS baseline.
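To make the cheap-versus-expensive decision concrete, the Python sketch below runs fixed-confidence best-action identification on a depth-2 max-min tree (a MAX root, MIN children, Bernoulli leaves). It is a hypothetical illustration, not the 2FFS algorithm itself: the names `two_fidelity_bai` and `Leaf`, the known bias bound `zeta`, the cost ratio `c_hi`, the LUCB-style leader/challenger sampling, and the rule "use the cheap oracle while its Hoeffding radius still exceeds `zeta`" are all assumptions made for this example. Each leaf's two confidence intervals are intersected (the cheap one widened by `zeta`), propagated upward by min and max, and sampling stops once the leader's lower bound clears every other action's upper bound.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Leaf of a depth-2 max-min tree with two hypothetical Bernoulli oracles."""
    mu: float    # true mean, seen only through samples
    bias: float  # cheap-oracle bias, assumed |bias| <= zeta
    n: list = field(default_factory=lambda: [0, 0])      # counts [cheap, expensive]
    s: list = field(default_factory=lambda: [0.0, 0.0])  # sums   [cheap, expensive]


def radius(n, n_leaves, delta):
    # Anytime Hoeffding radius for rewards in [0, 1]; the 8 * n_leaves * n^2 factor
    # keeps a union bound over 2 fidelities, all leaves and all n below delta.
    return math.sqrt(math.log(8 * n_leaves * n * n / delta) / (2 * n))


def interval(leaf, zeta, n_leaves, delta):
    """Intersect the cheap interval (widened by zeta) with the expensive one."""
    lo, hi = 0.0, 1.0
    for f, slack in ((0, zeta), (1, 0.0)):
        if leaf.n[f] > 0:
            m = leaf.s[f] / leaf.n[f]
            r = radius(leaf.n[f], n_leaves, delta) + slack
            lo, hi = max(lo, m - r), min(hi, m + r)
    return lo, hi


def two_fidelity_bai(mus, biases, zeta, delta, c_hi=10.0, max_rounds=200_000, seed=0):
    """Hypothetical sketch (not the paper's 2FFS): certify the best root action."""
    rng = random.Random(seed)
    tree = [[Leaf(m, b) for m, b in zip(rm, rb)] for rm, rb in zip(mus, biases)]
    n_leaves = sum(len(row) for row in tree)
    actions = range(len(tree))
    cost, leader = 0.0, 0
    for _ in range(max_rounds):
        boxes = [[interval(leaf, zeta, n_leaves, delta) for leaf in row] for row in tree]
        # A MIN node's value lies in [min of leaf lower bounds, min of leaf upper bounds].
        lows = [min(lo for lo, _ in row) for row in boxes]
        ups = [min(hi for _, hi in row) for row in boxes]
        leader = max(actions, key=lambda a: lows[a])
        challenger = max((a for a in actions if a != leader), key=lambda a: ups[a])
        if lows[leader] >= ups[challenger]:  # leader certified against every other action
            return leader, True, cost, tree
        for a in (leader, challenger):
            # LUCB-style step: sample the widest leaf under each contested MIN node.
            j = max(range(len(tree[a])), key=lambda k: boxes[a][k][1] - boxes[a][k][0])
            leaf = tree[a][j]
            # Use the cheap oracle while its statistical error still exceeds the bias bound.
            cheap = leaf.n[0] == 0 or radius(leaf.n[0], n_leaves, delta) > zeta
            f = 0 if cheap else 1
            p = min(1.0, max(0.0, leaf.mu + (leaf.bias if cheap else 0.0)))
            leaf.n[f] += 1
            leaf.s[f] += 1.0 if rng.random() < p else 0.0
            cost += 1.0 if cheap else c_hi
    return leader, False, cost, tree
```

For instance, with `mus = [[0.9, 0.8], [0.5, 0.6], [0.3, 0.2]]`, cheap-oracle biases of magnitude at most 0.1, `zeta=0.1` and `delta=0.01`, the widened cheap intervals of actions 0 and 1 keep overlapping, so the sketch should switch to expensive samples on the contested leaves and, with probability at least `1 - delta`, certify action 0. The design choice is that a cheap sample stops paying off once its statistical radius falls below the bias bound, since the interval width can no longer shrink below roughly `2 * zeta` at that fidelity.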