Two-Fidelity Best-Action Identification for Stochastic Minimax Trees

Abstract

We study fixed-confidence best-action identification (BAI) in stochastic minimax trees. This problem is increasingly relevant in modern AI planning, where deep minimax search and Monte Carlo Tree Search (MCTS) with long language-model rollouts face a fundamental tradeoff: heuristic evaluations are cheap but biased, while accurate rollouts are reliable but prohibitively expensive. We propose 2FFS, a two-fidelity tree-search algorithm that extends multi-fidelity flat-bandit ideas to trees. The algorithm combines minimax-style fast expansion with MCTS-style stochastic sampling, adaptively deciding when to exploit cheap biased evaluations and when to invoke expensive accurate evaluations for local certification. We prove fixed-confidence correctness, establish finite stopping for exact identification, and give a polynomial-depth cost upper bound for general-depth trees. Across numerical stochastic-tree experiments, 2FFS uses substantially fewer samples and computational operations than existing BAI-MCTS baselines.
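To make the cheap-versus-accurate tradeoff concrete, the following is a minimal illustrative sketch and not the 2FFS algorithm itself. It assumes leaf rewards in [0, 1], a known bound `bias` on the error of the cheap evaluator, and Hoeffding confidence intervals. All function names, including the switching rule `needs_high_fidelity`, are hypothetical and introduced only for illustration.

```python
import math


def hoeffding_radius(n, delta):
    """Half-width of a two-sided Hoeffding interval for n samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def leaf_interval(mean, n, delta, bias=0.0):
    """Interval on the true leaf value.

    `bias` is an assumed known bound on the cheap evaluator's error;
    use bias=0.0 for the accurate (high-fidelity) evaluator.
    """
    radius = hoeffding_radius(n, delta) + bias
    return (mean - radius, mean + radius)


def backup(child_intervals, is_max):
    """Propagate child intervals through a max node or a min node."""
    lows = [lo for lo, _ in child_intervals]
    highs = [hi for _, hi in child_intervals]
    if is_max:
        return (max(lows), max(highs))
    return (min(lows), min(highs))


def needs_high_fidelity(n_cheap, delta, bias):
    """Hypothetical switching rule (an assumption, not taken from the paper).

    Once the statistical radius is no larger than the bias bound, further
    cheap samples can at most halve the interval width, so accurate
    evaluations are needed to tighten it further.
    """
    return hoeffding_radius(n_cheap, delta) <= bias


# Example: a min node over two leaves evaluated with the cheap evaluator.
cheap_a = leaf_interval(mean=0.70, n=200, delta=0.05, bias=0.10)
cheap_b = leaf_interval(mean=0.40, n=200, delta=0.05, bias=0.10)
min_node = backup([cheap_a, cheap_b], is_max=False)
```

Under these assumptions, the interval at a leaf can never shrink below the bias bound using cheap evaluations alone, which is one way to read the need for "local certification" by accurate evaluations when two candidate actions remain too close to separate.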