Stream of Search (SoS): Learning to Search in Language

Abstract

Language models are rarely shown fruitful mistakes during training. As a result, they struggle to look beyond the next token, suffer from a snowballing of errors, and have difficulty predicting the consequences of their actions several steps ahead. In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string: a stream of search (SoS). We propose a unified language for search that captures an array of different symbolic search strategies. We demonstrate our approach using the simple yet difficult game of Countdown, where the goal is to combine input numbers with arithmetic operations to reach a target number. We pretrain a transformer-based language model from scratch on a dataset of streams of search generated by heuristic solvers. We find that SoS pretraining increases search accuracy by 25% over models trained to predict only the optimal search trajectory. We further finetune this model with two policy improvement methods: Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned SoS models solve 36% of previously unsolved problems, including problems that cannot be solved by any of the heuristic solvers. Our results indicate that language models can learn to solve problems via search, self-improve to flexibly use different search strategies, and potentially discover new ones.
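
To make the idea of a flattened search trace concrete, the following is a minimal sketch of a depth-first Countdown solver that records every attempted step, including dead ends, as one string. The trace vocabulary (`start`, `try`, `dead end`, `goal`), the function names, and the rule that all input numbers must be used are assumptions made for illustration; they are not the paper's actual search language or solvers.

```python
from itertools import combinations


def countdown_dfs(numbers, target, trace):
    """Depth-first search over Countdown states, logging each step to `trace`.

    Assumption: a state is solved when exactly one number remains and it
    equals the target (i.e. every input number has been used).
    """
    if len(numbers) == 1:
        if numbers[0] == target:
            trace.append(f"goal {target} reached")
            return True
        trace.append(f"dead end {numbers}, backtrack")
        return False
    for i, j in combinations(range(len(numbers)), 2):
        # Order the pair so subtraction and division stay non-negative integers.
        a, b = max(numbers[i], numbers[j]), min(numbers[i], numbers[j])
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        candidates = [("+", a + b), ("*", a * b), ("-", a - b)]
        if b != 0 and a % b == 0:
            candidates.append(("/", a // b))
        for op, value in candidates:
            trace.append(f"try {a}{op}{b}={value} -> {rest + [value]}")
            if countdown_dfs(rest + [value], target, trace):
                return True
    return False


def stream_of_search(numbers, target):
    """Return (solved, trace), where trace is the whole search as one string."""
    trace = [f"start {list(numbers)} target {target}"]
    solved = countdown_dfs(list(numbers), target, trace)
    return solved, " | ".join(trace)


if __name__ == "__main__":
    solved, stream = stream_of_search([2, 3, 4], 24)
    print(solved)
    print(stream)
```

In this sketch the resulting string keeps the failed branches alongside the successful one, which is the kind of "fruitful mistake" the abstract refers to; a trace containing only the final successful steps would correspond to the optimal-trajectory baseline.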
