Efficient Exploration for LLMs

Abstract

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
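As an illustration of the query-selection step, the following is a minimal sketch of double Thompson sampling over a set of candidate responses. It is not the authors' implementation. It assumes that a small ensemble of reward models stands in for the epistemic neural network, with each row of `reward_samples` giving one plausible reward model's scores. The function name `double_thompson_sample`, the `max_tries` parameter, and the random fallback are hypothetical choices made for this example.

```python
import numpy as np


def double_thompson_sample(reward_samples, rng, max_tries=30):
    """Pick two distinct candidate responses to show to a human rater.

    reward_samples: array of shape (num_models, num_responses). Row k holds
    the rewards that ensemble member k assigns to each candidate response.
    The ensemble is an assumed stand-in for an epistemic neural network.
    """
    num_models, num_responses = reward_samples.shape
    if num_responses < 2:
        raise ValueError("need at least two candidate responses")

    # First response: the best one under a randomly drawn reward model.
    first = int(np.argmax(reward_samples[rng.integers(num_models)]))

    # Second response: redraw the reward model until its favourite differs.
    for _ in range(max_tries):
        second = int(np.argmax(reward_samples[rng.integers(num_models)]))
        if second != first:
            return first, second

    # Assumed fallback: uniform choice among the remaining responses.
    others = [i for i in range(num_responses) if i != first]
    return first, int(rng.choice(others))


# Usage: 10 hypothetical reward models scoring 5 candidate responses.
rng = np.random.default_rng(0)
scores = rng.normal(size=(10, 5))
pair = double_thompson_sample(scores, rng)
```

When the sampled reward models disagree about which response is best, the pair tends to contain responses whose ranking is still uncertain, which is where a human preference is most informative. When all models agree, the fallback still returns a valid pair.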
