Efficient Exploration for LLMs
February 1, 2024
Authors: Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy
cs.AI
Abstract
We present evidence of substantial benefit from efficient exploration in
gathering human feedback to improve large language models. In our experiments,
an agent sequentially generates queries while fitting a reward model to the
feedback received. Our best-performing agent generates queries using double
Thompson sampling, with uncertainty represented by an epistemic neural network.
Our results demonstrate that efficient exploration enables high levels of
performance with far fewer queries. Further, both uncertainty estimation and
the choice of exploration scheme play critical roles.
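The pair-selection step described above can be illustrated with a minimal sketch. Double Thompson sampling picks the two responses for a preference query by drawing two independent samples from the posterior over reward functions and taking the best response under each. The sketch below uses a Bayesian linear reward model as a simple stand-in for the paper's epistemic neural network; all names and the fallback rule are illustrative, not the authors' implementation.

```python
import numpy as np


def sample_reward_fn(features, weight_mean, weight_cov, rng):
    # Draw one plausible reward function from the posterior.
    # A linear model over response features stands in for an
    # epistemic neural network's sampled reward head.
    w = rng.multivariate_normal(weight_mean, weight_cov)
    return features @ w


def double_thompson_pair(candidate_features, weight_mean, weight_cov,
                         rng, max_tries=10):
    # First response: best under one posterior sample.
    r1 = sample_reward_fn(candidate_features, weight_mean, weight_cov, rng)
    first = int(np.argmax(r1))

    # Second response: best under an independent posterior sample,
    # resampling a few times so the pair is informative (distinct).
    for _ in range(max_tries):
        r2 = sample_reward_fn(candidate_features, weight_mean, weight_cov, rng)
        second = int(np.argmax(r2))
        if second != first:
            return first, second

    # Fallback: runner-up under the last sample, to guarantee a valid pair.
    second = int(np.argsort(r2)[-2])
    return first, second


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4))       # 8 candidate responses, 4 features
    mean, cov = np.zeros(4), np.eye(4)    # posterior over reward weights
    i, j = double_thompson_pair(feats, mean, cov, rng)
    print(f"query pair: responses {i} and {j}")
```

The human's preference between the two selected responses would then be used to update the reward posterior before the next query is generated.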