Efficient Exploration for LLMs
February 1, 2024
Authors: Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy
cs.AI
Abstract
We present evidence of substantial benefit from efficient exploration in
gathering human feedback to improve large language models. In our experiments,
an agent sequentially generates queries while fitting a reward model to the
feedback received. Our best-performing agent generates queries using double
Thompson sampling, with uncertainty represented by an epistemic neural network.
Our results demonstrate that efficient exploration enables high levels of
performance with far fewer queries. Further, both uncertainty estimation and
the choice of exploration scheme play critical roles.
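The pair-selection step described above can be illustrated with a minimal sketch. Double Thompson sampling picks the two responses for a preference query by drawing two independent samples from the posterior over reward functions and taking the best response under each. The sketch below uses a Bayesian linear reward model as a simple stand-in for the paper's epistemic neural network; all names and the fallback rule are illustrative, not the authors' implementation.

```python
import numpy as np


def sample_reward_fn(features, weight_mean, weight_cov, rng):
    # Draw one plausible reward function from the posterior.
    # A linear model over response features stands in for an
    # epistemic neural network's sampled reward head.
    w = rng.multivariate_normal(weight_mean, weight_cov)
    return features @ w


def double_thompson_pair(candidate_features, weight_mean, weight_cov,
                         rng, max_tries=10):
    # First response: best under one posterior sample.
    r1 = sample_reward_fn(candidate_features, weight_mean, weight_cov, rng)
    first = int(np.argmax(r1))

    # Second response: best under an independent posterior sample,
    # resampling a few times so the pair is informative (distinct).
    for _ in range(max_tries):
        r2 = sample_reward_fn(candidate_features, weight_mean, weight_cov, rng)
        second = int(np.argmax(r2))
        if second != first:
            return first, second

    # Fallback: runner-up under the last sample, to guarantee a valid pair.
    second = int(np.argsort(r2)[-2])
    return first, second


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4))       # 8 candidate responses, 4 features
    mean, cov = np.zeros(4), np.eye(4)    # posterior over reward weights
    i, j = double_thompson_pair(feats, mean, cov, rng)
    print(f"query pair: responses {i} and {j}")
```

The human's preference between the two selected responses would then be used to update the reward posterior before the next query is generated.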