Efficient Exploration for LLMs

Abstract

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
