
Deep Bayesian Active Learning for Preference Modeling in Large Language Models

June 14, 2024
Authors: Luckeciano C. Melo, Panagiotis Tigas, Alessandro Abate, Yarin Gal
cs.AI

Abstract

Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling remain a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the further development of LLMs. Bayesian Active Learning provides a principled framework for addressing this challenge and has demonstrated remarkable success in diverse settings. However, previous attempts to employ it for Preference Modeling did not meet such expectations. In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM. Notably, our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels on two popular human preference datasets and outperforms previous stochastic Bayesian acquisition policies.
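The acquisition rule described in the abstract — an epistemic-uncertainty term from the preference model plus an entropy bonus on the acquired prompt distribution in the LLM's feature space — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: all names are hypothetical, epistemic uncertainty is approximated with a BALD-style disagreement score over posterior samples, and the entropy term is simplified to a nearest-neighbor distance proxy rather than the estimator the authors actually use.

```python
import numpy as np


def bal_pm_scores(ensemble_probs, acquired_feats, pool_feats):
    """Hypothetical sketch of a BAL-PM-style acquisition score.

    ensemble_probs : (M, N) array of M posterior samples of the preference
        model's probability that response A is preferred over B, for each
        of N candidate points.
    acquired_feats : (S, D) LLM features of already-acquired prompts.
    pool_feats     : (N, D) LLM features of the candidate prompts.
    """

    def bern_entropy(p):
        # Entropy of a Bernoulli distribution, numerically stabilized.
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    # Epistemic uncertainty via BALD: H[mean prediction] - mean H[prediction].
    # High when posterior samples disagree about the preference label.
    mean_p = ensemble_probs.mean(axis=0)
    epistemic = bern_entropy(mean_p) - bern_entropy(ensemble_probs).mean(axis=0)

    # Entropy proxy: distance to the nearest already-acquired prompt.
    # Candidates far from everything acquired so far spread out the
    # acquired prompt distribution, discouraging redundant samples.
    dists = np.linalg.norm(
        pool_feats[:, None, :] - acquired_feats[None, :, :], axis=-1
    )
    novelty = dists.min(axis=1)

    # Combined score; in a stochastic policy one would sample candidates
    # proportionally to (a softmax of) this score rather than argmax it.
    return epistemic + novelty
```

In practice the two terms would need relative weighting, and the entropy estimate would be recomputed as the acquired set grows; the sketch only shows how combining the two signals steers acquisition away from redundant, low-information prompts.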
