Towards a Unified View of Preference Learning for Large Language Models: A Survey
September 4, 2024
Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang
cs.AI
Abstract
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of
the crucial factors to achieve success is aligning the LLM's output with human
preferences. This alignment process often requires only a small amount of data
to efficiently enhance the LLM's performance. While effective, research in this
area spans multiple domains, and the methods involved are relatively complex to
understand. The relationships between different methods have been
under-explored, limiting the development of preference alignment. In light
of this, we break down the existing popular alignment strategies into different
components and provide a unified framework to study the current alignment
strategies, thereby establishing connections among them. In this survey, we
decompose all the strategies in preference learning into four components:
model, data, feedback, and algorithm. This unified view offers an in-depth
understanding of existing alignment algorithms and also opens up possibilities
to synergize the strengths of different strategies. Furthermore, we present
detailed working examples of prevalent existing algorithms to facilitate a
comprehensive understanding for the readers. Finally, based on our unified
perspective, we explore the challenges and future research directions for
aligning large language models with human preferences.