Towards a Unified View of Preference Learning for Large Language Models: A Survey

Abstract

Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors behind this success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to understand. The relationships between different methods remain under-explored, which limits the development of preference alignment. In light of this, we break down existing popular alignment strategies into different components and provide a unified framework for studying them, thereby establishing connections among them. In this survey, we decompose all strategies in preference learning into four components: model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and opens up possibilities for combining the strengths of different strategies. Furthermore, we present detailed working examples of prevalent existing algorithms to facilitate a comprehensive understanding for readers. Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.
