Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Abstract

Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised fine-tuning or reinforcement learning from human feedback (RLHF), but doing so requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (<10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N=16). Across our benchmarks and user study, we find that win rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points. By using demonstrations as feedback directly, DITTO offers a novel method for effective customization of LLMs.
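The comparison-data idea described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: each user demonstration is treated as preferred over every output sampled from the LLM and its intermediate checkpoints. The names `Comparison`, `build_comparisons`, and `samples_by_checkpoint` are hypothetical and are not taken from the paper, and the training objective that would consume these pairs is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Comparison:
    """One preference pair: `chosen` is preferred over `rejected` for `prompt`."""
    prompt: str
    chosen: str
    rejected: str


def build_comparisons(
    prompt: str,
    demonstrations: List[str],
    samples_by_checkpoint: Dict[int, List[str]],
) -> List[Comparison]:
    """Pair every demonstration with every model sample.

    Assumption (from the abstract only): a demonstration is preferred over
    any output of the LLM or of an intermediate checkpoint. Checkpoint ids
    are hypothetical integer labels, with 0 standing for the initial model.
    """
    pairs: List[Comparison] = []
    for demo in demonstrations:
        for checkpoint in sorted(samples_by_checkpoint):
            for sample in samples_by_checkpoint[checkpoint]:
                pairs.append(Comparison(prompt, chosen=demo, rejected=sample))
    return pairs


# Illustrative inputs only; these strings are made up for the example.
example_pairs = build_comparisons(
    prompt="Write a short email declining a meeting.",
    demonstrations=["demo email A", "demo email B"],
    samples_by_checkpoint={0: ["base sample 1", "base sample 2"], 1: ["ckpt sample 1"]},
)
```

With two demonstrations and three model samples in total, the sketch yields six pairs, which shows how a handful of demonstrations can be expanded into a larger comparison dataset at little cost.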
