Response Tuning: Aligning Large Language Models without Instruction
October 3, 2024
Authors: Seokhyun An, Hyounghun Kim
cs.AI
Abstract
Instruction tuning (supervised fine-tuning using instruction-response pairs) is a foundational step in transitioning pre-trained Large Language Models (LLMs) into helpful and safe chat assistants. Our hypothesis is that establishing an adequate output space can enable such a transition, given the capabilities inherent in pre-trained LLMs. To verify this, we propose Response Tuning (RT), which eliminates the instruction-conditioning step of instruction tuning and focuses solely on response space supervision. Our experiments demonstrate that RT models, trained only on responses, can effectively respond to a wide range of instructions and exhibit helpfulness comparable to that of their instruction-tuned counterparts. Furthermore, we observe that controlling the training response distribution can significantly improve user preference or elicit target behaviors such as refusing assistance for unsafe queries. Our findings illuminate the role of establishing an adequate output space in alignment, highlighting the potential of the extensive inherent capabilities of pre-trained LLMs.
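To make the distinction concrete, the sketch below contrasts how a single training example might be constructed for standard instruction tuning versus Response Tuning as described in the abstract: RT drops the instruction from the input and supervises only the response tokens. This is a minimal illustration under assumed conventions (toy token ids, no chat template), not the authors' released code; the helper names are hypothetical.

```python
# Minimal sketch: instruction tuning (IT) vs. Response Tuning (RT) example construction.
# Assumptions: token ids are plain integers, chat-template details are omitted,
# and IGNORE_INDEX marks positions excluded from the language-modeling loss.

from typing import List, Tuple

IGNORE_INDEX = -100  # label value conventionally masked out of the LM loss


def build_it_example(
    instr_ids: List[int], resp_ids: List[int], eos_id: int
) -> Tuple[List[int], List[int]]:
    """Instruction tuning: condition on the instruction, supervise the response."""
    input_ids = instr_ids + resp_ids + [eos_id]
    # Loss is masked over the instruction prefix; only response tokens are supervised.
    labels = [IGNORE_INDEX] * len(instr_ids) + resp_ids + [eos_id]
    return input_ids, labels


def build_rt_example(
    resp_ids: List[int], eos_id: int
) -> Tuple[List[int], List[int]]:
    """Response tuning: no instruction conditioning; supervise only the response."""
    input_ids = resp_ids + [eos_id]
    labels = resp_ids + [eos_id]
    return input_ids, labels


if __name__ == "__main__":
    # Toy token ids standing in for a tokenized instruction and response.
    instr, resp, eos = [11, 12, 13], [21, 22, 23, 24], 2
    print(build_it_example(instr, resp, eos))  # instruction in input, masked in labels
    print(build_rt_example(resp, eos))         # response only, fully supervised
```

In both cases the loss is computed only on response tokens; the difference is that RT removes the instruction from the model's input entirely, so training supervises the response space rather than the instruction-to-response mapping.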