Thinking LLMs: General Instruction Following with Thought Generation
October 14, 2024
作者: Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, Sainbayar Sukhbaatar
cs.AI
Abstract
LLMs are typically trained to answer user questions or follow instructions
similarly to how human experts respond. However, in the standard alignment
framework they lack the basic ability to think explicitly before answering.
Thinking is important for complex questions that require reasoning and planning
-- but can be applied to any task. We propose a training method for equipping
existing LLMs with such thinking abilities for general instruction following
without the use of additional human data. We achieve this through an iterative search
and optimization procedure that explores the space of possible thought
generations, allowing the model to learn how to think without direct
supervision. For each instruction, the thought candidates are scored using a
judge model to evaluate their responses only, and then optimized via preference
optimization. We show that this procedure leads to superior performance on
AlpacaEval and Arena-Hard, and shows gains from thinking on non-reasoning
categories such as marketing, health and general knowledge, in addition to more
traditional reasoning & problem-solving tasks.
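The training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration of our reading of the procedure (sample thought candidates, score only their responses with a judge, build preference pairs, optimize), not the authors' released code; the function names (generate_thought_and_response, judge_score, dpo_update), the number of candidates, and the example instruction are hypothetical placeholders.

```python
import random

# Hypothetical stubs standing in for the policy model, judge model, and optimizer.
def generate_thought_and_response(model, instruction):
    """Sample one (internal thought, final response) pair from the model."""
    thought = f"[{model}] reasoning about: {instruction}"
    response = f"[{model}] answer to: {instruction}"
    return thought, response

def judge_score(instruction, response):
    """Judge model scores the response only; the thought is never shown to it."""
    return random.random()  # placeholder score in [0, 1]

def dpo_update(model, preference_pairs):
    """Preference-optimization step (e.g. DPO) on (chosen, rejected) output pairs."""
    return model  # placeholder: return the 'updated' model

def thought_preference_iteration(model, instructions, num_candidates=4):
    """One iteration: sample thought candidates, score them via their responses,
    form best-vs-worst preference pairs, and run a preference-optimization update."""
    preference_pairs = []
    for instruction in instructions:
        candidates = [generate_thought_and_response(model, instruction)
                      for _ in range(num_candidates)]
        # Each candidate is scored by judging its response alone.
        scored = sorted(
            ((judge_score(instruction, resp), thought, resp)
             for thought, resp in candidates),
            key=lambda item: item[0], reverse=True)
        best, worst = scored[0], scored[-1]
        # The preference pair keeps the full (thought + response) outputs.
        preference_pairs.append(((best[1], best[2]), (worst[1], worst[2])))
    return dpo_update(model, preference_pairs)

if __name__ == "__main__":
    model = "base-llm"
    instructions = ["Write a marketing tagline for a reusable water bottle."]
    for _ in range(3):  # iterative search-and-optimize loop
        model = thought_preference_iteration(model, instructions)
```

The key point the sketch tries to capture is that no direct supervision is given on the thoughts themselves: only the downstream response is judged, and the thought is optimized indirectly through the preference pairs.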