Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
June 12, 2024
Authors: Chen Zheng, Ke Sun, Xun Zhou
cs.AI
Abstract
Despite the advances in Large Language Models (LLMs), exemplified by models
like GPT-4 and Claude, smaller-scale LLMs such as Llama and Mistral often
struggle with generating in-depth and coherent dialogues. This paper presents a
novel two-step Coarse-to-Fine Actor model to address the inherent limitations
in conversational and analytical capabilities of small-sized LLMs. Our approach
begins with the Policy-based Coarse Actor, employing a technique we term
"Continuous Maximization". The Coarse Actor establishes an enhanced,
knowledge-rich pool adept at aligning with human preference styles in analysis
and reasoning. Through the RLHF process, it employs Continuous Maximization, a
strategy that dynamically and adaptively extends the output length limit,
enabling the generation of more detailed and analytical content. Subsequently,
the Fine Actor refines this analytical content, addressing the generation of
excessively redundant information from the Coarse Actor. We introduce a
"Knowledge Residue Merger" approach, refining the content from the Coarse Actor
and merging it with an existing Instruction model to improve quality and
correctness while reducing redundancy. We applied our methodology to the popular
Mistral model, creating Mistral-C2F, which has demonstrated exceptional
performance across 11 general language tasks and the MT-Bench Dialogue task,
outperforming similar-scale models and even larger models with 13B and 30B
parameters. Our model has significantly improved conversational and analytical
reasoning abilities.
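
The abstract describes Continuous Maximization only at a high level: the output length limit is extended dynamically and adaptively during RLHF. As a minimal sketch, the Python snippet below grows a max-generation-length cap across rollout steps up to a ceiling; the linear schedule, function name, and all constants are assumptions for illustration, not the paper's stated rule.

```python
# Minimal sketch of a Continuous Maximization-style length schedule.
# Assumption: a simple linear growth rule with a hard ceiling; the
# abstract only says the limit is extended "dynamically and adaptively".

def continuous_max_length(step: int,
                          base_len: int = 256,
                          growth_per_step: int = 32,
                          ceiling: int = 2048) -> int:
    """Output length limit for the current RLHF rollout step."""
    return min(base_len + growth_per_step * step, ceiling)

# Later rollout steps are allowed longer, more analytical generations,
# e.g. policy.generate(prompts, max_new_tokens=continuous_max_length(step)).
for step in (0, 8, 16, 64):
    print(step, continuous_max_length(step))
```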
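
The Knowledge Residue Merger is likewise described only in outline. A common way to realize a merger between two models that share one architecture is per-parameter linear interpolation of their weights, sketched below; the linear rule and the coefficient `alpha` are assumptions, not the paper's formula.

```python
import torch

# Hypothetical merger sketch: blend the Coarse Actor's weights with an
# existing Instruction model by per-parameter linear interpolation.
# Assumption: plain interpolation with a single global coefficient.

def knowledge_residue_merge(coarse_state: dict,
                            instruct_state: dict,
                            alpha: float = 0.5) -> dict:
    """Blend two state dicts from models with identical architectures."""
    return {name: alpha * coarse_state[name] + (1.0 - alpha) * w
            for name, w in instruct_state.items()}

# Toy tensors standing in for real model parameters:
coarse = {"layer.weight": torch.full((2, 2), 2.0)}
instruct = {"layer.weight": torch.zeros(2, 2)}
print(knowledge_residue_merge(coarse, instruct, alpha=0.3)["layer.weight"])
# -> tensor filled with 0.6
```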