Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
June 12, 2024
Authors: Chen Zheng, Ke Sun, Xun Zhou
cs.AI
Abstract
Despite the advances in Large Language Models (LLMs), exemplified by models
like GPT-4 and Claude, smaller-scale LLMs such as Llama and Mistral often
struggle with generating in-depth and coherent dialogues. This paper presents a
novel two-step Coarse-to-Fine Actor model to address the inherent limitations
in conversational and analytical capabilities of small-sized LLMs. Our approach
begins with the Policy-based Coarse Actor, employing a technique we term
"Continuous Maximization". The Coarse Actor establishes an enhanced,
knowledge-rich pool adept at aligning with human preference styles in analysis
and reasoning. Through the RLHF process, it employs Continuous Maximization, a
strategy that dynamically and adaptively extends the output length limit,
enabling the generation of more detailed and analytical content. Subsequently,
the Fine Actor refines this analytical content, addressing the Coarse Actor's
tendency to generate excessively redundant information. We introduce a
"Knowledge Residue Merger" approach, which refines the content from the Coarse
Actor and merges it with an existing Instruction model to improve quality and
correctness and to reduce redundancy. We applied our methodology to the popular
Mistral model, creating Mistral-C2F, which has demonstrated exceptional
performance across 11 general language tasks and the MT-Bench Dialogue task,
outperforming similar-scale models and even larger models with 13B and 30B
parameters. Our model has significantly improved conversational and analytical
reasoning abilities.
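
The abstract describes Continuous Maximization only at a high level: the output length limit is extended dynamically and adaptively over the course of RLHF training. The sketch below illustrates one way such a schedule could be wired into a rollout loop; the function name, the linear ramp, and the specific limits are assumptions for illustration, not the paper's actual rule.

```python
# Hypothetical sketch of "Continuous Maximization": the paper says the output
# length limit grows dynamically during RLHF but does not give the schedule
# here. We assume a simple linear ramp from a small cap to a large one.

def continuous_max_length(step: int,
                          total_steps: int,
                          start_limit: int = 512,
                          final_limit: int = 2048) -> int:
    """Return the generation length cap for the current RLHF step.

    Assumption: the cap grows linearly from start_limit to final_limit;
    the actual adaptive rule in Mistral-C2F may differ.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return int(start_limit + frac * (final_limit - start_limit))

# Hypothetical usage inside a PPO-style rollout loop:
#   max_new_tokens = continuous_max_length(step, total_steps)
#   responses = policy.generate(prompts, max_new_tokens=max_new_tokens)
```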
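Likewise, the Knowledge Residue Merger is described only as merging the refined Coarse Actor with an existing Instruction model; no merge formula is given in the abstract. Below is a minimal sketch assuming a plain linear interpolation of weights between two same-architecture checkpoints; the helper name and the alpha value are hypothetical, and the actual merger in Mistral-C2F may use a different per-tensor rule.

```python
# Hypothetical sketch of a weight merge in the spirit of the "Knowledge
# Residue Merger". Assumes both models share an architecture and that all
# parameters are floating point; alpha = 0.5 is an arbitrary choice.

import torch

def merge_state_dicts(coarse_actor_sd: dict,
                      instruct_sd: dict,
                      alpha: float = 0.5) -> dict:
    """Linearly interpolate matching parameters of two same-architecture models."""
    merged = {}
    for name, instruct_param in instruct_sd.items():
        coarse_param = coarse_actor_sd[name]
        merged[name] = alpha * coarse_param + (1.0 - alpha) * instruct_param
    return merged

# Hypothetical usage:
#   merged_sd = merge_state_dicts(coarse.state_dict(), instruct.state_dict())
#   fine_actor.load_state_dict(merged_sd)
```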