Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs

Abstract



Despite the advances in Large Language Models (LLMs), exemplified by models such as GPT-4 and Claude, smaller-scale LLMs such as Llama and Mistral often struggle to generate in-depth and coherent dialogues. This paper presents a novel two-step Coarse-to-Fine Actor model that addresses the inherent limitations in the conversational and analytical capabilities of small LLMs. Our approach begins with the Policy-based Coarse Actor, which uses a technique we term "Continuous Maximization" to build an enhanced, knowledge-rich pool adept at aligning with human preference styles in analysis and reasoning. During the RLHF process, Continuous Maximization dynamically and adaptively extends the output length limit, enabling the generation of more detailed and analytical content. The Fine Actor then refines this analytical content, addressing the excessively redundant information generated by the Coarse Actor. We introduce a "Knowledge Residue Merger" approach that refines the content from the Coarse Actor and merges it with an existing Instruction model to improve quality and correctness and to reduce redundancy. We applied our methodology to the popular Mistral model, creating Mistral-C2F, which demonstrates exceptional performance across 11 general language tasks and the MT-Bench Dialogue task, outperforming models of similar scale and even larger models with 13B and 30B parameters. Mistral-C2F shows significantly improved conversational and analytical reasoning abilities.
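The abstract does not specify how the Knowledge Residue Merger combines the two models. As a rough illustration only, the Python sketch below assumes a task-arithmetic-style merge. It is a swapped-in technique, not the paper's method. Here the "residue" is taken to be the element-wise difference between the Coarse Actor's weights and the shared base weights, and a hypothetical scaling factor `alpha` controls how much of that residue is added to the Instruction model. The names `knowledge_residue`, `merge_residue`, and `alpha` are illustrative assumptions, and weights are modeled as plain dictionaries of float lists rather than real tensors.

```python
from typing import Dict, List

# Hypothetical weight container: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def _check_compatible(a: Weights, b: Weights) -> None:
    """Require identical parameter names and sizes (assumes a shared base architecture)."""
    if a.keys() != b.keys():
        raise ValueError("models must have the same parameter names")
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")


def knowledge_residue(coarse: Weights, base: Weights) -> Weights:
    """ASSUMPTION: residue = Coarse Actor weights minus the base weights it started from."""
    _check_compatible(coarse, base)
    return {name: [c - b for c, b in zip(coarse[name], base[name])] for name in base}


def merge_residue(instruct: Weights, residue: Weights, alpha: float = 0.5) -> Weights:
    """Add a scaled residue to the Instruction model's weights (alpha is a hypothetical knob)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    _check_compatible(instruct, residue)
    return {
        name: [w + alpha * r for w, r in zip(instruct[name], residue[name])]
        for name in instruct
    }


if __name__ == "__main__":
    base = {"layer.0": [1.0, 2.0]}
    coarse = {"layer.0": [1.5, 1.0]}    # toy values standing in for RLHF-tuned weights
    instruct = {"layer.0": [1.2, 2.2]}  # toy values standing in for an Instruction model
    merged = merge_residue(instruct, knowledge_residue(coarse, base), alpha=0.5)
    print(merged)
```

Setting `alpha` to 0 leaves the Instruction model unchanged, while larger values transfer more of the Coarse Actor's learned shift. Under this assumption, the scaling factor is one simple way to trade off added analytical detail against redundancy.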
