

REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback

May 10, 2025
作者: Aniruddha Roy, Pretam Ray, Abhilash Nandy, Somak Aditya, Pawan Goyal
cs.AI

Abstract

Instruction-based Large Language Models (LLMs) have proven effective in numerous few-shot and zero-shot Natural Language Processing (NLP) tasks. However, creating human-annotated instruction data is time-consuming, expensive, and often limited in quantity and task diversity. Previous research has attempted to address this challenge by proposing frameworks that generate instructions in a semi-automated, task-agnostic manner directly from the model itself. Many of these efforts have relied on large API-only models such as GPT-3.5 (175B), which are expensive and subject to limits on the number of queries. This paper explores the performance of three small open-source LLMs, LLaMA 2-7B, LLaMA 2-13B, and Mistral 7B, within a semi-automated framework, thereby reducing the human intervention, effort, and cost required to generate an instruction dataset for fine-tuning LLMs. Furthermore, we demonstrate that incorporating a Reinforcement Learning (RL) based training algorithm into this framework leads to further enhancements. Our evaluation of the dataset reveals that these RL-based frameworks achieve substantial improvements in 63-66% of the tasks compared to previous approaches.
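The abstract describes a loop in which a small open-source LLM generates its own instruction candidates, an automated signal scores them, and high-reward instructions feed RL-based fine-tuning. The following is a minimal, hypothetical sketch of such a loop; the function names, the length-based reward heuristic, and the filtering threshold are illustrative assumptions, not the paper's actual implementation.

```python
import random

def generate_instructions(seed_pool, n=4):
    """Stand-in for sampling new instruction candidates from a small
    open LLM (e.g. LLaMA 2-7B), bootstrapped from a seed pool in a
    semi-automated, task-agnostic fashion."""
    return [f"{random.choice(seed_pool)} (variant {i})" for i in range(n)]

def automated_reward(instruction):
    """Stand-in automated-feedback reward. Here a trivial length-based
    heuristic in [0, 1]; the actual framework would use automatic
    quality signals rather than human annotation."""
    return min(len(instruction) / 50.0, 1.0)

def refine_step(seed_pool, threshold=0.5):
    """One iteration: generate candidates, score them with the
    automated reward, and keep high-reward instructions as training
    data for RL-based fine-tuning."""
    candidates = generate_instructions(seed_pool)
    return [inst for inst in candidates if automated_reward(inst) >= threshold]

seeds = [
    "Summarize the following paragraph",
    "Translate this sentence to French",
]
kept = refine_step(seeds)
```

In a real pipeline the kept instructions would be paired with model-generated responses and used to update the policy model with an RL objective driven by the automated reward, rather than simply filtered as above.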
