

ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers

January 4, 2024
Authors: Chen Zheng, Ke Sun, Da Tang, Yukun Ma, Yuyu Zhang, Chenguang Xi, Xun Zhou
cs.AI

Abstract

Large Language Models (LLMs) such as ChatGPT and LLaMA encounter limitations in domain-specific tasks: these models often lack depth and accuracy in specialized areas and exhibit a decrease in general capabilities when fine-tuned, particularly in the analytical ability of smaller models. To address these gaps, we introduce ICE-GRT, which utilizes Reinforcement Learning from Human Feedback (RLHF) grounded in Proximal Policy Optimization (PPO) and demonstrates remarkable ability in in-domain scenarios without compromising general task performance. Our exploration of ICE-GRT highlights its understanding and reasoning ability: it not only generates robust answers but also provides detailed analyses of the reasons behind them. This capability marks a significant progression beyond the scope of Supervised Fine-Tuning models. The success of ICE-GRT depends on several crucial factors, including Appropriate Data, Reward Size Scaling, KL-Control, and Advantage Normalization. The ICE-GRT model exhibits state-of-the-art performance in domain-specific tasks and across 12 general language tasks against LLMs of equivalent and even larger size, highlighting the effectiveness of our approach. We provide a comprehensive analysis of ICE-GRT, underscoring the significant advancements it brings to the field of LLMs.
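
The abstract names KL-Control, Reward Size Scaling, and Advantage Normalization as crucial ingredients of PPO-based RLHF training. The sketch below shows how such ingredients are commonly realized in RLHF pipelines; the function names, tensor shapes, and hyperparameter values (e.g. `kl_coef`, `reward_clip`) are illustrative assumptions, not the authors' actual ICE-GRT implementation.

```python
import torch

def shape_rewards(reward, logprobs, ref_logprobs, kl_coef=0.1, reward_clip=5.0):
    """Combine a clipped reward-model score with a per-token KL penalty (hypothetical values).

    reward:        (batch,) scalar reward-model scores, one per response.
    logprobs:      (batch, seq_len) token log-probs under the current policy.
    ref_logprobs:  (batch, seq_len) token log-probs under the frozen reference model.
    """
    # KL-Control: penalize divergence from the reference model at every token.
    kl = logprobs - ref_logprobs
    shaped = -kl_coef * kl
    # Reward scaling/clipping: bound the reward-model score and add it at the final token.
    shaped[:, -1] += torch.clamp(reward, -reward_clip, reward_clip)
    return shaped

def normalize_advantages(advantages, eps=1e-8):
    """Advantage Normalization: whiten advantages across the batch before the PPO update."""
    return (advantages - advantages.mean()) / (advantages.std() + eps)
```

These steps stabilize PPO updates by keeping the policy close to its reference model and by preventing large reward or advantage magnitudes from dominating the gradient.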