ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers

January 4, 2024
作者: Chen Zheng, Ke Sun, Da Tang, Yukun Ma, Yuyu Zhang, Chenguang Xi, Xun Zhou
cs.AI

Abstract

Large Language Models (LLMs) such as ChatGPT and LLaMA encounter limitations in domain-specific tasks: they often lack depth and accuracy in specialized areas, and they exhibit a decrease in general capabilities when fine-tuned, particularly in the analysis ability of small-sized models. To address these gaps, we introduce ICE-GRT, which utilizes Reinforcement Learning from Human Feedback (RLHF) grounded in Proximal Policy Optimization (PPO) and demonstrates remarkable ability in in-domain scenarios without compromising general task performance. Our exploration of ICE-GRT highlights its understanding and reasoning ability: the model not only generates robust answers but also provides detailed analyses of the reasons behind them. This capability marks a significant progression beyond the scope of Supervised Fine-Tuning models. The success of ICE-GRT depends on several crucial factors, including appropriate data, reward size scaling, KL-control, and advantage normalization. The ICE-GRT model exhibits state-of-the-art performance in domain-specific tasks and across 12 general language tasks against LLMs of equivalent and even larger size, highlighting the effectiveness of our approach. We provide a comprehensive analysis of ICE-GRT, underscoring the significant advancements it brings to the field of LLMs.
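The abstract names KL-control and advantage normalization among the crucial factors behind ICE-GRT. The paper's exact formulation is not given here, but these two ingredients typically appear in a PPO-based RLHF loop as (a) a per-token KL penalty that keeps the policy close to the reference (SFT) model and (b) whitening of advantages before the policy update. A minimal sketch under those assumptions (all function names and the `kl_coef` value are hypothetical, not from the paper):

```python
import numpy as np

def kl_penalized_rewards(env_rewards, logp_policy, logp_ref, kl_coef=0.1):
    """KL-control: subtract a per-token KL penalty from the reward so the
    policy is discouraged from drifting far from the reference model.
    `logp_policy - logp_ref` is the standard per-token KL estimate."""
    kl = logp_policy - logp_ref
    return env_rewards - kl_coef * kl

def normalize_advantages(adv, eps=1e-8):
    """Advantage normalization: whiten advantages to zero mean and unit
    variance within the batch, stabilizing the PPO policy-gradient step."""
    return (adv - adv.mean()) / (adv.std() + eps)

# Example: a 4-token response with a terminal reward of 1.0 on the last token.
rewards = np.array([0.0, 0.0, 0.0, 1.0])
logp_policy = np.array([-1.2, -0.8, -1.5, -0.9])
logp_ref = np.array([-1.3, -1.0, -1.4, -1.1])
shaped = kl_penalized_rewards(rewards, logp_policy, logp_ref)
whitened = normalize_advantages(shaped)  # fed into the PPO clipped objective
```

In practice the whitened quantity is the GAE-style advantage rather than the raw shaped reward, but the normalization step itself is identical.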