Democratizing Reasoning Ability: Tailored Learning from Large Language Model

October 20, 2023
Authors: Zhaoyang Wang, Shaohan Huang, Yuxuan Liu, Jiahai Wang, Minghui Song, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
cs.AI

Abstract

Large language models (LLMs) exhibit impressive emergent abilities in natural language processing, but their democratization is hindered by their huge computational requirements and closed-source nature. Recent research on advancing open-source smaller LMs by distilling knowledge from black-box LLMs has obtained promising results in instruction-following ability. However, reasoning ability, which is more challenging to foster, remains relatively unexplored. In this paper, we propose a tailored learning approach that distills such reasoning ability into smaller LMs to facilitate the democratization of this exclusive reasoning ability. In contrast to merely employing the LLM as a data annotator, we exploit its potential as a reasoning teacher by building an interactive multi-round learning paradigm. This paradigm enables the student to expose its deficiencies to the black-box teacher, which can then provide customized training data in return. Further, to exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from its self-made mistakes. Learning from self-reflection and learning from the LLM are both tailored to the student's learning status, thanks to their seamless integration with the multi-round learning paradigm. Comprehensive experiments and analysis on mathematical and commonsense reasoning tasks demonstrate the effectiveness of our method. The code will be available at https://github.com/Raibows/Learn-to-Reason.
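
The interactive multi-round paradigm described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: `ToyStudent`, `toy_teacher`, `multi_round_learning`, and the per-round `budget` are hypothetical names invented for illustration, the "student" merely memorizes answers instead of being fine-tuned, and self-reflection learning is omitted because the abstract does not specify its form.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]        # (question, gold answer)
Mistake = Tuple[str, str, str]   # (question, student's wrong answer, gold answer)


@dataclass
class ToyStudent:
    """Hypothetical stand-in for a smaller LM: it simply memorizes answers."""
    memory: Dict[str, str] = field(default_factory=dict)

    def answer(self, question: str) -> str:
        # Unknown questions get a placeholder answer, which counts as a mistake.
        return self.memory.get(question, "unknown")

    def train(self, examples: List[Example]) -> None:
        for question, gold in examples:
            self.memory[question] = gold


def toy_teacher(mistakes: List[Mistake], budget: int = 2) -> List[Example]:
    """Hypothetical stand-in for the black-box LLM teacher.

    It sees the student's mistakes and returns training data targeted at
    them, limited to `budget` examples per round (an assumed constraint).
    """
    return [(question, gold) for question, _wrong, gold in mistakes[:budget]]


def multi_round_learning(
    student: ToyStudent,
    teacher: Callable[[List[Mistake]], List[Example]],
    train_set: List[Example],
    rounds: int,
) -> List[int]:
    """Run the loop and return the number of student mistakes in each round."""
    mistakes_per_round: List[int] = []
    for _ in range(rounds):
        # 1. The student attempts every question and exposes its mistakes.
        mistakes: List[Mistake] = []
        for question, gold in train_set:
            predicted = student.answer(question)
            if predicted != gold:
                mistakes.append((question, predicted, gold))
        mistakes_per_round.append(len(mistakes))
        if not mistakes:
            break  # nothing left for the teacher to correct
        # 2. The teacher returns data tailored to those mistakes.
        tailored = teacher(mistakes)
        # 3. The student learns from the tailored data before the next round.
        student.train(tailored)
    return mistakes_per_round
```

Calling `multi_round_learning(ToyStudent(), toy_teacher, data, rounds=5)` on four toy questions yields a mistake count that shrinks from 4 to 2 to 0, since the assumed teacher corrects at most two mistakes per round. The point of the sketch is the feedback direction: the training data in each round depends on what the student currently gets wrong, rather than being annotated once up front.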