Democratizing Reasoning Ability: Tailored Learning from Large Language Model

Abstract

Large language models (LLMs) exhibit impressive emergent abilities in natural language processing, but their democratization is hindered by huge computational requirements and their closed-source nature. Recent research on advancing open-source smaller LMs by distilling knowledge from black-box LLMs has obtained promising results in instruction-following ability. However, reasoning ability, which is more challenging to foster, remains relatively underexplored. In this paper, we propose a tailored learning approach that distills this reasoning ability into smaller LMs to facilitate the democratization of this exclusive ability. Rather than merely employing the LLM as a data annotator, we exploit its potential as a reasoning teacher by building an interactive multi-round learning paradigm. This paradigm enables the student to expose its deficiencies to the black-box teacher, which can then provide customized training data in return. Further, to exploit the reasoning potential of the smaller LM, we propose self-reflection learning, which motivates the student to learn from its own mistakes. Thanks to the seamless integration with the multi-round learning paradigm, both self-reflection learning and learning from the LLM are tailored to the student's learning status. Comprehensive experiments and analysis on mathematical and commonsense reasoning tasks demonstrate the effectiveness of our method. The code will be available at https://github.com/Raibows/Learn-to-Reason.
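The shape of the interactive multi-round paradigm described in the abstract can be pictured with a toy loop. This is only a minimal sketch under heavy assumptions, not the authors' implementation: `ToyStudent`, `toy_teacher`, `multi_round_learning`, and the per-round `budget` are all hypothetical names, the "student" simply memorizes answers instead of fine-tuning a smaller LM, and the "teacher" is an exact evaluator for `a+b` strings rather than a black-box LLM that writes rationales. What it does illustrate is the control flow: each round the student exposes its current mistakes, the teacher supplies data only for those mistakes, and the student's own wrong answers are kept as material for self-reflection.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStudent:
    """Hypothetical stand-in for the smaller LM: it can only recall answers it was taught."""
    memory: dict = field(default_factory=dict)
    # (question, own wrong answer, teacher's answer) triples kept for self-reflection.
    reflections: list = field(default_factory=list)

    def solve(self, question: str) -> int:
        # Unseen questions fall back to a fixed guess of 0 (wrong for every toy question).
        return self.memory.get(question, 0)

    def train(self, corrections) -> None:
        for question, wrong, right in corrections:
            # Learn from the teacher's customized data for this specific mistake.
            self.memory[question] = right
            # A real system would add a self-reflection objective contrasting the
            # wrong answer with the right one; this toy only records the pair.
            self.reflections.append((question, wrong, right))


def toy_teacher(question: str) -> int:
    """Hypothetical black-box teacher: an exact evaluator for 'a+b' strings."""
    left, right = question.split("+")
    return int(left) + int(right)


def multi_round_learning(student, teacher, questions, answers, rounds, budget):
    """Each round the student exposes its mistakes and the teacher answers at most
    `budget` of them, so the new training data targets the current student."""
    mistakes_per_round = []
    for _ in range(rounds):
        mistakes = [(q, student.solve(q)) for q in questions
                    if student.solve(q) != answers[q]]
        mistakes_per_round.append(len(mistakes))
        if not mistakes:
            break  # nothing left to tailor data for
        corrections = [(q, wrong, teacher(q)) for q, wrong in mistakes[:budget]]
        student.train(corrections)
    return mistakes_per_round


if __name__ == "__main__":
    questions = ["1+2", "2+2", "3+4", "5+1", "2+7"]
    answers = {"1+2": 3, "2+2": 4, "3+4": 7, "5+1": 6, "2+7": 9}
    student = ToyStudent()
    print(multi_round_learning(student, toy_teacher, questions, answers, rounds=4, budget=2))
    # prints [5, 3, 1, 0]
```

The mistake count shrinks round by round only because the teacher is always queried about what the current student still gets wrong; the accumulated `reflections` mark where a self-reflection objective would plug into such a loop.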