

Divide-or-Conquer? Which Part Should You Distill Your LLM?

February 22, 2024
作者: Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang
cs.AI

Abstract

Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper, we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase, and we show that this strategy outperforms a single-stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model than the problem solving, because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase and at the same time achieve good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance, and the resulting distilled model struggles with generalization. These results indicate that by combining smaller, distilled problem decomposition models with problem solving LLMs, we can achieve reasoning with cost-efficient inference and local adaptation.
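
To make the two-stage setup concrete, below is a minimal sketch of a decompose-then-solve inference loop of the kind the abstract describes. The `decomposer` and `solver` callables are hypothetical stand-ins for the small distilled decomposition model and the larger problem-solving LLM, and the prompt chaining format is an illustrative assumption, not the paper's actual implementation.

```python
from typing import Callable, List

def decompose_then_solve(
    question: str,
    decomposer: Callable[[str], List[str]],  # small distilled model: question -> subquestions
    solver: Callable[[str], str],            # large LLM: prompt -> answer text
) -> str:
    """Answer `question` by first decomposing it, then solving each subquestion in turn."""
    # Stage 1: the cheap, distilled decomposer breaks the main task into subquestions.
    subquestions = decomposer(question)

    # Stage 2: the large solver answers each subquestion, conditioning on the
    # answers accumulated so far (a simple sequential prompt chain).
    context = f"Main question: {question}\n"
    for i, sub in enumerate(subquestions, start=1):
        answer = solver(context + f"Subquestion {i}: {sub}\nAnswer:")
        context += f"Subquestion {i}: {sub}\nAnswer {i}: {answer}\n"

    # Finally, ask the solver for the overall answer given all intermediate results.
    return solver(context + "Final answer to the main question:")
```

Because only the decomposer is distilled, the cheap local model handles planning while the expensive solver calls remain unchanged, which is where the cost savings and local adaptation come from.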