Divide-or-Conquer? Which Part Should You Distill Your LLM?
February 22, 2024
Authors: Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang
cs.AI
Abstract
Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper, we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase, and we show that this strategy outperforms a single-stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model than the problem solving, because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase and at the same time achieve good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance, and the resulting distilled model struggles with generalization. These results indicate that by using smaller, distilled problem decomposition models in combination with problem solving LLMs, we can achieve reasoning with cost-efficient inference and local adaptation.
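The two-stage strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `decompose` and `solve` functions are hypothetical stand-ins for, respectively, a small distilled decomposition model and a large problem-solving LLM.

```python
# Sketch of a decompose-then-solve reasoning pipeline (illustrative only).
# In the setting the abstract describes, `decompose` would be a small,
# distilled model and `solve` a large LLM queried per subquestion.

def decompose(question: str) -> list[str]:
    """Stand-in for the small distilled decomposition model:
    break the main question into simpler subquestions."""
    # Hypothetical fixed two-step decomposition, for illustration.
    return [f"Step {i + 1} of: {question}" for i in range(2)]

def solve(subquestion: str, context: list[str]) -> str:
    """Stand-in for the large problem-solving LLM: answer one
    subquestion, given the answers produced so far."""
    return f"answer({subquestion})"

def two_stage_reasoning(question: str) -> str:
    """Phase 1: decomposition (small model).
    Phase 2: sequential solving (large model)."""
    subquestions = decompose(question)
    answers: list[str] = []
    for sq in subquestions:
        answers.append(solve(sq, answers))
    return answers[-1]  # answer to the final subquestion
```

The design point the abstract argues for is that only `decompose` needs to be distilled and adapted locally; the domain-knowledge-heavy `solve` stage can remain a shared, large model.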