Divide-or-Conquer? Which Part Should You Distill Your LLM?

Abstract

Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper, we devise a similar strategy that breaks reasoning tasks into a problem decomposition phase and a problem solving phase, and we show that this strategy outperforms a single-stage solution. Further, we hypothesize that decomposition should be easier to distill into a smaller model than problem solving, because problem solving requires large amounts of domain knowledge while decomposition only requires learning general problem-solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that the problem decomposition phase can be distilled while still achieving good generalization across tasks, datasets, and models. In contrast, the problem solving capability is harder to distill without losing performance, and the resulting distilled model struggles to generalize. These results indicate that combining smaller, distilled problem decomposition models with problem solving LLMs can achieve reasoning with cost-efficient inference and local adaptation.
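To make the two-phase split concrete, here is a minimal sketch of a decompose-then-solve pipeline in which the decomposer and the solver are independent, swappable callables. It assumes a decomposer that maps a question to a list of subquestions and a solver that maps a prompt string to an answer string; the names `decompose_then_solve`, `Trace`, `toy_decompose`, and `toy_solve`, the "Q:/A:" prompt format, and the choice to answer subquestions sequentially with earlier answers as context are all hypothetical illustrations, not the authors' implementation. The toy lookup-table "models" stand in for real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces (assumptions, not the paper's API):
# a decomposer maps a question to a list of subquestions,
# a solver maps a prompt string to an answer string.
Decomposer = Callable[[str], List[str]]
Solver = Callable[[str], str]


@dataclass
class Trace:
    """Records every intermediate step of one two-stage run."""
    question: str
    subquestions: List[str] = field(default_factory=list)
    subanswers: List[str] = field(default_factory=list)
    answer: str = ""


def decompose_then_solve(question: str, decompose: Decomposer, solve: Solver) -> Trace:
    trace = Trace(question=question)
    # Stage 1 (problem decomposition): could be served by a smaller, distilled model.
    trace.subquestions = decompose(question)
    # Stage 2 (problem solving): answer subquestions in order, each prompt
    # carrying the previously solved Q/A pairs as context.
    context = ""
    for sub in trace.subquestions:
        ans = solve(f"{context}Q: {sub}\nA:")
        trace.subanswers.append(ans)
        context += f"Q: {sub}\nA: {ans}\n"
    # Final call: answer the original question given all solved subquestions.
    trace.answer = solve(f"{context}Q: {question}\nA:")
    return trace


# --- Toy stand-ins for the two models (purely illustrative) ---
def toy_decompose(question: str) -> List[str]:
    table = {"What is (2 + 3) * 4?": ["What is 2 + 3?", "What is 5 * 4?"]}
    return table.get(question, [])


solver_calls: List[str] = []


def toy_solve(prompt: str) -> str:
    solver_calls.append(prompt)
    # The question being asked is on the second-to-last line ("Q: ..."), just before "A:".
    question = prompt.rstrip().splitlines()[-2].removeprefix("Q: ")
    table = {"What is 2 + 3?": "5", "What is 5 * 4?": "20", "What is (2 + 3) * 4?": "20"}
    return table[question]


trace = decompose_then_solve("What is (2 + 3) * 4?", toy_decompose, toy_solve)
print(trace.subanswers, trace.answer, len(solver_calls))  # → ['5', '20'] 20 3
```

Because the two stages only meet through plain strings, the decomposer can be replaced by a small fine-tuned model while the solver remains a large LLM, which is the combination the abstract argues is cost-efficient. In this sketch the decomposer is called once per question and the solver once per subquestion plus once for the final answer.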