Can LLMs Learn by Teaching? A Preliminary Study

Abstract

Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in LLMs. For humans, however, teaching improves not only the students but also the teachers. We ask: can LLMs also learn by teaching (LbT)? If so, this could open the possibility of continuously advancing models without relying solely on human-produced data or stronger models. In this paper, we provide a preliminary exploration of this ambitious agenda. We show that LbT ideas can be incorporated into existing LLM training/prompting pipelines and provide noticeable improvements. Specifically, we design three methods, each mimicking one of the three levels of LbT in humans: observing students' feedback, learning from the feedback, and learning iteratively. The goals are to improve answer accuracy without training and to improve models' inherent capability through fine-tuning. The findings are encouraging. For example, similar to LbT in humans, we see that: (1) LbT can induce weak-to-strong generalization: strong models can improve themselves by teaching weaker models; (2) diversity in students might help: teaching multiple students could be better than teaching one student or the teacher itself. We hope that this early promise can inspire future research on LbT and, more broadly, the adoption of advanced techniques from education to improve LLMs. The code is available at https://github.com/imagination-research/lbt.
