Exploring Format Consistency for Instruction Tuning

Abstract

Instruction tuning has emerged as a promising approach to improving how well large language models follow human instructions. Increasing the diversity and number of instructions in the training data has been shown to consistently improve generalization performance, which has motivated recent efforts to collect varied instructions and to merge existing instruction tuning datasets into larger collections. However, different users express instructions in their own ways, and instruction styles and formats often vary across datasets, a problem we refer to as format inconsistency. In this work, we study how format inconsistency affects the performance of instruction tuning. We propose a framework called "Unified Instruction Tuning" (UIT), which calls OpenAI APIs to automatically transfer formats among different instruction tuning datasets. We show that UIT improves generalization performance on unseen instructions, which highlights the importance of format consistency for instruction tuning. To make the UIT framework more practical, we further propose a novel perplexity-based denoising method that reduces the noise introduced by automatic format transfer. We also train a smaller offline model whose format transfer capability is comparable to that of the OpenAI APIs, reducing costs in practice.
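The abstract names a perplexity-based denoising method without describing its procedure. As a minimal sketch, assuming the denoiser scores several candidate conversions of the same instruction and keeps the one a language model finds least surprising, the selection step might look like the following. The function names and the input of per-token natural-log probabilities are hypothetical and are not taken from the paper; in practice the log-probabilities would come from whichever scoring model is used.

```python
import math
from typing import Dict, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity is exp of the negative mean natural-log token probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_least_perplexing(candidates: Dict[str, Sequence[float]]) -> str:
    """Return the candidate text with the lowest perplexity.

    `candidates` maps each converted instruction (hypothetical input) to the
    per-token log-probabilities assigned to it by some scoring model.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=lambda text: perplexity(candidates[text]))
```

A candidate whose tokens each have probability 0.5 has perplexity 2 under this definition, so lower values indicate text the scoring model considers more natural. Whether the lowest-perplexity candidate is also the most faithful conversion is an assumption of this sketch.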
