大規模言語モデルの推論において、忍耐が重要である。

要旨

最近の大規模言語モデルの分野における進展は、特にChain of Thought（CoT）アプローチを通じて、複雑な問題の解決において著しい改善が示されています。しかしながら、既存のモデルは、ユーザーの好みによる簡潔さのために詳細な推論を犠牲にする傾向があるか、複雑な推論能力を学習するために広範囲かつ高額なトレーニングデータが必要とされるため、複雑なタスクの解決においてその潜在能力が制限されています。このギャップを埋めるために、テスト時のスケーリング概念に従い、新しい知識やスキルを導入する必要なく、モデルがより忍耐強い推論スタイルを採用するよう促す簡単な方法を提案します。好みの最適化アプローチを採用するために、詳細な推論プロセスを正例とし、簡単な回答を負例として生成し、モデルが回答において徹底性を重視するようトレーニングします。当社の結果は、軽量なデータセットでのトレーニングにより、GSM8kにおいて最大6.7%の性能向上を示しています。

English

Recent advancements in the field of large language models, particularly through the Chain of Thought (CoT) approach, have demonstrated significant improvements in solving complex problems. However, existing models either tend to sacrifice detailed reasoning for brevity due to user preferences, or require extensive and expensive training data to learn complicated reasoning ability, limiting their potential in solving complex tasks. To bridge this gap, following the concept of scaling test-time, we propose a simple method by encouraging models to adopt a more patient reasoning style without the need of introducing new knowledge or skills. To employ a preference optimization approach, we generate detailed reasoning processes as positive examples and simple answers as negative examples, thereby training the model to favor thoroughness in its responses. Our results demonstrate a performance increase of up to 6.7% on GSM8k with training just on a lightweight dataset.

大規模言語モデルの推論において、忍耐が重要である。

Patience Is The Key to Large Language Model Reasoning

要旨

Support