GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers

Abstract

The effectiveness of large language models (LLMs) is closely tied to prompt design, making prompt optimization essential for improving their performance across a wide range of tasks. Many existing approaches to automated prompt engineering rely exclusively on textual feedback, refining prompts based solely on inference errors identified by large, computationally expensive LLMs. Unfortunately, smaller models struggle to generate high-quality feedback, so these approaches depend entirely on the judgment of large LLMs. Moreover, because they operate purely in text space, these methods cannot leverage more direct and finer-grained information such as gradients. To this end, we introduce GReaTer, a novel prompt optimization technique that directly incorporates gradient information over task-specific reasoning. By utilizing task loss gradients, GReaTer enables self-optimization of prompts for open-source, lightweight language models without the need for costly closed-source LLMs. This allows high-performance prompt optimization without dependence on massive LLMs, closing the gap between smaller models and the sophisticated reasoning often needed for prompt refinement. Extensive evaluations across diverse reasoning tasks, including BBH, GSM8k, and FOLIO, demonstrate that GReaTer consistently outperforms previous state-of-the-art prompt optimization methods, even those reliant on powerful LLMs. Additionally, GReaTer-optimized prompts frequently exhibit better transferability and, in some cases, boost task performance to levels comparable to or surpassing those achieved by larger language models, highlighting the effectiveness of prompt optimization guided by gradients over reasoning. The code for GReaTer is available at https://github.com/psunlpgroup/GreaTer.
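To make the idea of using task loss gradients to choose prompt tokens more concrete, the following is a minimal toy sketch of generic gradient-guided token substitution. It is not the GReaTer algorithm or the authors' code: the abstract does not specify the procedure, so everything here is an assumption made for illustration. A two-dimensional linear "model" stands in for a language model, and the names `EMBED`, `W`, `TARGET`, `rank_swaps`, and `optimize_token` are hypothetical. The sketch ranks candidate replacement tokens by a first-order estimate of the change in loss, then re-scores a short list with the true loss.

```python
# Toy illustration only: a linear "model" stands in for an LLM.
# All names and values below are hypothetical, not taken from GReaTer.

EMBED = {
    "think": [1.0, 0.0],
    "guess": [0.0, 1.0],
    "solve": [0.8, 0.3],
    "skip": [-0.5, 0.9],
}
W = [1.0, -1.0]  # fixed weights of the toy model
TARGET = 1.0     # desired model output for the toy task


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def loss(token):
    """Squared error of the toy model when `token` is the prompt token."""
    return (dot(W, EMBED[token]) - TARGET) ** 2


def loss_grad(token):
    """Gradient of the loss with respect to the prompt token's embedding."""
    residual = dot(W, EMBED[token]) - TARGET
    return [2.0 * residual * w for w in W]


def rank_swaps(current):
    """Rank replacement tokens by the first-order estimated loss change."""
    grad = loss_grad(current)
    scores = {}
    for cand, emb in EMBED.items():
        if cand == current:
            continue
        delta = [c - o for c, o in zip(emb, EMBED[current])]
        scores[cand] = dot(delta, grad)  # more negative = larger predicted drop
    return sorted(scores, key=scores.get)


def optimize_token(current, top_k=2):
    """Re-score the top-k gradient-ranked candidates with the true loss."""
    shortlist = rank_swaps(current)[:top_k] + [current]
    return min(shortlist, key=loss)


if __name__ == "__main__":
    print(rank_swaps("guess"))
    print(optimize_token("guess"))
```

In this toy setting the gradient acts only as a cheap filter: it proposes promising substitutions, and the final choice is confirmed by evaluating the actual loss, since the first-order estimate can be inaccurate for large changes in embedding space.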
