When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Evaluators

Abstract

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, but they produce natural-language critiques rather than numerical vectors. As a result, the conflict-resolution toolkit of multi-task learning (PCGrad, MGDA) does not apply to the multi-objective textual gradient setting. We test five decomposition modes of textual gradient optimizers by varying how much cross-task information the loss, gradient, and optimizer LLMs share. In 6 of 10 configurations, optimization never improves over the initial prompt. Gradient specificity drops by 59% (from 9.0 to 3.7) when the gradient LLM processes multiple criteria jointly. Separately, we observe that naively combining per-task instructions into a single prompt reduces Spearman's rho by 5.3%. These results identify two separable failure modes, optimization-time gradient dilution and inference-time instruction interference, which together constrain the design space for multi-objective judge customization using textual feedback.
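To make concrete why PCGrad-style conflict resolution presupposes numerical gradients, the following is a minimal sketch of the PCGrad projection step on two toy vectors. The function names (`dot`, `pcgrad_project`) and the example vectors are illustrative assumptions, not part of the experimental setup. The step relies on an inner product between task gradients, an operation that has no direct analogue for natural-language critiques.

```python
def dot(u, v):
    """Inner product of two equal-length vectors (plain Python lists)."""
    return sum(a * b for a, b in zip(u, v))


def pcgrad_project(g_i, g_j):
    """PCGrad projection step for one pair of task gradients.

    If g_i conflicts with g_j (negative inner product), remove from g_i
    its component along g_j; otherwise return g_i unchanged.
    """
    d = dot(g_i, g_j)
    if d >= 0:
        return list(g_i)  # no conflict, nothing to project out
    scale = d / dot(g_j, g_j)
    return [a - scale * b for a, b in zip(g_i, g_j)]


# Hypothetical toy gradients for two criteria that conflict (dot product < 0).
g_a = [1.0, 0.0]
g_b = [-1.0, 1.0]

projected = pcgrad_project(g_a, g_b)
print(projected)            # gradient for criterion A after projection
print(dot(projected, g_b))  # inner product with g_b after projection
```

After projection the gradient for criterion A is orthogonal to the gradient for criterion B, so a step along it no longer opposes criterion B to first order. A textual critique offers no comparable notion of a negative inner product or of an orthogonal component, which is the gap the abstract refers to.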