When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges
May 25, 2026
Authors: Parth Darshan, Abhishek Divekar
cs.AI
Abstract
Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion; however, they produce natural-language critiques, not numerical vectors. Thus, the conflict-resolution toolkit of multi-task learning (PCGrad, MGDA) does not apply to the multi-objective textual gradient setting. We test five decomposition modes of textual gradient optimizers by varying how much cross-task information the loss, gradient, and optimizer LLMs share. In 6 of 10 configurations, optimization never improves over the initial prompt. Gradient specificity drops by 59% (from 9.0 to 3.7) when the gradient LLM processes multiple criteria jointly. Separately, naively combining per-task instructions into a single prompt degrades Spearman's ρ by 5.3%. These results identify two separable failure modes, optimization-time gradient dilution and inference-time instruction interference, which together constrain the design space for multi-objective judge customization using textual feedback.
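The claim that PCGrad does not carry over to textual gradients is easier to see from what PCGrad actually computes. The sketch below shows the PCGrad projection step ("gradient surgery") on plain Python lists. It is an illustration under our own simplifications, not code from this paper: pure-Python vectors, a seeded shuffle for reproducibility, and hypothetical helper names `dot` and `pcgrad`. For each task gradient, any other task gradient that conflicts with it (negative inner product) is projected out, and the projected gradients are then summed.

```python
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(grads: List[Vector], seed: int = 0) -> Vector:
    """Combine per-task gradients with a PCGrad-style projection (sketch).

    For each task gradient g_i, every other original task gradient g_j
    (visited in a seeded random order) that conflicts with the current
    g_i (negative dot product) is removed by projection:
        g_i <- g_i - (g_i . g_j / ||g_j||^2) * g_j
    The projected per-task gradients are summed into one update direction.
    """
    rng = random.Random(seed)
    projected: List[Vector] = []
    for i, g_i in enumerate(grads):
        g = list(g_i)  # copy so the original gradients stay untouched
        others = [g_j for j, g_j in enumerate(grads) if j != i]
        rng.shuffle(others)
        for g_j in others:
            d = dot(g, g_j)
            norm_sq = dot(g_j, g_j)
            if d < 0 and norm_sq > 0:  # conflict: strip the opposing component
                coef = d / norm_sq
                g = [x - coef * y for x, y in zip(g, g_j)]
        projected.append(g)
    # Sum the projected gradients component-wise.
    return [sum(components) for components in zip(*projected)]


if __name__ == "__main__":
    # Two conflicting task gradients: their dot product is -1.
    print(pcgrad([[1.0, 0.0], [-1.0, 1.0]]))
```

Every step here (the inner product that detects conflict, the projection that removes it, the final sum) needs numeric vectors. When each "gradient" is a natural-language critique, none of these operations has a direct counterpart. This is the gap the abstract points to.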