When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

May 25, 2026
Authors: Parth Darshan, Abhishek Divekar
cs.AI

Abstract

Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion, but they produce natural-language critiques, not numerical vectors. Thus, the conflict-resolution toolkit of multi-task learning (PCGrad, MGDA) does not apply to the multi-objective textual gradient setting. We test five decomposition modes of textual gradient optimizers by varying how much cross-task information the loss, gradient, and optimizer LLMs share. In 6 of 10 configurations, optimization never improves over the initial prompt. Gradient specificity drops by 59% (from 9.0 to 3.7) when the gradient LLM processes multiple criteria jointly. Separately, naively combining per-task instructions into a single prompt degrades Spearman's rho by 5.3%. These results identify two separable failure modes, optimization-time gradient dilution and inference-time instruction interference, which together constrain the design space for multi-objective judge customization using textual feedback.
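To make concrete why PCGrad-style conflict resolution presupposes numerical gradients, the following is a minimal sketch of the PCGrad projection step on plain Python lists. It is an illustration only, not code from the paper: the function names `dot` and `pcgrad_project`, the two-dimensional vectors, and the criterion names are all hypothetical. The step relies on an inner product between task gradients, which has no direct counterpart for natural-language critiques.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcgrad_project(g_i, g_j):
    """Return g_i with its component along g_j removed when the two conflict.

    Two gradients conflict when their inner product is negative. In that case
    PCGrad projects g_i onto the plane orthogonal to g_j; otherwise g_i is
    returned unchanged.
    """
    inner = dot(g_i, g_j)
    if inner >= 0:
        return list(g_i)
    scale = inner / dot(g_j, g_j)
    return [a - scale * b for a, b in zip(g_i, g_j)]


# Hypothetical gradients for two judge criteria (illustrative values only).
g_accuracy = [1.0, 0.0]
g_fluency = [-1.0, 1.0]

# The inner product is negative, so the gradients conflict and g_accuracy
# is projected to be orthogonal to g_fluency.
projected = pcgrad_project(g_accuracy, g_fluency)
print(projected)
```

Every operation here (the sign test, the norm, the subtraction) is defined on vectors. A textual gradient is a critique in prose, so none of these operations is available without first choosing some numerical representation of the critique, which is the gap the abstract points to.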