Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation
March 2, 2026
Authors: Chenxing Wei, Hong Wang, Ying He, Zhongxiang Dai, Bo Jiang, F. Richard Yu, Yao Shu
cs.AI
Abstract
Test-time policy adaptation for multi-turn interactions (T2PAM) is essential for aligning Large Language Models (LLMs) with dynamic user needs at inference time. However, existing paradigms commonly treat test-time adaptation as a single-axis problem, either purely refining instructions (prompt engineering) or only adjusting weights (test-time training), ignoring that interaction failures stem from a coupled mix of semantic ambiguity and capability gaps. We argue that these two optimization paths are not merely additive but synergistic: semantic clarity acts as a pre-conditioner for effective parameter updates. To this end, we propose ROSA2, a framework that reformulates interaction as a joint optimization problem over the heterogeneous space of Words and Weights. By mathematically decomposing the error signal, ROSA2 uses textual gradients to rectify intent ambiguity and parameter updates to bridge capability gaps. Theoretically, we prove that this co-adaptation strictly reduces the parameter shift required for convergence. Empirically, ROSA2 outperforms state-of-the-art baselines by 30% on MATH while reducing interaction turns by 40%, demonstrating that refining the context unlocks the true potential of parameter updates.
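The core claim above can be illustrated with a deliberately minimal toy model (not the paper's actual algorithm): suppose the per-turn error decomposes into an ambiguity term, correctable by editing the prompt, and a capability term, correctable by a gradient step on a scalar weight `w`. The names `textual_refine`, `co_adapt`, and the halving/step-size constants below are illustrative assumptions, chosen only to show how shrinking the ambiguity term reduces the residual loss that the weight update must absorb.

```python
# Toy sketch of "words + weights" co-adaptation.
# Error model (assumed for illustration):
#   loss(w, a) = (w - w_star)^2 + a^2
# where (w - w_star)^2 is the capability gap and a^2 is prompt ambiguity.

W_STAR = 2.0  # hypothetical "fully capable" weight


def loss(w: float, ambiguity: float) -> float:
    """Combined error: capability gap plus residual intent ambiguity."""
    return (w - W_STAR) ** 2 + ambiguity ** 2


def textual_refine(ambiguity: float) -> float:
    """Stand-in for a textual-gradient step: halve the ambiguity."""
    return 0.5 * ambiguity


def co_adapt(w: float, ambiguity: float, lr: float = 0.1, turns: int = 20):
    """Alternate prompt refinement and a gradient step on w each turn."""
    total_shift = 0.0
    for _ in range(turns):
        ambiguity = textual_refine(ambiguity)   # clarify intent first
        grad = 2.0 * (w - W_STAR)               # gradient of capability term
        step = lr * grad
        w -= step                               # parameter update
        total_shift += abs(step)                # accumulated weight movement
    return w, ambiguity, total_shift


if __name__ == "__main__":
    w0, a0 = 0.0, 1.0
    w, a, shift = co_adapt(w0, a0)
    # Co-adaptation drives both terms down; weights-only adaptation
    # would be floored at the untouched ambiguity term a0^2 = 1.0.
    print(f"final loss = {loss(w, a):.4f}, weight shift = {shift:.3f}")
```

Running the sketch shows the joint loss falling well below the `a0**2 = 1.0` floor that a weights-only update would hit, mirroring the paper's point that context refinement and parameter updates address different error components.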