

Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation

March 2, 2026
Authors: Chenxing Wei, Hong Wang, Ying He, Zhongxiang Dai, Bo Jiang, F. Richard Yu, Yao Shu
cs.AI

Abstract

Test-time policy adaptation for multi-turn interactions (T2PAM) is essential for aligning Large Language Models (LLMs) with dynamic user needs at inference time. However, existing paradigms commonly treat test-time adaptation as a single-axis problem, either purely refining instructions (Prompt Engineering) or only adjusting weights (Test-Time Training), ignoring that interaction failures stem from a coupled mix of ambiguity and incapacity. We argue that these two optimization paths are not merely additive but synergistic: semantic clarity acts as a pre-conditioner for effective parameter updates. To this end, we propose ROSA2, a framework that reformulates interaction as a joint optimization problem over the heterogeneous space of Words and Weights. By mathematically decomposing the error signal, ROSA2 uses textual gradients to rectify intent ambiguity and parameter updates to bridge capability gaps. Theoretically, we prove that this co-adaptation strictly reduces the parameter shift required for convergence. Empirically, ROSA2 outperforms state-of-the-art baselines by 30% on MATH while reducing interaction turns by 40%, demonstrating that refining the context unlocks the true potential of parameter updates.
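The core idea of alternating between the two axes can be illustrated with a minimal numeric sketch. Note this is a hypothetical toy, not the paper's ROSA2 algorithm: the function `co_adapt` and the scalar "clarity" `c` are invented for illustration. Here "Words" are modeled as a clarity scalar that gates how much of the true target the learner observes, and "Weights" are a linear model updated by gradient steps; each turn first reduces ambiguity (raising `c`), then closes the remaining capability gap (updating `w`).

```python
import numpy as np

def co_adapt(x, y, turns=30, lr=0.1):
    """Toy words-and-weights co-adaptation loop (illustrative only)."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=x.shape[1]) * 0.01  # "weights": linear model
    c = 0.5                                 # "words": prompt clarity in (0, 1]
    for _ in range(turns):
        # "textual gradient": sharpen the instruction first,
        # so the subsequent weight update sees a cleaner target
        c = min(1.0, c + 0.25)
        # residual error under the current (partially clear) prompt
        err = c * y - x @ w
        # parameter update on the remaining capability gap
        w += lr * x.T @ err / len(y)
    return w, c

# Synthetic regression task: the model must recover w_true
rng = np.random.default_rng(1)
x = rng.normal(size=(40, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = x @ w_true

w, c = co_adapt(x, y)
mse = float(np.mean((x @ w - y) ** 2))
print(c, mse)
```

In this toy setting, raising `c` early means later gradient steps chase the full target rather than a diluted one, loosely mirroring the paper's claim that clarity pre-conditions the parameter updates.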