Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation

Abstract

Test-time policy adaptation for multi-turn interactions (T2PAM) is essential for aligning Large Language Models (LLMs) with dynamic user needs at inference time. However, existing paradigms commonly treat test-time adaptation as a single-axis problem, either purely refining instructions (Prompt Engineering) or only adjusting weights (Test-Time Training), overlooking the fact that interaction failures stem from a coupled mix of ambiguity and incapacity. We argue that these two optimization paths are not merely additive but synergistic: semantic clarity acts as a preconditioner for effective parameter updates. To this end, we propose ROSA2, a framework that reformulates interaction as a joint optimization problem over the heterogeneous space of Words and Weights. By mathematically decomposing the error signal, ROSA2 uses textual gradients to rectify intent ambiguity and parameter updates to bridge capability gaps. Theoretically, we prove that this co-adaptation strictly reduces the parameter shift required for convergence. Empirically, ROSA2 outperforms state-of-the-art baselines by 30% on MATH while reducing interaction turns by 40%, demonstrating that refining the context unlocks the true potential of parameter updates.
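The claim that refining the context reduces the parameter shift needed to converge can be illustrated with a deliberately tiny toy model. This is a minimal sketch under stated assumptions, not ROSA2's algorithm. The scalar `c` is a hypothetical numeric stand-in for context clarity (real textual gradients are natural-language feedback, not numbers). The one-parameter model `w * c`, the helper name `weight_shift`, the learning rate, and the tolerance are all illustrative choices. The gradient of a single squared-error loss is split into a part routed to the weights (`grad_w`) and a part routed to the context (`grad_c`). Weights-only adaptation ignores `grad_c`; co-adaptation applies both.

```python
def weight_shift(w0: float, c0: float, target: float, co_adapt: bool,
                 lr: float = 0.1, tol: float = 1e-8,
                 max_steps: int = 100_000) -> float:
    """Toy gradient descent on L = 0.5 * (w * c - target) ** 2.

    Hypothetical illustration only: w plays the role of model weights and
    c is a numeric proxy for context clarity. Returns |w_final - w0|, the
    parameter shift spent before the loss drops below tol. With
    co_adapt=False the context stays fixed (weights-only adaptation).
    """
    w, c = w0, c0
    for _ in range(max_steps):
        residual = w * c - target      # signed error of the toy model
        if 0.5 * residual ** 2 < tol:
            break
        grad_w = residual * c          # dL/dw: error routed to the weights
        grad_c = residual * w          # dL/dc: error routed to the context
        w -= lr * grad_w
        if co_adapt:
            c -= lr * grad_c           # "word" step on the context proxy
    return abs(w - w0)


if __name__ == "__main__":
    for mode in (False, True):
        shift = weight_shift(w0=1.0, c0=0.5, target=1.0, co_adapt=mode)
        print(f"co_adapt={mode}: |delta w| = {shift:.3f}")
```

With the context held at `c = 0.5`, the weights alone must move from 1.0 to about 2.0 to hit the target. When the context is updated too, part of the error is absorbed by `c`, so `w` needs to move much less. This mirrors, in one dimension, the qualitative direction of the abstract's theoretical claim; it does not reproduce its proof or its experimental setting.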
