Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

Abstract


Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet improving its effectiveness without additional training remains challenging. We study inference-time model steering as a training-free approach to improving LALM reasoning. We introduce three steering strategies that draw on different information sources and evaluate them on four LALMs across four benchmarks. The strategies generally improve accuracy over CoT prompting, with gains of up to 4.4%. Notably, we observe cross-modal transfer: steering vectors derived from a small number of text samples effectively guide speech-based reasoning, demonstrating high data efficiency. We also examine hyperparameter sensitivity to assess the robustness of these approaches. Our findings position model steering as a practical direction for strengthening LALM reasoning.
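The abstract does not say how its three strategies construct steering vectors. As a hedged illustration of the general idea of nudging hidden states, the sketch below uses one common, generic recipe: a difference-of-means steering vector computed from two sets of hidden states and added to a hidden state with a scale `alpha`. Everything here is an assumption for illustration. The function names are hypothetical, the "positive" and "negative" sets stand in for hidden states from, e.g., text prompts with and without reasoning, and `alpha` stands in for a steering-strength hyperparameter. This is not the paper's method.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized hidden-state vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def steering_vector(positive: Sequence[Sequence[float]],
                    negative: Sequence[Sequence[float]]) -> Vector:
    """Generic difference-of-means steering vector (an assumed recipe,
    not necessarily one of the paper's strategies):
    v = mean(positive) - mean(negative)."""
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(hidden: Sequence[float], v: Sequence[float], alpha: float) -> Vector:
    """Nudge one hidden state along v: h' = h + alpha * v.
    `alpha` is a hypothetical steering-strength hyperparameter."""
    return [h + alpha * x for h, x in zip(hidden, v)]


def steer_sequence(states: Sequence[Sequence[float]],
                   v: Sequence[float], alpha: float) -> List[Vector]:
    """Apply the same nudge at every token position of one layer."""
    return [steer(h, v, alpha) for h in states]


if __name__ == "__main__":
    # Toy 3-dimensional "hidden states" (made-up numbers, not model outputs).
    positive = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
    negative = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
    v = steering_vector(positive, negative)
    print(v)                                # [1.0, 1.0, 0.0]
    print(steer([0.5, 0.5, 0.5], v, 2.0))   # [2.5, 2.5, 0.5]
```

In a real LALM, the vector would be computed from hidden states at some chosen layer and added during the forward pass. For the cross-modal setting the abstract describes, one plausible wiring, also an assumption, is to derive `v` from text-prompt states and apply `steer_sequence` while the model processes speech input. Setting `alpha = 0` recovers the unsteered model, which makes it a natural baseline when probing sensitivity to the steering strength.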
