
Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

March 15, 2026
Authors: Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee
cs.AI

Abstract

Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improving LALM reasoning. We introduce three strategies that draw on diverse information sources and evaluate them across four LALMs and four benchmarks. Results show consistent accuracy gains of up to 4.4% over CoT prompting. Notably, we identify a cross-modal transfer effect: steering vectors derived from a few text samples effectively guide speech-based reasoning, demonstrating high data efficiency. We also examine hyperparameter sensitivity to understand the robustness of these approaches. Our findings position model steering as a practical direction for strengthening LALM reasoning.
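The abstract does not specify how the steering vectors are constructed. A common training-free construction in the activation-steering literature, which matches the "few samples" description, is a difference of means between hidden states collected under contrastive prompts, with the resulting vector added (scaled) to a layer's hidden state at inference time. The sketch below illustrates that generic recipe only; the function names, dimensions, and the scaling scheme are illustrative assumptions, not the paper's method.

```python
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means steering vector from contrastive activations.

    pos_acts / neg_acts: (n_samples, hidden_dim) hidden states collected
    from, e.g., reasoning-style vs. plain prompts (illustrative setup).
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def nudge(hidden: np.ndarray, v: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the scaled steering vector to a layer's hidden state."""
    return hidden + alpha * v

# Toy 4-dimensional example standing in for real model activations.
pos = np.array([[1.0, 0.0, 0.5, 0.0],
                [0.8, 0.2, 0.7, 0.1]])
neg = np.array([[0.0, 0.0, 0.1, 0.0],
                [0.2, 0.2, 0.1, 0.1]])

v = steering_vector(pos, neg)        # → [0.8, 0.0, 0.5, 0.0]
steered = nudge(np.zeros(4), v, alpha=0.5)
```

In a real LALM this nudge would typically be applied inside the forward pass (e.g. via a hook on a chosen decoder layer), with `alpha` and the layer index as the hyperparameters whose sensitivity the paper reports examining.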