Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models
March 15, 2026
Authors: Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee
cs.AI
Abstract
Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improving LALM reasoning. We introduce three strategies that draw on diverse information sources and evaluate them across four LALMs and four benchmarks. Results show general accuracy gains of up to 4.4% over CoT prompting. Notably, we identify a cross-modal transfer effect: steering vectors derived from only a few text samples effectively guide speech-based reasoning, demonstrating high data efficiency. We also examine hyperparameter sensitivity to assess the robustness of these approaches. Our findings position model steering as a practical direction for strengthening LALM reasoning.
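The abstract does not spell out how the steering vectors are constructed. A minimal sketch of one common training-free recipe, mean-difference activation steering, is shown below: contrast hidden states collected on a few positive (e.g. CoT-style) and negative (plain) prompts, then add the scaled difference to a hidden state at inference. All names, dimensions, and data here are hypothetical illustrations, not the paper's actual method.

```python
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Mean-difference steering vector: average activation on positive
    examples minus average activation on negative examples.
    (One common recipe; the paper's exact construction may differ.)"""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def nudge(hidden: np.ndarray, v: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Nudge a hidden state at inference time by adding the scaled vector."""
    return hidden + alpha * v

# Toy example with 4-dimensional hidden states from a handful of samples,
# mimicking the few-text-sample setting described in the abstract.
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.1, size=(8, 4))  # activations on CoT-style prompts (synthetic)
neg = rng.normal(0.0, 0.1, size=(8, 4))  # activations on plain prompts (synthetic)

v = steering_vector(pos, neg)
h = np.zeros(4)                 # a stand-in hidden state
h_steered = nudge(h, v, alpha=0.5)
print(h_steered.shape)          # → (4,)
```

In practice the vector would be extracted from, and added back into, a chosen transformer layer (e.g. via a forward hook), with `alpha` as the main hyperparameter whose sensitivity the paper examines.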