Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering
May 5, 2025
Authors: Jihao Zhao, Chunlai Zhou, Biao Qin
cs.AI
Abstract
The collaborative paradigm of large and small language models (LMs)
effectively balances performance and cost, yet its pivotal challenge lies in
precisely pinpointing the moment of invocation when hallucinations arise in
small LMs. Previous optimization efforts primarily focused on post-processing
techniques, which were separate from the reasoning process of LMs, resulting in
high computational costs and limited effectiveness. In this paper, we propose a
practical invocation evaluation metric called AttenHScore, which calculates the
accumulation and propagation of hallucinations during the generation process of
small LMs, continuously amplifying potential reasoning errors. By dynamically
adjusting the detection threshold, we achieve more accurate real-time
invocation of large LMs. Additionally, considering the limited reasoning
capacity of small LMs, we leverage uncertainty-aware knowledge reorganization
to help them better capture critical information from different text chunks.
Extensive experiments reveal that our AttenHScore outperforms most baselines in
enhancing real-time hallucination detection capabilities across multiple QA
datasets, especially when addressing complex queries. Moreover, our strategies
eliminate the need for additional model training and display flexibility in
adapting to various transformer-based LMs.
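The core idea of the collaborative paradigm can be illustrated with a minimal sketch. The abstract does not give the AttenHScore formula, so `hallucination_score` below is a hypothetical stand-in that accumulates and amplifies per-token uncertainty during the small LM's generation; `small_lm`, `large_lm`, and the `decay`/`threshold` values are illustrative assumptions, not the paper's actual components.

```python
def hallucination_score(token_probs, decay=1.1):
    """Accumulate and amplify per-token uncertainty over a generation.

    token_probs: probabilities the small LM assigned to its own output
    tokens. Later tokens are weighted more heavily, mimicking the idea
    that hallucinations accumulate and propagate as generation proceeds.
    (Hypothetical proxy; the paper's metric is attention-based.)
    """
    score, weight = 0.0, 1.0
    for p in token_probs:
        score += weight * (1.0 - p)  # low-confidence tokens raise the score
        weight *= decay              # amplify errors later in the sequence
    return score / len(token_probs)

def answer(question, small_lm, large_lm, threshold=0.35):
    """Invoke the large LM only when the small LM's score crosses a threshold."""
    draft, token_probs = small_lm(question)       # small LM drafts an answer first
    if hallucination_score(token_probs) > threshold:
        return large_lm(question)                 # escalate: likely hallucination
    return draft                                  # otherwise accept the cheap answer
```

In this sketch, confident generations (high token probabilities) stay with the small model, while low-confidence generations trigger a real-time invocation of the large model, without any additional training.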
AI-Generated Summary
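The uncertainty-aware knowledge reorganization could take many forms; the abstract gives no details, so the following is only one plausible interpretation. It assumes each retrieved chunk comes with an uncertainty estimate and reorders chunks so the most certain ones sit at the edges of the context, where transformer LMs tend to attend most reliably (the "lost in the middle" effect); the function name and ordering heuristic are assumptions, not the paper's method.

```python
def reorganize_chunks(chunks, uncertainties):
    """Reorder text chunks by uncertainty before prompting a small LM.

    Chunks the model is most certain about are placed at the start and
    end of the context, pushing uncertain ones toward the middle.
    (Hypothetical heuristic inspired by the lost-in-the-middle effect.)
    """
    # Rank chunks from most certain (lowest uncertainty) to least certain.
    ranked = [chunk for _, chunk in sorted(zip(uncertainties, chunks))]
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    # Most certain chunks end up at both edges of the final ordering.
    return front + back[::-1]
```

A small LM with limited reasoning capacity then sees the most trustworthy evidence in the positions it handles best, which is one way such a reorganization could help it capture critical information.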