Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering
May 5, 2025
Authors: Jihao Zhao, Chunlai Zhou, Biao Qin
cs.AI
Abstract
The collaborative paradigm of large and small language models (LMs)
effectively balances performance and cost, yet its pivotal challenge lies in
precisely pinpointing the moment of invocation when hallucinations arise in
small LMs. Previous optimization efforts primarily focused on post-processing
techniques, which were separate from the reasoning process of LMs, resulting in
high computational costs and limited effectiveness. In this paper, we propose a
practical invocation evaluation metric called AttenHScore, which calculates the
accumulation and propagation of hallucinations during the generation process of
small LMs, continuously amplifying potential reasoning errors. By dynamically
adjusting the detection threshold, we achieve more accurate real-time
invocation of large LMs. Additionally, considering the limited reasoning
capacity of small LMs, we leverage uncertainty-aware knowledge reorganization
to help them better capture critical information from different text chunks.
Extensive experiments reveal that our AttenHScore outperforms most baselines in
enhancing real-time hallucination detection capabilities across multiple QA
datasets, especially when addressing complex queries. Moreover, our strategies
eliminate the need for additional model training and display flexibility in
adapting to various transformer-based LMs.
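The core idea of the collaborative paradigm can be illustrated with a minimal sketch. The abstract does not give the AttenHScore formula, so `hallucination_score` below is a hypothetical stand-in that accumulates and amplifies per-token uncertainty during the small LM's generation; `small_lm`, `large_lm`, and the `decay`/`threshold` values are illustrative assumptions, not the paper's actual components.

```python
def hallucination_score(token_probs, decay=1.1):
    """Accumulate and amplify per-token uncertainty over a generation.

    token_probs: probabilities the small LM assigned to its own output
    tokens. Later tokens are weighted more heavily, mimicking the idea
    that hallucinations accumulate and propagate as generation proceeds.
    (Hypothetical proxy; the paper's metric is attention-based.)
    """
    score, weight = 0.0, 1.0
    for p in token_probs:
        score += weight * (1.0 - p)  # low-confidence tokens raise the score
        weight *= decay              # amplify errors later in the sequence
    return score / len(token_probs)

def answer(question, small_lm, large_lm, threshold=0.35):
    """Invoke the large LM only when the small LM's score crosses a threshold."""
    draft, token_probs = small_lm(question)       # small LM drafts an answer first
    if hallucination_score(token_probs) > threshold:
        return large_lm(question)                 # escalate: likely hallucination
    return draft                                  # otherwise accept the cheap answer
```

In this sketch, confident generations (high token probabilities) stay with the small model, while low-confidence generations trigger a real-time invocation of the large model, without any additional training.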
AI-Generated Summary
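The uncertainty-aware knowledge reorganization could take many forms; the abstract gives no details, so the following is only one plausible interpretation. It assumes each retrieved chunk comes with an uncertainty estimate and reorders chunks so the most certain ones sit at the edges of the context, where transformer LMs tend to attend most reliably (the "lost in the middle" effect); the function name and ordering heuristic are assumptions, not the paper's method.

```python
def reorganize_chunks(chunks, uncertainties):
    """Reorder text chunks by uncertainty before prompting a small LM.

    Chunks the model is most certain about are placed at the start and
    end of the context, pushing uncertain ones toward the middle.
    (Hypothetical heuristic inspired by the lost-in-the-middle effect.)
    """
    # Rank chunks from most certain (lowest uncertainty) to least certain.
    ranked = [chunk for _, chunk in sorted(zip(uncertainties, chunks))]
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    # Most certain chunks end up at both edges of the final ordering.
    return front + back[::-1]
```

A small LM with limited reasoning capacity then sees the most trustworthy evidence in the positions it handles best, which is one way such a reorganization could help it capture critical information.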