
Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering

May 5, 2025
作者: Jihao Zhao, Chunlai Zhou, Biao Qin
cs.AI

Abstract

The collaborative paradigm of large and small language models (LMs) effectively balances performance and cost, yet its pivotal challenge lies in precisely pinpointing the moment of invocation when hallucinations arise in small LMs. Previous optimization efforts primarily focused on post-processing techniques that were separate from the reasoning process of LMs, resulting in high computational costs and limited effectiveness. In this paper, we propose a practical invocation evaluation metric called AttenHScore, which calculates the accumulation and propagation of hallucinations during the generation process of small LMs, continuously amplifying potential reasoning errors. By dynamically adjusting the detection threshold, we achieve more accurate real-time invocation of large LMs. Additionally, considering the limited reasoning capacity of small LMs, we leverage uncertainty-aware knowledge reorganization to help them better capture critical information from different text chunks. Extensive experiments reveal that AttenHScore outperforms most baselines in enhancing real-time hallucination detection across multiple QA datasets, especially when addressing complex queries. Moreover, our strategies require no additional model training and adapt flexibly to various Transformer-based LMs.
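The adaptive invocation loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the paper's AttenHScore is attention-based, whereas the stand-in below uses next-token entropy as a placeholder per-step signal; the multiplicative accumulation, the `threshold` value, and the `small_lm`/`large_lm` callables are all hypothetical.

```python
import math

def token_uncertainty(token_probs):
    # Shannon entropy of one next-token distribution -- a stand-in for the
    # paper's attention-based per-token hallucination signal.
    return -sum(p * math.log(p) for p in token_probs if p > 0)

def hallucination_score(stepwise_probs):
    # Accumulate per-step uncertainty so that earlier uncertainty amplifies
    # later contributions (the multiplicative coupling is an assumption,
    # meant only to mirror "accumulation and propagation" of errors).
    score = 0.0
    for probs in stepwise_probs:
        score += token_uncertainty(probs) * (1.0 + score)
    return score

def answer(query, small_lm, large_lm, threshold=2.5):
    # Let the small LM draft an answer; invoke the large LM only when the
    # accumulated score crosses the detection threshold.
    draft, stepwise_probs = small_lm(query)
    if hallucination_score(stepwise_probs) > threshold:
        return large_lm(query)  # expensive call, made only when needed
    return draft
```

In this sketch, confident (peaked) small-LM distributions keep the score low and avoid the large-LM call, while high-entropy generations trigger it; the paper additionally adjusts the threshold dynamically rather than fixing it.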

