ClaimIQ at CheckThat! 2025: Comparing Prompted and Fine-Tuned Language Models for Verifying Numerical Claims
September 15, 2025
Authors: Anirban Saha Anik, Md Fahimul Kabir Chowdhury, Andrew Wyckoff, Sagnik Ray Choudhury
cs.AI
Abstract
This paper presents our system for Task 3 of the CLEF 2025 CheckThat! Lab, which focuses on verifying numerical and temporal claims using retrieved evidence. We explore two complementary approaches: zero-shot prompting with instruction-tuned large language models (LLMs) and supervised fine-tuning using parameter-efficient LoRA. To enhance evidence quality, we investigate several selection strategies, including full-document input and top-k sentence filtering using BM25 and MiniLM. Our best-performing model, a LLaMA model fine-tuned with LoRA, achieves strong performance on the English validation set; however, a notable performance drop on the test set highlights a generalization challenge. These findings underscore the importance of evidence granularity and model adaptation for robust numerical fact verification.
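
The abstract's strongest system is a LLaMA model adapted with parameter-efficient LoRA. The sketch below shows what such a setup might look like with Hugging Face transformers and peft; the base checkpoint, rank, alpha, and target modules are illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of LoRA fine-tuning for a LLaMA-style claim-verification model.
# All hyperparameters and the checkpoint name are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "meta-llama/Llama-3.1-8B-Instruct"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,          # claim + evidence framed as a causal-LM prompt
    r=16,                                  # LoRA rank (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (assumed)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```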
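
For the evidence-selection side, the abstract mentions top-k sentence filtering with BM25 and MiniLM. A plausible reading is a two-stage pipeline (lexical BM25 retrieval followed by MiniLM-based semantic reranking); the sketch below follows that assumption, and the specific MiniLM checkpoint and k value are likewise assumptions.

```python
# Sketch of top-k evidence sentence filtering: BM25 candidate retrieval,
# then reranking with a MiniLM sentence encoder. Checkpoint and k are assumed.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

def top_k_sentences(claim: str, evidence_sentences: list[str], k: int = 5) -> list[str]:
    """Return the k evidence sentences most relevant to the claim."""
    # Stage 1: lexical scoring with BM25 over whitespace-tokenized sentences.
    bm25 = BM25Okapi([s.lower().split() for s in evidence_sentences])
    bm25_scores = bm25.get_scores(claim.lower().split())
    candidates = sorted(range(len(evidence_sentences)),
                        key=lambda i: bm25_scores[i], reverse=True)[: 2 * k]

    # Stage 2: semantic reranking with a MiniLM encoder (assumed checkpoint).
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    claim_emb = model.encode(claim, convert_to_tensor=True)
    sent_embs = model.encode([evidence_sentences[i] for i in candidates],
                             convert_to_tensor=True)
    sims = util.cos_sim(claim_emb, sent_embs)[0]
    reranked = sorted(zip(candidates, sims.tolist()), key=lambda x: x[1], reverse=True)
    return [evidence_sentences[i] for i, _ in reranked[:k]]
```

The filtered sentences would then be concatenated with the claim in the prompt given to the zero-shot or LoRA-tuned model, in place of the full document.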