ClaimIQ at CheckThat! 2025: Comparing Prompted and Fine-Tuned Language Models for Verifying Numerical Claims
September 15, 2025
Authors: Anirban Saha Anik, Md Fahimul Kabir Chowdhury, Andrew Wyckoff, Sagnik Ray Choudhury
cs.AI
Abstract
This paper presents our system for Task 3 of the CLEF 2025 CheckThat! Lab,
which focuses on verifying numerical and temporal claims using retrieved
evidence. We explore two complementary approaches: zero-shot prompting with
instruction-tuned large language models (LLMs) and supervised fine-tuning using
parameter-efficient LoRA. To enhance evidence quality, we investigate several
selection strategies, including full-document input and top-k sentence
filtering using BM25 and MiniLM. Our best-performing model, LLaMA fine-tuned
with LoRA, achieves strong performance on the English validation set. However,
a notable performance drop on the test set highlights a generalization challenge. These
findings underscore the importance of evidence granularity and model adaptation
for robust numerical fact verification.
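As a rough illustration of the top-k sentence filtering the abstract mentions, the sketch below scores candidate evidence sentences against a claim with the Okapi BM25 formula and keeps the k highest-scoring ones. This is a minimal pure-Python toy with naive whitespace tokenization; the function name and all details are our own assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def bm25_topk(claim, sentences, k=3, k1=1.5, b=0.75):
    """Rank candidate evidence sentences against a claim with Okapi BM25
    and return the k highest-scoring sentences in document order.
    Toy sketch: lowercased whitespace tokenization, no stemming."""
    docs = [s.lower().split() for s in sentences]
    query = claim.lower().split()
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # document frequency of each term across the candidate sentences
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    top = sorted(range(n), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

In a full pipeline one would apply this (or a MiniLM-based semantic scorer) to the retrieved document's sentences before passing the filtered evidence to the prompted or fine-tuned verifier.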