
ClaimIQ at CheckThat! 2025: Comparing Prompted and Fine-Tuned Language Models for Verifying Numerical Claims

September 15, 2025
Authors: Anirban Saha Anik, Md Fahimul Kabir Chowdhury, Andrew Wyckoff, Sagnik Ray Choudhury
cs.AI

Abstract

This paper presents our system for Task 3 of the CLEF 2025 CheckThat! Lab, which focuses on verifying numerical and temporal claims using retrieved evidence. We explore two complementary approaches: zero-shot prompting with instruction-tuned large language models (LLMs) and supervised fine-tuning with parameter-efficient LoRA. To enhance evidence quality, we investigate several selection strategies, including full-document input and top-k sentence filtering using BM25 and MiniLM. Our best-performing model, LLaMA fine-tuned with LoRA, achieves strong performance on the English validation set. However, a notable performance drop on the test set highlights a generalization challenge. These findings underscore the importance of evidence granularity and model adaptation for robust numerical fact verification.
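The top-k sentence filtering with BM25 mentioned in the abstract can be sketched in pure Python. This is an illustrative reimplementation under assumed defaults (whitespace tokenization, standard BM25 parameters k1=1.5, b=0.75), not the authors' code; the function name and example data are hypothetical.

```python
import math
from collections import Counter

def bm25_topk(claim, sentences, k=3, k1=1.5, b=0.75):
    """Rank candidate evidence sentences against a claim with BM25
    and keep the top-k highest scoring ones."""
    docs = [s.lower().split() for s in sentences]   # naive tokenization
    query = claim.lower().split()
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each query term across the evidence pool.
    df = {t: sum(1 for d in docs if t in d) for t in query}

    def score(doc):
        tf = Counter(doc)
        total = 0.0
        for t in query:
            if df[t] == 0:
                continue  # term never occurs; contributes nothing
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            total += idf * tf[t] * (k1 + 1) / denom
        return total

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    return [sentences[i] for i in ranked[:k]]

claim = "GDP grew by 3.2 percent in 2021"
evidence = [
    "The economy expanded steadily over the decade.",
    "Official figures show GDP grew by 3.2 percent in 2021.",
    "Inflation remained low during the same period.",
]
top = bm25_topk(claim, evidence, k=1)
```

In the described pipeline, the selected sentences would then be passed as evidence context to the prompted or fine-tuned LLM; a MiniLM variant would replace the lexical BM25 score with cosine similarity of dense sentence embeddings.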