SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking
March 2, 2025
Authors: Nam V. Nguyen, Dien X. Tran, Thanh T. Tran, Anh T. Hoang, Tai V. Duong, Di T. Le, Phuc-Lu Le
cs.AI
Abstract
The rise of misinformation, exacerbated by Large Language Models (LLMs) like
GPT and Gemini, demands robust fact-checking solutions, especially for
low-resource languages like Vietnamese. Existing methods struggle with semantic
ambiguity, homonyms, and complex linguistic structures, often trading accuracy
for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking
framework integrating Semantic-based Evidence Retrieval (SER) and Two-step
Verdict Classification (TVC). Our approach balances precision and speed,
achieving state-of-the-art results with 78.97% strict accuracy on ISE-DSC01
and 80.82% on ViWikiFC, securing 1st place in the UIT Data Science Challenge.
Additionally, SemViQA Faster improves inference speed 7x while maintaining
competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact
verification, advancing the fight against misinformation. The source code is
available at: https://github.com/DAVID-NGUYEN-S16/SemViQA.