Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection
January 8, 2026
Authors: Zhiwei Liu, Yupen Cao, Yuechen Jiang, Mohsinul Kabir, Polydoros Giannouris, Chen Xu, Ziyang Xu, Tianlei Zhu, Tariquzzaman Faisal, Triantafillos Papadopoulos, Yan Wang, Lingfei Qian, Xueqing Peng, Zhuohan Xie, Ye Yuan, Saeed Almheiri, Abdulrazzaq Alnajjar, Mingbin Chen, Harry Stuart, Paul Thompson, Prayag Tiwari, Alejandro Lopez-Lira, Xue Liu, Jimin Huang, Sophia Ananiadou
cs.AI
Abstract
Large language models (LLMs) have been widely applied across various domains of finance. Since their training data are largely derived from human-authored corpora, LLMs may inherit a range of human biases. Behavioral biases can lead to instability and uncertainty in decision-making, particularly when processing financial information. However, existing research on LLM bias has mainly focused on direct questioning or simplified, general-purpose settings, with limited consideration of the complex real-world financial environments and high-risk, context-sensitive, multilingual financial misinformation detection tasks (\mfmd). In this work, we propose \mfmdscen, a comprehensive benchmark for evaluating behavioral biases of LLMs in \mfmd across diverse economic scenarios. In collaboration with financial experts, we construct three types of complex financial scenarios: (i) role- and personality-based, (ii) role- and region-based, and (iii) role-based scenarios incorporating ethnicity and religious beliefs. We further develop a multilingual financial misinformation dataset covering English, Chinese, Greek, and Bengali. By integrating these scenarios with misinformation claims, \mfmdscen enables a systematic evaluation of 22 mainstream LLMs. Our findings reveal that pronounced behavioral biases persist across both commercial and open-source models. This project will be available at https://github.com/lzw108/FMD.