FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

October 17, 2025
Authors: Tiansheng Hu, Tongyan Hu, Liuyang Bai, Yilun Zhao, Arman Cohan, Chen Zhao
cs.AI

Abstract

Recent LLMs have demonstrated promising ability in solving finance-related problems. However, applying LLMs in real-world finance applications remains challenging because of the domain's high-risk, high-stakes nature. This paper introduces FinTrust, a comprehensive benchmark specifically designed for evaluating the trustworthiness of LLMs in finance applications. Our benchmark focuses on a wide range of alignment issues grounded in practical contexts and features fine-grained tasks for each dimension of trustworthiness evaluation. We assess eleven LLMs on FinTrust and find that proprietary models such as o4-mini outperform on most tasks, such as safety, while open-source models such as DeepSeek-V3 have an advantage in specific areas such as industry-level fairness. On challenging tasks such as fiduciary alignment and disclosure, all LLMs fall short, revealing a significant gap in legal awareness. We believe that FinTrust can be a valuable benchmark for evaluating LLM trustworthiness in the finance domain.
October 20, 2025