

Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning

July 22, 2025
作者: Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Jingze Song, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang, Peng Zhang
cs.AI

Abstract

Large Language Models (LLMs) exhibit considerable promise in financial applications; however, prevailing models frequently demonstrate limitations when confronted with scenarios that necessitate sophisticated reasoning capabilities, stringent trustworthiness criteria, and efficient adaptation to domain-specific requirements. We introduce the Agentar-Fin-R1 series of financial large language models (8B and 32B parameters), specifically engineered based on the Qwen3 foundation model to enhance reasoning capabilities, reliability, and domain specialization for financial applications. Our optimization approach integrates a high-quality, systematic financial task label system with a comprehensive multi-layered trustworthiness assurance framework. This framework encompasses high-quality trustworthy knowledge engineering, multi-agent trustworthy data synthesis, and rigorous data validation governance. Through label-guided automated difficulty-aware optimization, a two-stage training pipeline, and dynamic attribution systems, we achieve substantial improvements in training efficiency. Our models undergo comprehensive evaluation on mainstream financial benchmarks including Fineva, FinEval, and FinanceIQ, as well as general reasoning datasets such as MATH-500 and GPQA-diamond. To thoroughly assess real-world deployment capabilities, we innovatively propose the Finova evaluation benchmark, which focuses on agent-level financial reasoning and compliance verification. Experimental results demonstrate that Agentar-Fin-R1 not only achieves state-of-the-art performance on financial tasks but also exhibits exceptional general reasoning capabilities, validating its effectiveness as a trustworthy solution for high-stakes financial applications. The Finova benchmark is available at https://github.com/antgroup/Finova.
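The abstract does not specify how the label-guided, difficulty-aware optimization or the two-stage pipeline is implemented. As a purely illustrative sketch of what such a scheme could look like (every name, the exponential weighting, and the stage-wise difficulty targets here are assumptions, not the paper's actual method), one might weight training examples by how close their difficulty score sits to a stage-specific target:

```python
import math
import random

def difficulty_aware_sample(dataset, target_difficulty, k, temperature=0.2):
    """Sample k examples, weighted by closeness to a target difficulty.

    Each example is a dict with a 'difficulty' score in [0, 1] (e.g. the
    failure rate of a baseline model on that item) and a task 'label'.
    The exponential kernel concentrates sampling near the target while
    still covering the full range. (Illustrative assumption only.)
    """
    weights = [
        math.exp(-abs(ex["difficulty"] - target_difficulty) / temperature)
        for ex in dataset
    ]
    return random.choices(dataset, weights=weights, k=k)

def two_stage_schedule(dataset, steps_stage1, steps_stage2, batch=4):
    """Hypothetical two-stage curriculum: broad/medium-difficulty coverage
    first, then a second stage biased toward hard examples."""
    stage1 = [difficulty_aware_sample(dataset, 0.5, batch)
              for _ in range(steps_stage1)]
    stage2 = [difficulty_aware_sample(dataset, 0.9, batch)
              for _ in range(steps_stage2)]
    return stage1 + stage2
```

In practice the difficulty scores and task labels would come from the paper's label system and data-governance pipeline; the sketch only shows how a curriculum could consume them.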