

Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning

July 22, 2025
作者: Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Jingze Song, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang, Peng Zhang
cs.AI

Abstract

Large Language Models (LLMs) exhibit considerable promise in financial applications; however, prevailing models frequently demonstrate limitations when confronted with scenarios that necessitate sophisticated reasoning capabilities, stringent trustworthiness criteria, and efficient adaptation to domain-specific requirements. We introduce the Agentar-Fin-R1 series of financial large language models (8B and 32B parameters), specifically engineered based on the Qwen3 foundation model to enhance reasoning capabilities, reliability, and domain specialization for financial applications. Our optimization approach integrates a high-quality, systematic financial task label system with a comprehensive multi-layered trustworthiness assurance framework. This framework encompasses high-quality trustworthy knowledge engineering, multi-agent trustworthy data synthesis, and rigorous data validation governance. Through label-guided automated difficulty-aware optimization, a two-stage training pipeline, and dynamic attribution systems, we achieve substantial improvements in training efficiency. Our models undergo comprehensive evaluation on mainstream financial benchmarks including Fineva, FinEval, and FinanceIQ, as well as general reasoning datasets such as MATH-500 and GPQA-diamond. To thoroughly assess real-world deployment capabilities, we propose the Finova evaluation benchmark, which focuses on agent-level financial reasoning and compliance verification. Experimental results demonstrate that Agentar-Fin-R1 not only achieves state-of-the-art performance on financial tasks but also exhibits exceptional general reasoning capabilities, validating its effectiveness as a trustworthy solution for high-stakes financial applications. The Finova benchmark is available at https://github.com/antgroup/Finova.