DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
April 22, 2025
Authors: Jie Zhu, Qian Chen, Huaixia Dou, Junhui Li, Lifan Guo, Feng Chen, Chi Zhang
cs.AI
Abstract
Effective reasoning remains a core challenge for large language models (LLMs)
in the financial domain, where tasks often require domain-specific knowledge,
precise numerical calculations, and strict adherence to compliance rules. We
propose DianJin-R1, a reasoning-enhanced framework designed to address these
challenges through reasoning-augmented supervision and reinforcement learning.
Central to our approach is DianJin-R1-Data, a high-quality dataset constructed
from CFLUE, FinQA, and a proprietary compliance corpus (Chinese Compliance
Check, CCC), combining diverse financial reasoning scenarios with verified
annotations. Our models, DianJin-R1-7B and DianJin-R1-32B, are fine-tuned from
Qwen2.5-7B-Instruct and Qwen2.5-32B-Instruct using a structured format that
generates both reasoning steps and final answers. To further refine reasoning
quality, we apply Group Relative Policy Optimization (GRPO), a reinforcement
learning method that incorporates dual reward signals: one encouraging
structured outputs and another rewarding answer correctness. We evaluate our
models on five benchmarks: three financial datasets (CFLUE, FinQA, and CCC) and
two general reasoning benchmarks (MATH-500 and GPQA-Diamond). Experimental
results show that DianJin-R1 models consistently outperform their non-reasoning
counterparts, especially on complex financial tasks. Moreover, on the
real-world CCC dataset, our single-call reasoning models match or even surpass
the performance of multi-agent systems that require significantly more
computational cost. These findings demonstrate the effectiveness of DianJin-R1
in enhancing financial reasoning through structured supervision and
reward-aligned learning, offering a scalable and practical solution for
real-world applications.
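The dual reward signal described above can be sketched as two per-sample scoring functions, one checking that the output follows a reasoning-then-answer template and one checking the final answer. This is a minimal illustration, not the paper's implementation: the tag names, exact-match scoring, and reward weights are assumptions, and GRPO's group-relative advantage normalization is omitted.

```python
import re

# Hypothetical template tags; the abstract only states that the model
# emits reasoning steps followed by a final answer in a structured format.
STRUCTURE_RE = re.compile(r"^<think>.+</think>\s*<answer>(.+)</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the reasoning/answer template, else 0.0."""
    return 1.0 if STRUCTURE_RE.match(completion.strip()) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted final answer matches the reference, else 0.0."""
    m = STRUCTURE_RE.match(completion.strip())
    if not m:
        return 0.0
    return 1.0 if m.group(1).strip() == gold.strip() else 0.0

def total_reward(completion: str, gold: str) -> float:
    # Unweighted sum of the two signals; in GRPO these per-sample rewards
    # would then be normalized within each group of sampled completions.
    return format_reward(completion) + correctness_reward(completion, gold)

sample = "<think>2 + 2 equals 4.</think><answer>4</answer>"
print(total_reward(sample, "4"))  # 2.0
```

A well-formed completion with a correct answer scores 2.0; a correct answer without the structured template scores 0.0, which is what pushes the policy toward emitting explicit reasoning steps.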