Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training
March 7, 2026
Authors: Chuxue Cao, Honglin Lin, Zhanping Zhong, Xin Gao, Mengzhang Cai, Conghui He, Sirui Han, Lijun Wu
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated strong general capabilities, yet their deployment in finance remains challenging due to dense domain-specific terminology, stringent numerical reasoning requirements, and low tolerance for factual errors. We conduct a controlled empirical study showing that in specialized vertical domains, performance is largely determined by the quality and the difficulty/verifiability profile of post-training data. We introduce ODA-Fin-SFT-318k, constructed via multi-stage distillation and verification to produce high-quality Chain-of-Thought supervision, and ODA-Fin-RL-12k, curated for hard-but-verifiable tasks that balance reward precision and task diversity. Using standard SFT and RL pipelines, we show that high-quality CoT distillation establishes a robust foundation during SFT, while difficulty- and verifiability-aware sampling improves RL generalization. Evaluated on nine benchmarks spanning general financial tasks, sentiment analysis, and numerical reasoning, our ODA-Fin-RL-8B consistently surpasses open-source state-of-the-art (SOTA) financial LLMs of comparable size. We release our ODA-Fin-SFT-318k and ODA-Fin-RL-12k datasets, along with the trained models, to advance data-centric financial AI research.
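The difficulty- and verifiability-aware sampling described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the field names (`category`, `pass_rate`, `verifiable`), the pass-rate thresholds, and the per-category cap are all hypothetical stand-ins for whatever curation criteria the authors actually use.

```python
from collections import defaultdict

def select_rl_tasks(candidates, min_pass=0.1, max_pass=0.7, per_category=2):
    """Keep hard-but-solvable, verifiable tasks, balanced across categories.

    Each candidate is a dict with hypothetical fields:
      'category'   - task type, e.g. 'sentiment' or 'numeric'
      'pass_rate'  - fraction of reference-model samples that solve the task
      'verifiable' - True if the answer can be checked programmatically
    """
    by_category = defaultdict(list)
    for task in candidates:
        # Verifiability gate: only tasks with programmatically checkable
        # answers yield a precise reward signal during RL.
        if not task["verifiable"]:
            continue
        # Difficulty gate: drop tasks that are trivial (high pass rate)
        # or currently unsolvable (near-zero pass rate).
        if not (min_pass <= task["pass_rate"] <= max_pass):
            continue
        by_category[task["category"]].append(task)

    selected = []
    for tasks in by_category.values():
        # Diversity: cap each category's contribution, preferring the
        # hardest (lowest pass rate) tasks within the cap.
        tasks.sort(key=lambda t: t["pass_rate"])
        selected.extend(tasks[:per_category])
    return selected
```

The two gates mirror the abstract's "hard-but-verifiable" criterion, and the per-category cap stands in for its task-diversity balancing.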