Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training
March 7, 2026
Authors: Chuxue Cao, Honglin Lin, Zhanping Zhong, Xin Gao, Mengzhang Cai, Conghui He, Sirui Han, Lijun Wu
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated strong general capabilities, yet their deployment in finance remains challenging due to dense domain-specific terminology, stringent numerical reasoning requirements, and low tolerance for factual errors. We conduct a controlled empirical study showing that in specialized vertical domains, performance is largely determined by the quality and the difficulty/verifiability profile of post-training data. We introduce ODA-Fin-SFT-318k, constructed via multi-stage distillation and verification to produce high-quality Chain-of-Thought supervision, and ODA-Fin-RL-12k, curated for hard-but-verifiable tasks that balance reward precision and task diversity. Using standard SFT and RL pipelines, we show that high-quality CoT distillation establishes a robust foundation during SFT, while difficulty- and verifiability-aware sampling improves RL generalization. Evaluated on nine benchmarks spanning general financial tasks, sentiment analysis, and numerical reasoning, our ODA-Fin-RL-8B consistently surpasses open-source state-of-the-art (SOTA) financial LLMs of comparable size. We release our ODA-Fin-SFT-318k and ODA-Fin-RL-12k datasets, along with the trained models, to advance data-centric financial AI research.
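The difficulty- and verifiability-aware sampling described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the field names (`category`, `pass_rate`, `verifiable`), the pass-rate thresholds, and the per-category cap are all hypothetical stand-ins for whatever curation criteria the authors actually use.

```python
from collections import defaultdict

def select_rl_tasks(candidates, min_pass=0.1, max_pass=0.7, per_category=2):
    """Keep hard-but-solvable, verifiable tasks, balanced across categories.

    Each candidate is a dict with hypothetical fields:
      'category'   - task type, e.g. 'sentiment' or 'numeric'
      'pass_rate'  - fraction of reference-model samples that solve the task
      'verifiable' - True if the answer can be checked programmatically
    """
    by_category = defaultdict(list)
    for task in candidates:
        # Verifiability gate: only tasks with programmatically checkable
        # answers yield a precise reward signal during RL.
        if not task["verifiable"]:
            continue
        # Difficulty gate: drop tasks that are trivial (high pass rate)
        # or currently unsolvable (near-zero pass rate).
        if not (min_pass <= task["pass_rate"] <= max_pass):
            continue
        by_category[task["category"]].append(task)

    selected = []
    for tasks in by_category.values():
        # Diversity: cap each category's contribution, preferring the
        # hardest (lowest pass rate) tasks within the cap.
        tasks.sort(key=lambda t: t["pass_rate"])
        selected.extend(tasks[:per_category])
    return selected
```

The two gates mirror the abstract's "hard-but-verifiable" criterion, and the per-category cap stands in for its task-diversity balancing.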