Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

Abstract

Large Language Models (LLMs) have demonstrated strong general capabilities, yet their deployment in finance remains challenging due to dense domain-specific terminology, stringent numerical reasoning requirements, and low tolerance for factual errors. We conduct a controlled empirical study showing that in specialized vertical domains, performance is largely determined by the quality and difficulty/verifiability profile of post-training data. We introduce ODA-Fin-SFT-318k, constructed via multi-stage distillation and verification to produce high-quality Chain-of-Thought (CoT) supervision, and ODA-Fin-RL-12k, curated for hard-but-verifiable tasks that balance reward precision and task diversity. Using standard SFT and RL pipelines, we show that high-quality CoT distillation establishes a robust foundation during SFT, while difficulty- and verifiability-aware sampling improves RL generalization. Evaluated on nine benchmarks spanning general financial tasks, sentiment analysis, and numerical reasoning, our ODA-Fin-RL-8B model consistently outperforms open-source state-of-the-art (SOTA) financial LLMs of comparable size. We release the ODA-Fin-SFT-318k and ODA-Fin-RL-12k datasets, along with the trained models, to advance data-centric financial AI research.
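The abstract does not specify how "hard-but-verifiable" RL items are selected. One common way to operationalize difficulty- and verifiability-aware sampling is to keep only items whose reference answer a rule-based checker can score, then estimate difficulty from the pass rate of several policy rollouts and keep items in an intermediate band. The sketch below assumes numeric final answers, a relative tolerance of 1e-4, and a pass-rate band of (0, 0.5]; `Candidate`, `select_hard_but_verifiable`, and these thresholds are hypothetical illustrations, not the ODA-Fin-RL-12k pipeline.

```python
from __future__ import annotations

import math
import re
from dataclasses import dataclass, field

# Hypothetical sketch of difficulty- and verifiability-aware filtering.
# Not the ODA-Fin-RL-12k pipeline; all names and thresholds are assumptions.

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


@dataclass
class Candidate:
    prompt: str
    reference: str  # gold final answer
    rollouts: list[str] = field(default_factory=list)  # answers sampled from the current policy


def parse_number(text: str) -> float | None:
    """Return the last number in `text` (thousands separators removed), or None."""
    matches = _NUMBER.findall(text.replace(",", ""))
    return float(matches[-1]) if matches else None


def is_correct(answer: str, reference: str, rel_tol: float = 1e-4) -> bool:
    """Rule-based reward (assumed): numeric match within a relative tolerance."""
    a, r = parse_number(answer), parse_number(reference)
    if a is None or r is None:
        return False
    return math.isclose(a, r, rel_tol=rel_tol, abs_tol=1e-9)


def select_hard_but_verifiable(
    pool: list[Candidate], low: float = 0.0, high: float = 0.5
) -> list[Candidate]:
    """Keep items the checker can score whose policy pass rate lies in (low, high]."""
    kept = []
    for c in pool:
        # Verifiability: skip items whose reference this checker cannot score.
        if parse_number(c.reference) is None or not c.rollouts:
            continue
        # Difficulty: fraction of rollouts the checker marks correct.
        pass_rate = sum(is_correct(a, c.reference) for a in c.rollouts) / len(c.rollouts)
        # Drop items never solved (possibly mislabeled) or solved too easily.
        if low < pass_rate <= high:
            kept.append(c)
    return kept


pool = [
    Candidate("Q1", "42", ["42", "41", "40", "39"]),         # pass rate 0.25: kept
    Candidate("Q2", "7.5", ["7.5", "7.5", "7.5", "7.5"]),    # pass rate 1.0: too easy
    Candidate("Q3", "It depends on guidance", ["Bullish"]),  # no numeric reference: unverifiable here
    Candidate("Q4", "100", ["90", "80", "70", "60"]),        # pass rate 0.0: dropped
]
print([c.prompt for c in select_hard_but_verifiable(pool)])  # ['Q1']
```

Excluding zero-pass-rate items is one heuristic for filtering out unsolvable or mislabeled examples; in practice the band edges, the number of rollouts, and the checker (for example, exact match for sentiment labels) would all need to be chosen per task.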