Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

Abstract

Large Language Models (LLMs) have demonstrated strong general capabilities, yet their deployment in finance remains challenging due to dense domain-specific terminology, stringent numerical reasoning requirements, and low tolerance for factual errors. We conduct a controlled empirical study showing that in specialized vertical domains, performance is largely determined by the quality and the difficulty/verifiability profile of post-training data. We introduce ODA-Fin-SFT-318k, constructed via multi-stage distillation and verification to produce high-quality Chain-of-Thought (CoT) supervision, and ODA-Fin-RL-12k, curated for hard-but-verifiable tasks that balance reward precision and task diversity. Using standard supervised fine-tuning (SFT) and reinforcement learning (RL) pipelines, we show that high-quality CoT distillation establishes a robust foundation during SFT, while difficulty- and verifiability-aware sampling improves RL generalization. Evaluated on nine benchmarks spanning general financial tasks, sentiment analysis, and numerical reasoning, our ODA-Fin-RL-8B consistently surpasses open-source state-of-the-art (SOTA) financial LLMs of comparable size. We release the ODA-Fin-SFT-318k and ODA-Fin-RL-12k datasets, along with the trained models, to advance data-centric financial AI research.
