Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

Abstract

Large Language Models (LLMs) have demonstrated strong general capabilities, yet their deployment in finance remains challenging because of dense domain-specific terminology, stringent numerical reasoning requirements, and low tolerance for factual errors. We conduct a controlled empirical study showing that, in specialized vertical domains, performance is largely determined by the quality and the difficulty/verifiability profile of post-training data. We introduce ODA-Fin-SFT-318k, constructed via multi-stage distillation and verification to produce high-quality Chain-of-Thought (CoT) supervision, and ODA-Fin-RL-12k, curated for hard-but-verifiable tasks that balance reward precision and task diversity. Using standard supervised fine-tuning (SFT) and reinforcement learning (RL) pipelines, we show that high-quality CoT distillation establishes a robust foundation during SFT, while difficulty- and verifiability-aware sampling improves RL generalization. Evaluated on nine benchmarks spanning general financial tasks, sentiment analysis, and numerical reasoning, our ODA-Fin-RL-8B model consistently surpasses open-source state-of-the-art (SOTA) financial LLMs of comparable size. We release the ODA-Fin-SFT-318k and ODA-Fin-RL-12k datasets, along with the trained models, to advance data-centric financial AI research.
