Research topic: On the transferability of reasoning-enhanced LLMs to finance
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
February 12, 2025
作者: Lingfei Qian, Weipeng Zhou, Yan Wang, Xueqing Peng, Jimin Huang, Qianqian Xie
cs.AI
Abstract
Recent advancements in large language models (LLMs) have shown strong general
reasoning abilities, yet their effectiveness in financial reasoning remains
underexplored. In this study, we comprehensively evaluate 16 powerful reasoning
and general LLMs on three complex financial tasks involving financial text,
tabular data, and equations, assessing numerical reasoning, tabular
interpretation, financial terminology comprehension, long-context processing,
and equation-based problem solving. Our results show that while better datasets
and pretraining improve financial reasoning, general enhancements like CoT
fine-tuning do not always yield consistent gains. Moreover, all reasoning
strategies face challenges in improving performance on long-context and
multi-table tasks. To address these limitations, we develop a financial
reasoning-enhanced model based on Llama-3.1-8B-Instruct, by CoT fine-tuning and
reinforcement learning with domain-specific reasoning paths. Even with simple
fine-tuning with one financial dataset, our model achieves a consistent 10%
performance improvement across tasks, surpassing all 8B models and even
Llama3-70B-Instruct and Llama3.1-70B-Instruct on average. Our results highlight
the need for domain-specific adaptations in financial tasks, emphasizing future
directions such as multi-table reasoning, long-context processing, and
financial terminology comprehension. All our datasets, models, and code are
publicly available. Furthermore, we introduce a leaderboard for benchmarking
future datasets and models.
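
The abstract attributes the model's gains to CoT fine-tuning on domain-specific reasoning paths (followed by reinforcement learning, which is not shown here). The sketch below illustrates what such supervised CoT fine-tuning could look like in outline; it is not the authors' released pipeline, and the example data, field names (question, reasoning, answer), and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of CoT supervised fine-tuning for financial reasoning.
# NOT the authors' released code: dataset contents, field names, and
# hyperparameters below are illustrative assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # base model named in the abstract
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical training examples: each pairs a financial question with a
# domain-specific reasoning path (the chain of thought) and a final answer.
FINANCE_COT_DATA = [
    {
        "question": "Revenue grew from $120M to $138M. What is the growth rate?",
        "reasoning": "Growth rate = (138 - 120) / 120 = 18 / 120 = 0.15.",
        "answer": "15%",
    },
]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
).to(device)

def format_example(ex):
    # Serialize question, chain of thought, and answer into one training string,
    # so the model learns to emit the reasoning path before the final answer.
    return (
        f"Question: {ex['question']}\n"
        f"Reasoning: {ex['reasoning']}\n"
        f"Answer: {ex['answer']}{tokenizer.eos_token}"
    )

def collate(batch):
    texts = [format_example(ex) for ex in batch]
    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=2048, return_tensors="pt")
    enc["labels"] = enc["input_ids"].clone()  # plain causal-LM objective
    return enc

loader = DataLoader(FINANCE_COT_DATA, batch_size=1, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    batch = {k: v.to(device) for k, v in batch.items()}
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In this formulation, "domain-specific reasoning paths" simply appear as intermediate reasoning text supervised alongside the final answer; the actual datasets and training recipe are those released with the paper.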