Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims to assess the financial reasoning capabilities of LLMs. We use mock exam questions from the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis under Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) prompting scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance of passing the CFA exams. Finally, we outline potential strategies and improvements to enhance the applicability of LLMs in finance. In this light, we hope this work paves the way for future studies that continue to improve LLMs for financial reasoning through rigorous evaluation.
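To make the three prompting scenarios concrete, here is a minimal Python sketch of how a multiple-choice item could be turned into ZS, CoT and FS prompts, how a predicted letter could be parsed from a completion, and how accuracy could be scored. The prompt wording, the `MCQuestion` structure, the helper names (`build_prompt`, `extract_choice`, `accuracy`) and the three-option A to C format are all illustrative assumptions, not the templates or code used in this study, and no model is actually called.

```python
# Hypothetical sketch: prompt wording, data format and helper names are
# illustrative assumptions, not the study's actual templates or code.
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class MCQuestion:
    """One multiple-choice item (assumed format, not an official CFA schema)."""
    stem: str
    choices: Dict[str, str]  # letter -> option text, assumed to be "A" to "C"
    answer: str  # gold letter


def format_question(q: MCQuestion) -> str:
    """Render the stem followed by one 'X. option' line per choice, in letter order."""
    options = [f"{letter}. {text}" for letter, text in sorted(q.choices.items())]
    return "\n".join([f"Question: {q.stem}", *options])


def build_prompt(q: MCQuestion, mode: str, shots: Sequence[MCQuestion] = ()) -> str:
    """Build a zero-shot ("zs"), chain-of-thought ("cot") or few-shot ("fs") prompt."""
    if mode == "zs":
        # Ask directly for the letter; no reasoning is requested.
        return f"{format_question(q)}\nReply with the letter of the correct choice.\nAnswer:"
    if mode == "cot":
        # Ask for step-by-step reasoning before a final, machine-readable answer line.
        return (
            f"{format_question(q)}\nThink through the problem step by step, "
            "then end with a line of the form 'Answer: <letter>'.\nReasoning:"
        )
    if mode == "fs":
        if not shots:
            raise ValueError("few-shot mode needs at least one solved example")
        # Prepend solved examples so the model can imitate the answer format.
        demos = "\n\n".join(f"{format_question(s)}\nAnswer: {s.answer}" for s in shots)
        return f"{demos}\n\n{format_question(q)}\nAnswer:"
    raise ValueError(f"unknown mode: {mode!r}")


def extract_choice(completion: str) -> Optional[str]:
    """Pull the predicted letter out of a model completion (simple heuristic)."""
    tagged = re.findall(r"Answer:\s*\(?([ABC])\b", completion)
    if tagged:
        return tagged[-1]  # in a CoT trace, the last 'Answer: X' line wins
    bare = re.match(r"\s*\(?([ABC])\b", completion)  # e.g. " B" after a ZS/FS prompt
    return bare.group(1) if bare else None


def accuracy(predictions: List[Optional[str]], questions: List[MCQuestion]) -> float:
    """Fraction of questions whose predicted letter equals the gold letter."""
    if not questions or len(predictions) != len(questions):
        raise ValueError("need one prediction per question")
    correct = sum(p == q.answer for p, q in zip(predictions, questions))
    return correct / len(questions)
```

A possible design choice here is to have the CoT prompt end with an explicit `Answer:` line, so the same parser and the same accuracy metric apply to all three scenarios and their scores can be compared directly.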
