Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks

Abstract

Recent large language models such as ChatGPT and GPT-4 have garnered significant attention because they can generate high-quality responses to human input. Although ChatGPT and GPT-4 have been tested extensively on generic text corpora, where they show impressive capabilities, no study has yet focused on financial corpora. In this study, we aim to bridge this gap by examining the potential of ChatGPT and GPT-4 as solvers for typical financial text analytics problems in the zero-shot or few-shot setting. Specifically, we assess their capabilities on four representative tasks over five distinct financial textual datasets. Our preliminary study shows that ChatGPT and GPT-4 struggle on tasks that require domain-specific knowledge, such as financial named entity recognition (NER) and sentiment analysis, while they excel at numerical reasoning tasks. We report both the strengths and limitations of the current versions of ChatGPT and GPT-4, comparing them with state-of-the-art fine-tuned models as well as pretrained domain-specific generative models. Our experiments include qualitative studies, through which we hope to clarify the capabilities of existing models and facilitate further improvements.
