Is GPT-4 a Good Data Analyst?

Abstract

Large language models (LLMs) have demonstrated powerful capabilities across many domains and tasks, including context understanding, code generation, language generation, and data storytelling, and many data analysts may worry that their jobs will be replaced by AI. This controversial topic has drawn considerable public attention. However, opinions remain divided and no definitive conclusion has been reached. Motivated by this, we raise the research question "Is GPT-4 a good data analyst?" and aim to answer it through head-to-head comparative studies. Specifically, we treat GPT-4 as a data analyst and have it perform end-to-end data analysis on databases from a wide range of domains. We propose a framework for this task, carefully designing the prompts that GPT-4 uses to conduct the experiments. We also design several task-specific evaluation metrics to systematically compare the performance of several professional human data analysts with that of GPT-4. Experimental results show that GPT-4 can achieve performance comparable to that of humans. We also discuss our results in depth to shed light on further studies before reaching the conclusion that GPT-4 can replace data analysts.
