Comparing Software Developers with ChatGPT: An Empirical Investigation

Abstract

The automation of particular Software Engineering (SE) tasks has moved from theory to reality. Numerous scholarly articles have documented the successful application of Artificial Intelligence (AI) to problems in areas such as project management, modeling, testing, and development. A recent innovation is ChatGPT, an ML-infused chatbot touted as a resource proficient in generating program code for developers and formulating software testing strategies for testers. Although there is speculation that AI-based computation can increase productivity and even replace software engineers in software development, empirical evidence to verify this is currently lacking. Moreover, while the primary focus has been on improving the accuracy of AI systems, non-functional requirements such as energy efficiency, vulnerability, fairness (i.e., human bias), and safety frequently receive insufficient attention. This paper posits that a comprehensive comparison of software engineers and AI-based solutions across various evaluation criteria is pivotal to fostering human-machine collaboration, enhancing the reliability of AI-based methods, and understanding which tasks are better suited to humans and which to AI. Such a comparison also facilitates the effective implementation of cooperative work structures and human-in-the-loop processes. This paper conducts an empirical investigation that contrasts the performance of software engineers and AI systems such as ChatGPT across different evaluation metrics. The empirical study includes a case that assesses ChatGPT-generated code against code written by developers and uploaded to LeetCode.
