Comparing Software Developers with ChatGPT: An Empirical Investigation

Abstract

The automation of specific Software Engineering (SE) tasks has moved from theory to reality. Numerous scholarly articles have documented the successful application of Artificial Intelligence (AI) to problems in areas such as project management, modeling, testing, and development. A recent innovation is ChatGPT, a machine-learning-based chatbot touted as able to generate program code for developers and to formulate software testing strategies for testers. Although there is speculation that AI-based computation can increase productivity and even replace software engineers in software development, empirical evidence to verify this is currently lacking. Moreover, while most effort goes into improving the accuracy of AI systems, non-functional requirements such as energy efficiency, vulnerability, fairness (i.e., human bias), and safety frequently receive insufficient attention. This paper argues that a comprehensive comparison of software engineers and AI-based solutions across multiple evaluation criteria is pivotal for fostering human-machine collaboration, improving the reliability of AI-based methods, and understanding which tasks are better suited to humans and which to AI. Such a comparison also supports the effective implementation of cooperative work structures and human-in-the-loop processes. This paper presents an empirical investigation contrasting the performance of software engineers and AI systems, such as ChatGPT, across different evaluation metrics. The investigation includes a case study that compares ChatGPT-generated code with code written by developers and submitted to LeetCode.