Comparing Software Developers with ChatGPT: An Empirical Investigation

Abstract

The automation of specific Software Engineering (SE) tasks has moved from theory to reality. Numerous scholarly articles have documented the successful application of Artificial Intelligence (AI) to problems in areas such as project management, modeling, testing, and development. A recent innovation is ChatGPT, an ML-infused chatbot touted as a resource capable of generating program code for developers and formulating software testing strategies for testers. Although there is speculation that AI-based computation can increase productivity and even replace software engineers in software development, empirical evidence to verify this is currently lacking. Moreover, while most attention goes to improving the accuracy of AI systems, non-functional requirements including energy efficiency, vulnerability, fairness (i.e., human bias), and safety frequently receive insufficient attention. This paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal to fostering human-machine collaboration, enhancing the reliability of AI-based methods, and understanding which tasks are better suited to humans and which to AI. Such a comparison also facilitates the effective implementation of cooperative work structures and human-in-the-loop processes. This paper conducts an empirical investigation that contrasts the performance of software engineers and AI systems, such as ChatGPT, across different evaluation metrics. The empirical study includes a case comparing ChatGPT-generated code with code written by developers and uploaded to LeetCode.
