An Empirical Study of the Characteristics and Evolution of AI Use in GitHub Repositories: Evidence from Code Comments

Abstract


Developers increasingly use AI tools such as ChatGPT, Copilot, and Claude in everyday software workflows, but prior studies often evaluate LLM outputs in isolation rather than examining how developers adapt them in real projects. We analyze 35,361 GitHub code comments that explicitly reference AI use and their associated code blocks. We first open-code 500 unique comments and code blocks to derive a taxonomy of AI-assisted development activities, then annotate the full dataset using two LLM-based classifiers and aggregate predictions with Dawid-Skene expectation-maximization. We also analyze 12,996 subsequent commit messages to study how AI-assisted code evolves after introduction, and examine temporal trends from December 2022 to March 2026. Our results show that developers primarily use LLMs for code implementation, followed by code enhancement, debugging, documentation, and testing. Subsequent commits frequently involve refactoring and cleanup, feature integration and extension, and bug fixing, indicating sustained human oversight in adapting AI-assisted code. Over time, AI-referencing comments shift from direct code generation toward knowledge and conceptual support and code enhancement. These findings suggest that AI tools are becoming embedded not only as code-generation aids, but also as collaborative support mechanisms whose outputs are refined, extended, and corrected by developers over time.
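The abstract names Dawid-Skene expectation-maximization as the way predictions from the two LLM-based classifiers are combined. As an illustration only, the following is a minimal Python sketch of the classic Dawid-Skene algorithm, not the authors' implementation. The function name `dawid_skene`, the integer encoding of taxonomy categories, the soft majority-vote initialization, the smoothing pseudo-count, and the stopping tolerance are all assumptions made for this example. Each annotator (here, each classifier) gets a per-class confusion matrix, and EM alternates between estimating those matrices and estimating each item's posterior distribution over true categories.

```python
def dawid_skene(labels, n_classes, n_iter=100, smoothing=0.01, tol=1e-6):
    """Aggregate noisy labels from several annotators with Dawid-Skene EM.

    labels[i][j] is the integer class that annotator j assigned to item i,
    or None if annotator j did not label item i.
    Returns (posteriors, class_priors, confusion_matrices).
    """
    n_items, n_annot = len(labels), len(labels[0])

    # Initialization (an assumed choice): soft majority vote over available labels.
    post = []
    for row in labels:
        counts = [0.0] * n_classes
        for lab in row:
            if lab is not None:
                counts[lab] += 1.0
        total = sum(counts)
        post.append([c / total if total else 1.0 / n_classes for c in counts])

    for _ in range(n_iter):
        # M-step (a): class priors are the mean posterior mass per class.
        priors = [sum(p[k] for p in post) / n_items for k in range(n_classes)]

        # M-step (b): confusion[j][k][l] estimates P(annotator j says l | true class k).
        # The smoothing pseudo-count keeps every entry strictly positive.
        confusion = []
        for j in range(n_annot):
            mat = []
            for k in range(n_classes):
                row = [smoothing] * n_classes
                for i in range(n_items):
                    lab = labels[i][j]
                    if lab is not None:
                        row[lab] += post[i][k]
                s = sum(row)
                mat.append([v / s for v in row])
            confusion.append(mat)

        # E-step: posterior over true classes, assuming annotators err independently.
        new_post = []
        for i in range(n_items):
            scores = []
            for k in range(n_classes):
                score = priors[k]
                for j in range(n_annot):
                    lab = labels[i][j]
                    if lab is not None:
                        score *= confusion[j][k][lab]
                scores.append(score)
            z = sum(scores)
            new_post.append([s / z for s in scores])

        # Stop once no posterior entry changes by more than tol.
        delta = max(abs(a - b) for old, new in zip(post, new_post) for a, b in zip(old, new))
        post = new_post
        if delta < tol:
            break

    return post, priors, confusion


if __name__ == "__main__":
    # Hypothetical category codes: 0 = implementation, 1 = enhancement, 2 = debugging.
    # Each row holds the labels from two classifiers for one comment.
    labels = [[0, 0], [0, 0], [1, 1], [2, 2], [0, 1], [1, 2]]
    post, priors, confusion = dawid_skene(labels, n_classes=3)
    final = [max(range(3), key=lambda k: p[k]) for p in post]
    print(final)
```

With only two classifiers, simple majority voting ties on every disagreement; Dawid-Skene instead weighs each classifier's estimated per-class reliability against the class priors. When the two classifiers are estimated to be equally reliable for the classes involved, a disagreement can still end up as a near tie.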