I Know Which LLM Wrote Your Code Last Summer: LLM-Generated Code Stylometry for Authorship Attribution

Abstract

Detecting AI-generated code, deepfakes, and other synthetic content is an emerging research challenge. As code generated by Large Language Models (LLMs) becomes more common, identifying the specific model behind each sample is increasingly important. This paper presents the first systematic study of LLM authorship attribution for C programs. We release CodeT5-Authorship, a novel model that keeps only the encoder layers of the original CodeT5 encoder-decoder architecture and discards the decoder to focus on classification. The encoder output at the first token position is passed through a two-layer classification head with GELU activation and dropout, producing a probability distribution over the candidate authors. To evaluate our approach, we introduce LLM-AuthorBench, a benchmark of 32,000 compilable C programs generated by eight state-of-the-art LLMs across diverse tasks. We compare our model against seven traditional ML classifiers and eight fine-tuned transformer models: BERT, RoBERTa, CodeBERT, ModernBERT, DistilBERT, DeBERTa-V3, Longformer, and LoRA-fine-tuned Qwen2-1.5B. In binary classification, our model achieves 97.56% accuracy in distinguishing C programs generated by closely related models such as GPT-4.1 and GPT-4o, and it achieves 95.40% accuracy in multi-class attribution among five leading LLMs (Gemini 2.5 Flash, Claude 3.5 Haiku, GPT-4.1, Llama 3.3, and DeepSeek-V3). To support open science, we release the CodeT5-Authorship architecture, the LLM-AuthorBench benchmark, and all relevant Google Colab scripts on GitHub: https://github.com/LLMauthorbench/.
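
As a rough illustration of the classification head described in the abstract, the following NumPy sketch runs a forward pass over encoder hidden states. It assumes a (batch, seq_len, hidden_size) array such as a CodeT5 encoder would produce, and it is not the released CodeT5-Authorship code: the class name AuthorshipHead, the intermediate width (set equal to hidden_size), the dropout rate of 0.1, the placement of dropout after the GELU, and the random weight initialization are all hypothetical choices made for the example.

```python
import math

import numpy as np

# Elementwise error function; otypes avoids a failure on empty inputs.
_erf = np.vectorize(math.erf, otypes=[float])


def gelu(x: np.ndarray) -> np.ndarray:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


class AuthorshipHead:
    """Hypothetical two-layer head: first token -> Linear -> GELU -> Dropout -> Linear -> softmax."""

    def __init__(self, hidden_size: int, num_authors: int,
                 dropout: float = 0.1, seed: int = 0) -> None:
        self.rng = np.random.default_rng(seed)
        # Small random weights stand in for learned parameters.
        self.w1 = self.rng.normal(0.0, 0.02, size=(hidden_size, hidden_size))
        self.b1 = np.zeros(hidden_size)
        self.w2 = self.rng.normal(0.0, 0.02, size=(hidden_size, num_authors))
        self.b2 = np.zeros(num_authors)
        self.dropout = dropout

    def __call__(self, encoder_states: np.ndarray, training: bool = False) -> np.ndarray:
        # encoder_states has shape (batch, seq_len, hidden_size).
        first_token = encoder_states[:, 0, :]          # (batch, hidden_size)
        h = gelu(first_token @ self.w1 + self.b1)      # first layer + GELU
        if training and self.dropout > 0.0:
            # Inverted dropout: zero each unit with probability p, rescale the rest.
            keep = self.rng.random(h.shape) >= self.dropout
            h = h * keep / (1.0 - self.dropout)
        logits = h @ self.w2 + self.b2                 # (batch, num_authors)
        return softmax(logits)                         # each row sums to 1


if __name__ == "__main__":
    # Two fake programs, 16 tokens each, hidden size 32; five candidate LLMs.
    states = np.random.default_rng(1).normal(size=(2, 16, 32))
    head = AuthorshipHead(hidden_size=32, num_authors=5)
    probs = head(states)
    print(probs.shape)                 # (2, 5)
    print(probs.argmax(axis=1))        # predicted author index per program
```

In a trained model the head weights would be learned jointly with the encoder, usually in a framework such as PyTorch; NumPy is used here only to keep the sketch dependency-light and to make the first-token pooling, GELU, dropout, and softmax steps explicit.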
