I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship Attribution

Abstract

Detecting AI-generated code, deepfakes, and other synthetic content is an emerging research challenge. As code generated by Large Language Models (LLMs) becomes more common, identifying the specific model behind each sample is increasingly important. This paper presents the first systematic study of LLM authorship attribution for C programs. We release CodeT5-Authorship, a novel model that keeps only the encoder layers of the original CodeT5 encoder-decoder architecture and discards the decoder to focus on classification. The encoder output at the first token is passed through a two-layer classification head with GELU activation and dropout, producing a probability distribution over candidate authors. To evaluate our approach, we introduce LLM-AuthorBench, a benchmark of 32,000 compilable C programs generated by eight state-of-the-art LLMs across diverse tasks. We compare our model against seven traditional ML classifiers and eight fine-tuned transformer models, including BERT, RoBERTa, CodeBERT, ModernBERT, DistilBERT, DeBERTa-V3, Longformer, and LoRA-fine-tuned Qwen2-1.5B. In binary classification, our model achieves 97.56% accuracy in distinguishing C programs generated by closely related models such as GPT-4.1 and GPT-4o; in multi-class attribution among five leading LLMs (Gemini 2.5 Flash, Claude 3.5 Haiku, GPT-4.1, Llama 3.3, and DeepSeek-V3), it achieves 95.40% accuracy. To support open science, we release the CodeT5-Authorship architecture, the LLM-AuthorBench benchmark, and all relevant Google Colab scripts on GitHub: https://github.com/LLMauthorbench/.
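The classification head described in the abstract can be illustrated with a minimal, framework-free sketch. This is not the released CodeT5-Authorship implementation: the layer sizes, the dropout rate of 0.1, the random weight initialisation, and the random vectors standing in for encoder outputs are all assumptions made for illustration. Only the overall shape (first-token vector, two linear layers with GELU and dropout in between, softmax over authors) follows the description above.

```python
import math
import random

# The five LLMs named in the multi-class experiment of the abstract.
AUTHORS = ["Gemini 2.5 Flash", "Claude 3.5 Haiku", "GPT-4.1", "Llama 3.3", "DeepSeek-V3"]


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(vec, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def dropout(vec, p, training, rng):
    """Inverted dropout; acts as the identity at inference time."""
    if not training or p == 0.0:
        return list(vec)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vec]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Hypothetical random initialisation; a real model would load trained weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(encoder_states, head, p_drop=0.1, training=False, rng=None):
    """Map per-token encoder vectors to a probability distribution over authors."""
    rng = rng or random.Random()
    first_token = encoder_states[0]          # encoder output at the first token
    (w1, b1), (w2, b2) = head
    hidden = [gelu(h) for h in linear(first_token, w1, b1)]
    hidden = dropout(hidden, p_drop, training, rng)
    return softmax(linear(hidden, w2, b2))


# Toy dimensions (assumed); a real encoder has a much larger hidden size.
rng = random.Random(0)
HIDDEN, INNER = 8, 4
head = (init_layer(INNER, HIDDEN, rng), init_layer(len(AUTHORS), INNER, rng))
states = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(3)]
probs = classify(states, head)
predicted = AUTHORS[probs.index(max(probs))]
```

With untrained weights the predicted label is meaningless; the point is only that `probs` is a valid distribution with one entry per candidate author. In practice the same structure would be written in a deep-learning framework and trained end to end together with the encoder.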
