CodeCircuit: Toward Inferring LLM-Generated Code Correctness via Attribution Graphs

Abstract

Current paradigms for code verification rely heavily on external mechanisms, such as execution-based unit tests or auxiliary LLM judges, which are often labor-intensive or limited by the judging model's own capabilities. This raises a fundamental yet unexplored question: can an LLM's functional correctness be assessed purely from its internal computational structure? Our primary objective is to investigate whether the model's neural dynamics encode internally decodable signals that are predictive of logical validity during code generation. Inspired by mechanistic interpretability, we propose to treat code verification as a mechanistic diagnostic task, mapping the model's explicit algorithmic trajectory into line-level attribution graphs. By decomposing complex residual flows, we aim to identify the structural signatures that distinguish sound reasoning from logical failure within the model's internal circuits. Analysis across Python, C++, and Java confirms that intrinsic correctness signals are robust across diverse syntaxes. Topological features from these internal graphs predict correctness more reliably than surface heuristics and enable targeted causal interventions to fix erroneous logic. These findings establish internal introspection as a decodable property for verifying generated code. Our code is at https://github.com/bruno686/CodeCircuit.

