Training Language Models to Generate Quality Code with Program Analysis Feedback

Abstract



Code generation with large language models (LLMs), often termed "vibe coding," is increasingly adopted in production but fails to ensure code quality, particularly in security (e.g., SQL injection vulnerabilities) and maintainability (e.g., missing type annotations). Existing methods, such as supervised fine-tuning and rule-based post-processing, rely on labor-intensive annotations or brittle heuristics, limiting their scalability and effectiveness. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code using program-analysis-guided feedback. Specifically, REAL integrates two automated signals: (1) program analysis that detects security or maintainability defects and (2) unit tests that ensure functional correctness. Unlike prior work, our framework is prompt-agnostic and reference-free, enabling scalable supervision without manual intervention. Experiments across multiple datasets and model scales demonstrate that REAL outperforms state-of-the-art methods when functionality and code quality are assessed simultaneously. Our work bridges the gap between rapid prototyping and production-ready code, enabling LLMs to deliver both speed and quality.
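The abstract does not specify how the two signals are combined into a reward. As a purely illustrative sketch, assuming a reward equal to the unit-test pass rate multiplied by a quality score that shrinks with each detected defect, the Python below uses the standard-library `ast` module to flag the two defect classes named above: SQL queries built by string formatting and passed to `execute`, and missing type annotations. All names (`sql_injection_count`, `missing_annotation_count`, `unit_test_pass_rate`, `reward`, `find_alice`) and the scoring formula are hypothetical; this is not REAL's actual reward, and real program analyzers are far more thorough than these two toy checks.

```python
import ast
import sqlite3
from typing import Any, Callable, Dict, List

# Hypothetical test signature: a test receives the namespace produced by
# executing the candidate code and returns True on success.
Test = Callable[[Dict[str, Any]], bool]


def sql_injection_count(tree: ast.AST) -> int:
    """Count execute()/executemany() calls whose query is built by string formatting."""
    count = 0
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in ("execute", "executemany")
                and node.args):
            query = node.args[0]
            built_by_fstring = isinstance(query, ast.JoinedStr)
            built_by_concat = (isinstance(query, ast.BinOp)
                               and isinstance(query.op, (ast.Add, ast.Mod)))
            built_by_format = (isinstance(query, ast.Call)
                               and isinstance(query.func, ast.Attribute)
                               and query.func.attr == "format")
            if built_by_fstring or built_by_concat or built_by_format:
                count += 1
    return count


def missing_annotation_count(tree: ast.AST) -> int:
    """Count parameters (other than self/cls) and return values without type hints."""
    count = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            count += sum(1 for a in params
                         if a.annotation is None and a.arg not in ("self", "cls"))
            if node.returns is None:
                count += 1
    return count


def unit_test_pass_rate(code: str, tests: List[Test]) -> float:
    """Fraction of tests passed. WARNING: exec() runs untrusted model output;
    a real training pipeline would run this in a sandbox."""
    namespace: Dict[str, Any] = {}
    try:
        exec(code, namespace)
    except Exception:
        return 0.0
    passed = 0
    for test in tests:
        try:
            passed += bool(test(namespace))
        except Exception:
            pass  # a crashing test counts as a failure
    return passed / len(tests) if tests else 0.0


def reward(code: str, tests: List[Test]) -> float:
    """Hypothetical reward: functional pass rate scaled by a quality score in (0, 1]."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0
    defects = sql_injection_count(tree) + missing_annotation_count(tree)
    quality = 1.0 / (1.0 + defects)  # 1.0 when no defect is detected
    return unit_test_pass_rate(code, tests) * quality


# Usage: two functionally equivalent candidates for the same task.
def find_alice(ns: Dict[str, Any]) -> bool:
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE users (name TEXT)")
    cur.execute("INSERT INTO users VALUES ('alice')")
    return ns["find_user"](cur, "alice") == [("alice",)]


SAFE = '''
import sqlite3

def find_user(cur: sqlite3.Cursor, name: str) -> list:
    cur.execute("SELECT * FROM users WHERE name = ?", (name,))
    return cur.fetchall()
'''

UNSAFE = '''
def find_user(cur, name):
    cur.execute(f"SELECT * FROM users WHERE name = '{name}'")
    return cur.fetchall()
'''

print(reward(SAFE, [find_alice]))    # 1.0: test passes, no defects
print(reward(UNSAFE, [find_alice]))  # 0.2: test passes, 4 defects
```

Both candidates pass the unit test, so a test-only reward could not tell them apart; the analysis term is what separates them. The multiplicative form is one design choice among several: it gives zero reward to code that fails every test, so a trivially "clean" but non-functional program cannot score well, whereas a weighted sum would still reward it for having no defects.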
