Training Language Models to Generate Quality Code with Program Analysis Feedback

Abstract

Code generation with large language models (LLMs), often termed "vibe coding", is increasingly adopted in production but fails to ensure code quality, particularly in security (e.g., SQL injection vulnerabilities) and maintainability (e.g., missing type annotations). Existing methods, such as supervised fine-tuning and rule-based post-processing, rely on labor-intensive annotations or brittle heuristics, limiting their scalability and effectiveness. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code using program analysis-guided feedback. Specifically, REAL integrates two automated signals: (1) program analysis detecting security or maintainability defects and (2) unit tests ensuring functional correctness. Unlike prior work, our framework is prompt-agnostic and reference-free, enabling scalable supervision without manual intervention. Experiments across multiple datasets and model scales demonstrate that REAL outperforms state-of-the-art methods in simultaneous assessments of functionality and code quality. Our work bridges the gap between rapid prototyping and production-ready code, enabling LLMs to deliver both speed and quality.
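As a rough illustration of how two automated signals of this kind could be combined into a single scalar reward, the following is a minimal sketch and not the REAL implementation. The function names, the equal weighting, and the use of a missing-type-annotation count as the only program analysis check are all assumptions made for this example; the abstract does not specify the actual analyzers or reward formula.

```python
import ast
from typing import Callable, Dict, List


def count_missing_annotations(source: str) -> int:
    """Toy maintainability check: count parameters and return types
    that lack a type annotation (raises SyntaxError on invalid code)."""
    tree = ast.parse(source)
    missing = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.returns is None:
                missing += 1
            params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            missing += sum(1 for p in params if p.annotation is None)
    return missing


def unit_test_pass_rate(source: str, tests: List[Callable[[Dict], bool]]) -> float:
    """Execute the candidate code and return the fraction of tests that pass.
    Assumption: each test receives the executed namespace and returns a bool."""
    namespace: Dict = {}
    try:
        exec(source, namespace)  # unsafe outside a sandbox; illustration only
    except Exception:
        return 0.0
    passed = 0
    for test in tests:
        try:
            passed += 1 if test(namespace) else 0
        except Exception:
            pass  # a crashing test counts as a failure
    return passed / len(tests) if tests else 0.0


def hybrid_reward(source: str, tests: List[Callable[[Dict], bool]],
                  quality_weight: float = 0.5) -> float:
    """Hypothetical reward: weighted sum of a binary quality signal and
    the unit-test pass rate. The weighting scheme is an assumption."""
    try:
        defects = count_missing_annotations(source)
    except SyntaxError:
        return 0.0  # unparsable code earns no reward
    quality = 1.0 if defects == 0 else 0.0
    functional = unit_test_pass_rate(source, tests)
    return quality_weight * quality + (1.0 - quality_weight) * functional


typed = "def add(a: int, b: int) -> int:\n    return a + b\n"
untyped = "def add(a, b):\n    return a + b\n"
tests = [lambda ns: ns["add"](1, 2) == 3]

print(hybrid_reward(typed, tests))    # functionally correct and fully annotated
print(hybrid_reward(untyped, tests))  # functionally correct, annotations missing
```

Both signals here are computed from the generated code alone, with no reference solution, which mirrors the reference-free property described above. A real setup would presumably replace the toy check with a proper static analyzer and run tests in an isolated sandbox.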
