Training Language Models to Generate Quality Code with Program Analysis Feedback
May 28, 2025
Authors: Feng Yao, Zilong Wang, Liyuan Liu, Junxia Cui, Li Zhong, Xiaohan Fu, Haohui Mai, Vish Krishnan, Jianfeng Gao, Jingbo Shang
cs.AI
Abstract
Code generation with large language models (LLMs), often termed vibe coding,
is increasingly adopted in production but fails to ensure code quality,
particularly in security (e.g., SQL injection vulnerabilities) and
maintainability (e.g., missing type annotations). Existing methods, such as
supervised fine-tuning and rule-based post-processing, rely on labor-intensive
annotations or brittle heuristics, limiting their scalability and
effectiveness. We propose REAL, a reinforcement learning framework that
incentivizes LLMs to generate production-quality code using program
analysis-guided feedback. Specifically, REAL integrates two automated signals:
(1) program analysis detecting security or maintainability defects and (2) unit
tests ensuring functional correctness. Unlike prior work, our framework is
prompt-agnostic and reference-free, enabling scalable supervision without
manual intervention. Experiments across multiple datasets and model scales
demonstrate that REAL outperforms state-of-the-art methods in simultaneous
assessments of functionality and code quality. Our work bridges the gap between
rapid prototyping and production-ready code, enabling LLMs to deliver both
speed and quality.
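
To make the two automated signals concrete, the sketch below shows one way a unit-test pass rate and defect counts from program analysis could be folded into a single scalar reinforcement-learning reward. The function name combined_reward, the weighting alpha, and the blending formula are illustrative assumptions, not the reward design reported in the paper.

```python
# Illustrative sketch only: combine the two automated signals from the abstract
# (unit tests for functionality, program analysis for quality) into one scalar.
# The weighting `alpha` and the blending formula are assumptions, not REAL's design.

def combined_reward(tests_passed: int, tests_total: int,
                    security_defects: int, maintainability_defects: int,
                    alpha: float = 0.5) -> float:
    """Blend unit-test pass rate with a program-analysis quality score."""
    functional = tests_passed / max(tests_total, 1)   # fraction of tests passing, in [0, 1]
    defects = security_defects + maintainability_defects
    quality = 1.0 / (1.0 + defects)                   # fewer reported defects -> closer to 1
    return alpha * functional + (1.0 - alpha) * quality

# Example: 8/10 tests pass, one SQL-injection finding, two missing type annotations.
print(combined_reward(8, 10, 1, 2))  # 0.5 * 0.8 + 0.5 * 0.25 = 0.525
```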