Training Language Models to Generate Quality Code with Program Analysis Feedback
May 28, 2025
作者: Feng Yao, Zilong Wang, Liyuan Liu, Junxia Cui, Li Zhong, Xiaohan Fu, Haohui Mai, Vish Krishnan, Jianfeng Gao, Jingbo Shang
cs.AI
Abstract
Code generation with large language models (LLMs), often termed vibe coding, is increasingly adopted in production but fails to ensure code quality, particularly in security (e.g., SQL injection vulnerabilities) and maintainability (e.g., missing type annotations). Existing methods, such as supervised fine-tuning and rule-based post-processing, rely on labor-intensive annotations or brittle heuristics, limiting their scalability and effectiveness. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code using program analysis-guided feedback. Specifically, REAL integrates two automated signals: (1) program analysis detecting security or maintainability defects and (2) unit tests ensuring functional correctness. Unlike prior work, our framework is prompt-agnostic and reference-free, enabling scalable supervision without manual intervention. Experiments across multiple datasets and model scales demonstrate that REAL outperforms state-of-the-art methods in simultaneous assessments of functionality and code quality. Our work bridges the gap between rapid prototyping and production-ready code, enabling LLMs to deliver both speed and quality.
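
To make the two automated signals concrete, below is a minimal, self-contained Python sketch of how a REAL-style reward could combine a unit-test pass rate with a program-analysis penalty. Everything here is an illustrative assumption rather than the paper's implementation: the toy AST checker for missing type annotations stands in for a real analyzer, and the additive fusion with weight `alpha` is one plausible way to merge the two signals into a scalar reward.

```python
# Hypothetical sketch of a REAL-style reward (not the paper's code):
# functional correctness from unit tests minus a quality penalty from
# a toy program analysis (missing type annotations).
import ast
from typing import Callable, Dict, List


def annotation_penalty(code: str) -> float:
    """Toy maintainability analysis: count unannotated parameters and
    return types, one of the defect classes the abstract cites."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return float("inf")  # unparseable code gets the worst penalty
    missing = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            missing += sum(1 for arg in node.args.args if arg.annotation is None)
            if node.returns is None:
                missing += 1
    return float(missing)


def _passes(test: Callable[[Dict], None], namespace: Dict) -> bool:
    """A test passes if it raises nothing when run against the namespace."""
    try:
        test(namespace)
        return True
    except Exception:
        return False


def test_pass_rate(code: str, tests: List[Callable[[Dict], None]]) -> float:
    """Execute the generated code, then run each test against its
    namespace. Real systems must sandbox this exec call."""
    namespace: Dict = {}
    try:
        exec(code, namespace)
    except Exception:
        return 0.0
    passed = sum(1 for t in tests if _passes(t, namespace))
    return passed / len(tests) if tests else 0.0


def reward(code: str, tests: List[Callable[[Dict], None]],
           alpha: float = 0.25) -> float:
    """Scalar RL reward: pass rate minus alpha * analysis penalty.
    The additive form and alpha=0.25 are assumptions for illustration."""
    return test_pass_rate(code, tests) - alpha * annotation_penalty(code)


if __name__ == "__main__":
    candidate = "def add(a, b):\n    return a + b\n"

    def t_add(ns: Dict) -> None:
        assert ns["add"](2, 3) == 5

    # Functionally correct (pass rate 1.0) but penalized for three
    # missing annotations: reward = 1.0 - 0.25 * 3 = 0.25.
    print(reward(candidate, [t_add]))
```

In a real pipeline, the toy checker would be replaced by production analyzers (the abstract also names security analyses, e.g., SQL injection detection), and tests would run in an isolated sandbox; only the overall shape of the combined signal is meant to carry over.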