Guiding Language Models of Code with Global Context using Monitors
June 19, 2023
Authors: Lakshya A Agrawal, Aditya Kanade, Navin Goyal, Shuvendu K. Lahiri, Sriram K. Rajamani
cs.AI
Abstract
Language models of code (LMs) work well when the surrounding code in the
vicinity of generation provides sufficient context. This is not true when it
becomes necessary to use types or functionality defined in another module or
library, especially those not seen during training. LMs suffer from limited
awareness of such global context and end up hallucinating, e.g., using types
defined in other files incorrectly. Recent work tries to overcome this issue by
retrieving global information to augment the local context. However, this
bloats the prompt or requires architecture modifications and additional
training.
Integrated development environments (IDEs) assist developers by bringing the
global context to their fingertips using static analysis. We extend this
assistance, enjoyed by developers, to the LMs. We propose a notion of monitors
that use static analysis in the background to guide the decoding. Unlike a
priori retrieval, static analysis is invoked iteratively during the entire
decoding process, providing the most relevant suggestions on demand. We
demonstrate the usefulness of our proposal by monitoring for type-consistent
use of identifiers whenever an LM generates code for object dereference.
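
To make the mechanism concrete, the following is a minimal, self-contained sketch of monitor-guided decoding, not the paper's implementation: the toy LM, the StaticAnalysisOracle, and the hard-coded member table are hypothetical stand-ins, whereas the real system drives an actual static analysis over the repository in the background. The sketch only shows the essential control flow: decoding proceeds token by token, and whenever the partial program ends in an object dereference, the static analysis is queried on demand and the LM's next-token choices are masked down to type-consistent members.

import re


class ToyLM:
    """Illustrative stand-in for a code LM: scores a handful of next tokens."""

    def next_token_scores(self, prefix: str) -> dict:
        # A real LM would condition on the full prefix; this toy deliberately
        # prefers a hallucinated member ("foo") right after a dereference.
        if prefix.endswith("."):
            return {"foo": 3.0, "readLine": 1.0, "close": 0.5}
        if prefix.endswith(("readLine", "close", "foo")):
            return {"(": 1.0}
        if prefix.endswith("("):
            return {")": 1.0}
        if prefix.endswith(")"):
            return {";": 1.0}
        return {"\n": 1.0}


class StaticAnalysisOracle:
    """Illustrative stand-in for IDE-style static analysis.

    Given the receiver expression before a ".", it returns the identifiers
    that are type-consistent members of that receiver in the project.
    """

    def __init__(self, member_table: dict):
        self.member_table = member_table

    def valid_members(self, receiver: str) -> set:
        return self.member_table.get(receiver, set())


def monitor_guided_decode(lm, oracle, prefix, max_tokens=16):
    """Greedy decoding with a monitor that constrains object dereferences.

    The monitor watches the partial output; whenever it ends with
    "<receiver>.", the static analysis is invoked on demand and every next
    token that is not a type-consistent member of the receiver is masked out.
    """
    out = prefix
    for _ in range(max_tokens):
        scores = lm.next_token_scores(out)

        match = re.search(r"([A-Za-z_]\w*)\.$", out)
        if match:  # completing an object dereference: consult the static analysis
            allowed = oracle.valid_members(match.group(1))
            scores = {tok: s for tok, s in scores.items() if tok in allowed}
            if not scores:  # no type-consistent member is known; stop
                break

        token = max(scores, key=scores.get)
        if token == "\n":
            break
        out += token
    return out


if __name__ == "__main__":
    oracle = StaticAnalysisOracle({"reader": {"readLine", "close"}})
    # Unconstrained, the toy LM would emit the hallucinated "reader.foo...";
    # with the monitor it is steered to a type-consistent member instead.
    print(monitor_guided_decode(ToyLM(), oracle, "reader."))  # reader.readLine();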
To evaluate our approach, we curate PragmaticCode, a dataset of open-source
projects with their development environments. On models of varying parameter
scale, we show that monitor-guided decoding consistently improves the ability
of an LM to generate identifiers that match the ground truth, and also improves
compilation rates and agreement with ground truth. We find that LMs
with fewer parameters, when guided with our monitor, can outperform larger LMs.
With monitor-guided decoding, SantaCoder-1.1B achieves better compilation rate
and next-identifier match than the much larger text-davinci-003 model. The
datasets and code will be released at https://aka.ms/monitors4codegen .