Ranking LLM-Generated Loop Invariants for Program Verification
October 13, 2023
Authors: Saikat Chakraborty, Shuvendu K. Lahiri, Sarah Fakhoury, Madanlal Musuvathi, Akash Lal, Aseem Rastogi, Aditya Senthilnathan, Rahul Sharma, Nikhil Swamy
cs.AI
Abstract
Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a re-ranking approach for the generated results of LLMs. We have designed a ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a contrastive ranker. Experimental results demonstrate that this re-ranking mechanism significantly improves the ranking of correct invariants among the generated candidates, leading to a notable reduction in the number of calls to a verifier.
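
To make the setting concrete: for the loop while (i < n) { i = i + 1; } entered with i = 0 and n >= 0, the candidate 0 <= i && i <= n is an inductive invariant (it holds on entry and is preserved by the loop body), whereas i < n is not (it fails after the final iteration). The sketch below shows one way a contrastive ranker over (problem, invariant) pairs could be wired up; the encoder, the temperature, and the rank_candidates helper are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

class ContrastiveRanker(torch.nn.Module):
    """Scores (problem, invariant) pairs; trained so that verified
    invariants rank above verifier-rejected ones for the same problem."""

    def __init__(self, encoder, temperature=0.07):
        super().__init__()
        self.encoder = encoder        # any text encoder: str -> (dim,) tensor
        self.temperature = temperature

    def score(self, problem, invariant):
        # Cosine similarity between embedded problem and embedded candidate.
        p = F.normalize(self.encoder(problem), dim=-1)
        v = F.normalize(self.encoder(invariant), dim=-1)
        return (p * v).sum(-1) / self.temperature

    def loss(self, problem, correct, incorrect):
        # InfoNCE-style contrastive objective: the verified invariant is
        # the positive; failed candidates for the same problem are negatives.
        pos = self.score(problem, correct).unsqueeze(0)
        negs = torch.stack([self.score(problem, inv) for inv in incorrect])
        logits = torch.cat([pos, negs]).unsqueeze(0)   # shape (1, 1 + K)
        target = torch.zeros(1, dtype=torch.long)      # positive sits at index 0
        return F.cross_entropy(logits, target)

def rank_candidates(ranker, problem, candidates):
    # At inference time, submit candidates to the verifier in score order,
    # so a correct invariant is (hopefully) checked early.
    return sorted(candidates,
                  key=lambda inv: ranker.score(problem, inv).item(),
                  reverse=True)
```

Ordering candidates by ranker score before invoking the verifier is what drives the reduction in verifier calls: when a correct invariant appears near the front of the ordering, fewer rejected candidates are checked before verification succeeds.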