Ranking LLM-Generated Loop Invariants for Program Verification
October 13, 2023
Authors: Saikat Chakraborty, Shuvendu K. Lahiri, Sarah Fakhoury, Madanlal Musuvathi, Akash Lal, Aseem Rastogi, Aditya Senthilnathan, Rahul Sharma, Nikhil Swamy
cs.AI
Abstract
Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a re-ranking approach for the generated results of LLMs. We have designed a ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a contrastive ranker. Experimental results demonstrate that this re-ranking mechanism significantly improves the ranking of correct invariants among the generated candidates, leading to a notable reduction in the number of calls to a verifier.
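
To make the setting concrete: for the loop while (i < n) { i = i + 1; } entered with i = 0 and n >= 0, the candidate 0 <= i && i <= n is an inductive invariant (it holds on entry and is preserved by the loop body), whereas i < n is not (it fails after the final iteration). The sketch below shows one way a contrastive ranker over (problem, invariant) pairs could be wired up; the encoder, the temperature, and the rank_candidates helper are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

class ContrastiveRanker(torch.nn.Module):
    """Scores (problem, invariant) pairs; trained so that verified
    invariants rank above verifier-rejected ones for the same problem."""

    def __init__(self, encoder, temperature=0.07):
        super().__init__()
        self.encoder = encoder        # any text encoder: str -> (dim,) tensor
        self.temperature = temperature

    def score(self, problem, invariant):
        # Cosine similarity between embedded problem and embedded candidate.
        p = F.normalize(self.encoder(problem), dim=-1)
        v = F.normalize(self.encoder(invariant), dim=-1)
        return (p * v).sum(-1) / self.temperature

    def loss(self, problem, correct, incorrect):
        # InfoNCE-style contrastive objective: the verified invariant is
        # the positive; failed candidates for the same problem are negatives.
        pos = self.score(problem, correct).unsqueeze(0)
        negs = torch.stack([self.score(problem, inv) for inv in incorrect])
        logits = torch.cat([pos, negs]).unsqueeze(0)   # shape (1, 1 + K)
        target = torch.zeros(1, dtype=torch.long)      # positive sits at index 0
        return F.cross_entropy(logits, target)

def rank_candidates(ranker, problem, candidates):
    # At inference time, submit candidates to the verifier in score order,
    # so a correct invariant is (hopefully) checked early.
    return sorted(candidates,
                  key=lambda inv: ranker.score(problem, inv).item(),
                  reverse=True)
```

Ordering candidates by ranker score before invoking the verifier is what drives the reduction in verifier calls: when a correct invariant appears near the front of the ordering, fewer rejected candidates are checked before verification succeeds.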