Leveraging Large Language Models for Automated Proof Synthesis in Rust
November 7, 2023
Authors: Jianan Yao, Ziqiao Zhou, Weiteng Chen, Weidong Cui
cs.AI
Abstract
Formal verification can provably guarantee the correctness of critical system
software, but the high proof burden has long hindered its wide adoption.
Recently, Large Language Models (LLMs) have shown success in code analysis and
synthesis. In this paper, we present a combination of LLMs and static analysis
to synthesize invariants, assertions, and other proof structures for a
Rust-based formal verification framework called Verus. In a few-shot setting,
LLMs demonstrate impressive logical ability in generating postconditions and
loop invariants, especially when analyzing short code snippets. However, LLMs
lack the ability to retain and propagate context information, a strength of
traditional static analysis. Based on these observations, we developed a
prototype based on OpenAI's GPT-4 model. Our prototype decomposes the
verification task into multiple smaller ones, iteratively queries GPT-4, and
combines its output with lightweight static analysis. We evaluated the
prototype with a developer in the automation loop on 20 vector-manipulating
programs. The results demonstrate that it significantly reduces human effort in
writing entry-level proof code.
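
To make the proof structures concrete, here is a minimal Verus-style sketch of a vector-manipulating function of the kind the evaluation targets. The function and its body are illustrative, not taken from the paper's benchmarks, and exact syntax varies across Verus versions; the `ensures` postcondition and the loop `invariant` clauses are the kind of proof annotations the prototype asks GPT-4 to synthesize.

    use vstd::prelude::*;

    verus! {

    // Illustrative benchmark-style function (not from the paper): return
    // the maximum element of a non-empty vector. Verus cannot verify the
    // loop until the `ensures` and `invariant` clauses are supplied.
    fn max_element(v: &Vec<i32>) -> (m: i32)
        requires
            v.len() > 0,
        ensures
            forall|i: int| 0 <= i < v.len() ==> v[i] <= m,
            exists|i: int| 0 <= i < v.len() && v[i] == m,
    {
        let mut m = v[0];
        let mut k: usize = 1;
        while k < v.len()
            invariant
                1 <= k <= v.len(),
                // Every element scanned so far is bounded by the running
                // maximum, and the maximum is an element of that prefix.
                forall|i: int| 0 <= i < k ==> v[i] <= m,
                exists|i: int| 0 <= i < k && v[i] == m,
        {
            if v[k] > m {
                m = v[k];
            }
            k += 1;
        }
        m
    }

    fn main() {}

    } // verus!

Without the invariant clauses, Verus rejects the loop because it cannot relate the running maximum to the elements already scanned; supplying such clauses is the proof burden the abstract refers to.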
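
Below is a hypothetical sketch, in plain Rust, of the decompose-query-combine loop the abstract describes. It is not the paper's implementation: `run_verus`, `query_gpt4_for_invariants`, and `merge_with_static_analysis` are assumed placeholder interfaces introduced here for illustration.

    // Hypothetical LLM-in-the-loop proof-synthesis pipeline of the kind
    // the abstract describes; all types and helpers are placeholders,
    // not the paper's actual interfaces.
    enum VerusOutcome {
        Verified,
        Failed { errors: Vec<String> },
    }

    fn run_verus(program: &str) -> VerusOutcome {
        unimplemented!("invoke the Verus verifier on `program`")
    }

    fn query_gpt4_for_invariants(program: &str, errors: &[String]) -> Vec<String> {
        unimplemented!("few-shot prompt GPT-4 with the failing snippet")
    }

    fn merge_with_static_analysis(program: &str, candidates: &[String]) -> String {
        unimplemented!("insert candidates; propagate context facts the LLM dropped")
    }

    /// Iteratively refine a program's proof annotations until Verus
    /// accepts it or the round budget is exhausted.
    fn synthesize_proof(mut program: String, max_rounds: usize) -> Option<String> {
        for _ in 0..max_rounds {
            match run_verus(&program) {
                VerusOutcome::Verified => return Some(program),
                VerusOutcome::Failed { errors } => {
                    // Decompose: query the LLM only about the failing pieces.
                    let candidates = query_gpt4_for_invariants(&program, &errors);
                    // Combine: lightweight static analysis supplies the
                    // context the LLM tends not to retain or propagate.
                    program = merge_with_static_analysis(&program, &candidates);
                }
            }
        }
        None
    }

A bounded round budget is one natural place for the developer in the automation loop to step in once automated refinement stalls.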