Leveraging Large Language Models for Automated Proof Synthesis in Rust
November 7, 2023
Authors: Jianan Yao, Ziqiao Zhou, Weiteng Chen, Weidong Cui
cs.AI
Abstract
Formal verification can provably guarantee the correctness of critical system
software, but the high proof burden has long hindered its wide adoption.
Recently, Large Language Models (LLMs) have shown success in code analysis and
synthesis. In this paper, we present a combination of LLMs and static analysis
to synthesize invariants, assertions, and other proof structures for a
Rust-based formal verification framework called Verus. In a few-shot setting,
LLMs demonstrate impressive logical ability in generating postconditions and
loop invariants, especially when analyzing short code snippets. However, LLMs
lack the ability to retain and propagate context information, a strength of
traditional static analysis. Based on these observations, we developed a
prototype based on OpenAI's GPT-4 model. Our prototype decomposes the
verification task into multiple smaller ones, iteratively queries GPT-4, and
combines its output with lightweight static analysis. We evaluated the
prototype with a developer in the automation loop on 20 vector-manipulating
programs. The results demonstrate that it significantly reduces human effort in
writing entry-level proof code.
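To illustrate the kind of proof annotations the abstract refers to, the sketch below shows a hypothetical vector-manipulating function (not taken from the paper's benchmark). In Verus, the postcondition and loop invariant would be written as statically checked `ensures` and `invariant` clauses; here they are approximated with runtime assertions so the sketch compiles with plain `rustc`.

```rust
// Hypothetical example of a loop invariant and postcondition of the sort
// a tool like the paper's prototype would synthesize. The debug_assert!
// calls stand in for Verus's `invariant` and `ensures` clauses, which are
// checked at verification time rather than at run time.
fn sum_upto(v: &[u32]) -> u64 {
    let mut total: u64 = 0;
    let mut i: usize = 0;
    while i < v.len() {
        // Loop invariant: i stays in bounds, and total equals the sum
        // of the elements already processed (v[0..i]).
        debug_assert!(i <= v.len());
        debug_assert_eq!(total, v[..i].iter().map(|&x| x as u64).sum::<u64>());
        total += v[i] as u64;
        i += 1;
    }
    // Postcondition: total is the sum of the whole slice.
    debug_assert_eq!(total, v.iter().map(|&x| x as u64).sum::<u64>());
    total
}

fn main() {
    assert_eq!(sum_upto(&[1, 2, 3, 4]), 10);
    assert_eq!(sum_upto(&[]), 0);
}
```

Synthesizing the two commented annotations above is exactly the step that dominates the proof burden for entry-level code: the loop body is trivial, but the invariant must be strong enough to imply the postcondition on loop exit.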