大規模言語モデルを活用したRustにおける自動証明合成

要旨

形式的検証は、重要なシステムソフトウェアの正しさを証明可能に保証できるが、その高い証明負担が長らく広範な採用を妨げてきた。最近、大規模言語モデル（LLMs）がコード分析と合成において成功を収めている。本論文では、LLMsと静的解析を組み合わせて、Rustベースの形式的検証フレームワークであるVerusのための不変条件、表明、およびその他の証明構造を合成する手法を提案する。Few-shot設定において、LLMsは特に短いコードスニペットを分析する際に、事後条件やループ不変条件の生成において印象的な論理能力を示す。しかし、LLMsは従来の静的解析の強みである文脈情報の保持と伝播の能力を欠いている。これらの観察に基づき、OpenAIのGPT-4モデルを基にしたプロトタイプを開発した。このプロトタイプは検証タスクを複数の小さなタスクに分解し、GPT-4に反復的にクエリを送信し、その出力を軽量な静的解析と組み合わせる。開発者が自動化ループに参加した状態で、20のベクトル操作プログラムに対してプロトタイプを評価した。その結果、初級レベルの証明コードの作成における人的労力を大幅に削減できることが示された。

English

Formal verification can provably guarantee the correctness of critical system software, but the high proof burden has long hindered its wide adoption. Recently, Large Language Models (LLMs) have shown success in code analysis and synthesis. In this paper, we present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus. In a few-shot setting, LLMs demonstrate impressive logical ability in generating postconditions and loop invariants, especially when analyzing short code snippets. However, LLMs lack the ability to retain and propagate context information, a strength of traditional static analysis. Based on these observations, we developed a prototype based on OpenAI's GPT-4 model. Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis. We evaluated the prototype with a developer in the automation loop on 20 vector-manipulating programs. The results demonstrate that it significantly reduces human effort in writing entry-level proof code.

大規模言語モデルを活用したRustにおける自動証明合成

Leveraging Large Language Models for Automated Proof Synthesis in Rust

要旨

Support