建立一個以證明為導向的程式設計師，比 GPT-4o 在資料稀缺情況下提升了 64%。

摘要

現有的語言模型在證明導向編程方面遇到困難，主要是由於數據稀缺，這表現在兩個關鍵方面：(1) 缺乏足夠的語料庫，用於證明導向編程語言，如 F*；以及 (2) 缺乏大規模的、項目級的證明導向實現，這可以教導模型在執行證明導向編程時進行複雜的推理過程。我們提出了第一個針對項目級證明導向編程的合成數據增強方法，用於生成和修復。我們的方法通過合成基本的證明導向編程問題，以提高對該語言的熟練程度；將多樣化的編碼數據納入推理能力的引發，並在現有存儲庫中創建新的證明和修復數據。這種方法使語言模型能夠為功能和存儲庫級別的代碼合成和修復證明。我們展示了我們微調的 140 億參數模型 PoPilot，可以超越在項目級證明導向編程方面表現優於 GPT-4o 的模型，相對邊際提高了 64%，並且通過修復其輸出，可以將 GPT-4o 的性能提高 54%。

English

Existing LMs struggle with proof-oriented programming due to data scarcity, which manifest in two key ways: (1) a lack of sufficient corpora for proof-oriented programming languages such as F*, and (2) the absence of large-scale, project-level proof-oriented implementations that can teach the model the intricate reasoning process when performing proof-oriented programming. We present the first on synthetic data augmentation for project level proof oriented programming for both generation and repair. Our method addresses data scarcity by synthesizing basic proof-oriented programming problems for proficiency in that language; incorporating diverse coding data for reasoning capability elicitation and creating new proofs and repair data within existing repositories. This approach enables language models to both synthesize and repair proofs for function- and repository-level code. We show that our fine-tuned 14B parameter model, PoPilot, can exceed the performance of the models that outperforms GPT-4o in project-level proof-oriented programming by 64% relative margin, and can improve GPT-4o's performance by 54% by repairing its outputs over GPT-4o's self-repair.

建立一個以證明為導向的程式設計師，比 GPT-4o 在資料稀缺情況下提升了 64%。

Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity

摘要

Support