半自律的形式化によるVlasov-Maxwell-Landau平衡

要旨

本論文では、荷電プラズマの運動を記述するVlasov-Maxwell-Landau (VML) システムにおける平衡特性の完全なLean 4形式化を提示する。本プロジェクトは、AI支援による数学研究の完全なループを実証するものである。すなわち、AI推論モデル (Gemini DeepThink) が予想から証明を生成し、エージェント型コーディングツール (Claude Code) が自然言語プロンプトからLeanコードへ翻訳し、専門プローバー (Aristotle) が111の補題を閉じ、Leanカーネルが結果を検証した。1人の数学者が10日間、200ドルのコストでプロセスを監督し、コードは一行も書かなかった。開発プロセス全体は公開されている。229の人間によるプロンプトと213のgitコミットは全てリポジトリに記録されている。我々は、AIの失敗モード（仮説の蔓延、定義整合性バグ、エージェントの回避行動）と、成功した要素（抽象的/具体的証明の分離、敵対的自己レビュー、主要な定義と定理ステートメントの人間によるレビューの重要性）に関する詳細な知見を報告する。特筆すべきは、対応する数学論文の最終草稿が完成する前に、この形式化が完了した点である。

English

We present a complete Lean 4 formalization of the equilibrium characterization in the Vlasov-Maxwell-Landau (VML) system, which describes the motion of charged plasma. The project demonstrates the full AI-assisted mathematical research loop: an AI reasoning model (Gemini DeepThink) generated the proof from a conjecture, an agentic coding tool (Claude Code) translated it into Lean from natural-language prompts, a specialized prover (Aristotle) closed 111 lemmas, and the Lean kernel verified the result. A single mathematician supervised the process over 10 days at a cost of \$200, writing zero lines of code. The entire development process is public: all 229 human prompts, and 213 git commits are archived in the repository. We report detailed lessons on AI failure modes -- hypothesis creep, definition-alignment bugs, agent avoidance behaviors -- and on what worked: the abstract/concrete proof split, adversarial self-review, and the critical role of human review of key definitions and theorem statements. Notably, the formalization was completed before the final draft of the corresponding math paper was finished.

半自律的形式化によるVlasov-Maxwell-Landau平衡

Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau Equilibrium

要旨

Support