Lean Refactor: エージェント戦略探索による多目的制御可能な証明最適化

要旨

Lean Refactorを提案する。これは、Lean証明の多目的・制御可能・バージョン耐性リファクタリングのためのプラグアンドプレイ型検索拡張エージェンティックフレームワークである。LLMが生成した証明は、正しいものの冗長でライブラリのバージョン間で脆弱であることが知られているが、既存のリファクタリング研究では以下の3つの実用的課題が見過ごされている。1) Leanのリファクタリングは本質的に多目的である（証明の長さ、コンパイルコスト、バージョン互換性がしばしばトレードオフの関係にある）。2) Leanリポジトリは互換性が脆弱である一方、LLMのリリースはLean/Mathlibのバージョンを認識しない。3) 学習ベースのパイプラインはLLMの新リリースごとに繰り返しファインチューニングが必要であり、モデルの更新頻度にもLeanのリリースサイクルにもスケールしない。Lean Refactorは、凍結されたエージェンティックLLMを、厳選された多目的リファクタリング戦略データベースからの検索によって誘導する。各戦略には、対応するLean/Mathlibバージョンや期待されるコンパイルコスト削減量などのメタデータが高密度に付与されている。実験では、競技用ベンチマークで70%超、研究リポジトリで20%超のトークンレベルの圧縮、最大60%のコンパイル時間削減を達成し、従来手法やClaude Codeを上回った。バージョンでフィルタリングされた検索により、対象Leanバージョンでの圧縮率がさらに向上し、リファクタリングされたminiF2F証明は、リファクタリングされていないものと比較して、将来のLeanリリースへのゼロショットバージョン転送性能が向上した。

English

We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over 70% token-level compression on competition benchmarks, over 20% on research repositories, and up to 60% compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.