Lean重构：基于智能体策略搜索的多目标可控证明优化

摘要

我们提出Lean Refactor，这是一个即插即用的检索增强型智能体框架，用于对Lean证明进行多目标、可控且版本鲁棒的改写。LLM生成的证明以正确但冗长且脆弱于库版本而著称，然而现有的改写工作忽略了三个实际挑战：1) Lean改写本质上是多目标的（证明长度、编译成本和版本兼容性往往相互矛盾）；2) Lean仓库的兼容性脆弱，而LLM的发布并不感知Lean/Mathlib版本；3) 基于训练的流水线在每次LLM新版本发布时都需要重复微调，既无法随模型更迭扩展，也无法跟上Lean的发布周期。Lean Refactor通过从精心策划的多目标改写策略数据库中检索信息来引导一个冻结的智能体LLM，每条策略都密集标注了元数据，如支持的Lean/Mathlib版本和预期的编译成本降低。实验表明，在竞赛基准测试上实现了超过70%的token级压缩，在研究仓库上超过20%，编译时间减少高达60%，优于先前工作和Claude Code。基于版本过滤的检索进一步提高了目标Lean版本的压缩效果，改写后的miniF2F证明在向未来Lean版本进行零样本版本迁移时，比未改写的对应版本表现出更强的鲁棒性。

English

We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over 70% token-level compression on competition benchmarks, over 20% on research repositories, and up to 60% compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.