린 리팩터: 에이전트 기반 전략 탐색을 통한 다목적 제어 가능 증명 최적화

초록

본 논문에서는 Lean 증명의 다중 목적, 제어 가능, 버전 내성적 리팩토링을 위한 플러그 앤 플레이 방식의 검색 증강 에이전트 프레임워크인 Lean Refactor를 제시한다. LLM이 생성한 증명은 정확하지만 장황하고 라이브러리 버전 간 취약한 것으로 악명 높으나, 기존 리팩토링 연구는 세 가지 실용적 과제를 간과하고 있다: 1) Lean 리팩토링은 본질적으로 다중 목적(증명 길이, 컴파일 비용, 버전 호환성이 종종 상충됨)이며, 2) Lean 저장소는 취약한 호환성을 지니는 반면 LLM 릴리스는 Lean/Mathlib 버전을 인지하지 못하며, 3) 학습 기반 파이프라인은 각 LLM 릴리스마다 미세 조정을 반복해야 하므로 모델 변화나 Lean의 릴리스 주기에 맞춰 확장되지 않는다. Lean Refactor는 지원되는 Lean/Mathlib 버전 및 예상 컴파일 비용 절감 등의 메타데이터로 조밀하게 주석이 달린 다중 목적 리팩토링 전략의 선별된 데이터베이스에서 검색 결과를 활용하여 고정된 에이전트적 LLM을 제어한다. 실험 결과, 경쟁 벤치마크에서 70% 이상의 토큰 수준 압축률, 연구 저장소에서 20% 이상의 압축률, 최대 60%의 컴파일 시간 단축을 보여 이전 연구 및 Claude Code보다 우수한 성능을 입증했다. 버전 필터링 검색은 대상 Lean 버전에서의 압축률을 더욱 향상시켰으며, 리팩토링된 miniF2F 증명은 리팩토링되지 않은 증명에 비해 향후 Lean 릴리스에 대한 더 강력한 제로샷 버전 전이를 보였다.

English

We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over 70% token-level compression on competition benchmarks, over 20% on research repositories, and up to 60% compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.