ChatPaper.aiChatPaper

Lean 4 中的统计学习理论:从零开始构建经验过程

Statistical Learning Theory in Lean 4: Empirical Processes from Scratch

February 2, 2026
作者: Yuanhe Zhang, Jason D. Lee, Fanghui Liu
cs.AI

摘要

我们首次基于经验过程理论,在Lean 4中构建了统计学习理论(SLT)的完整形式化体系。该端到端的形式化基础设施填补了最新Lean 4 Mathlib库的空白,包含高斯利普希茨集中性的完整推导、次高斯过程杜德利熵积分定理的首个形式化证明,以及应用于(稀疏)最小二乘回归的尖锐收敛率分析。项目采用人机协同工作流完成,由人类设计证明策略,AI智能体执行战术级证明构建,最终形成经过人工验证的SLT工具箱。除实现外,形式化过程还揭示并修正了标准SLT教材中的隐含假设与缺失细节,推动了对理论逐行粒度的深入理解。本研究奠定了可复用的形式化基础,为机器学习理论的未来发展开辟了新路径。代码已开源:https://github.com/YuanheZ/lean-stat-learning-theory
English
We present the first comprehensive Lean 4 formalization of statistical learning theory (SLT) grounded in empirical process theory. Our end-to-end formal infrastructure implement the missing contents in latest Lean 4 Mathlib library, including a complete development of Gaussian Lipschitz concentration, the first formalization of Dudley's entropy integral theorem for sub-Gaussian processes, and an application to least-squares (sparse) regression with a sharp rate. The project was carried out using a human-AI collaborative workflow, in which humans design proof strategies and AI agents execute tactical proof construction, leading to the human-verified Lean 4 toolbox for SLT. Beyond implementation, the formalization process exposes and resolves implicit assumptions and missing details in standard SLT textbooks, enforcing a granular, line-by-line understanding of the theory. This work establishes a reusable formal foundation and opens the door for future developments in machine learning theory. The code is available at https://github.com/YuanheZ/lean-stat-learning-theory
PDF02February 11, 2026