Understanding Behavior Cloning with Action Quantization
March 20, 2026
Authors: Haoqun Cao, Tengyang Xie
cs.AI
Abstract
Behavior cloning is a fundamental paradigm in machine learning, enabling policy learning from expert demonstrations across robotics, autonomous driving, and generative models. Autoregressive models such as transformers have proven remarkably effective, from large language models (LLMs) to vision-language-action (VLA) systems. However, applying autoregressive models to continuous control requires discretizing actions through quantization, a practice widely adopted yet poorly understood theoretically. This paper provides theoretical foundations for this practice. We analyze how quantization error propagates along the horizon and interacts with statistical sample complexity. We show that behavior cloning with quantized actions and log-loss achieves optimal sample complexity, matching existing lower bounds, and incurs only polynomial horizon dependence on quantization error, provided the dynamics are stable and the policy satisfies a probabilistic smoothness condition. We further characterize when different quantization schemes satisfy or violate these requirements, and propose a model-based augmentation that provably improves the error bound without requiring policy smoothness. Finally, we establish fundamental limits that jointly capture the effects of quantization error and statistical complexity.
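To make the practice concrete, here is a minimal sketch (our own illustration, not the paper's implementation) of the setup the abstract describes: a continuous action is uniformly quantized into one of `K` discrete bins, an autoregressive policy head scores the bins, and training minimizes the log-loss (negative log-likelihood) on the expert's bin. The bin count, action range, and function names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: uniform action quantization plus log-loss,
# the ingredients of the behavior-cloning setup analyzed in the paper.
# All names and parameter choices here are our own assumptions.

def quantize(action, low=-1.0, high=1.0, num_bins=256):
    """Map a continuous action in [low, high] to a discrete bin index."""
    action = np.clip(action, low, high)
    scale = (action - low) / (high - low)            # in [0, 1]
    return min(int(scale * num_bins), num_bins - 1)  # in {0, ..., K-1}

def dequantize(index, low=-1.0, high=1.0, num_bins=256):
    """Map a bin index back to the bin center (the reconstructed action)."""
    width = (high - low) / num_bins
    return low + (index + 0.5) * width

def log_loss(logits, index):
    """Log-loss (cross-entropy) of a categorical policy head
    evaluated on the expert's quantized action index."""
    logits = logits - logits.max()                   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[index]

# The per-step quantization error is at most half a bin width; the
# paper's analysis concerns how this error compounds over the horizon.
idx = quantize(0.3, num_bins=256)
recon = dequantize(idx, num_bins=256)
assert abs(recon - 0.3) <= (2.0 / 256) / 2 + 1e-9
```

In this discrete view, the continuous-control problem reduces to next-token prediction over action bins, which is what allows transformer-style autoregressive models to be trained with the same log-loss used for language modeling.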