

Understanding Behavior Cloning with Action Quantization

March 20, 2026
作者: Haoqun Cao, Tengyang Xie
cs.AI

Abstract

Behavior cloning is a fundamental paradigm in machine learning, enabling policy learning from expert demonstrations across robotics, autonomous driving, and generative modeling. Autoregressive models such as transformers have proven remarkably effective, from large language models (LLMs) to vision-language-action (VLA) systems. However, applying autoregressive models to continuous control requires discretizing actions through quantization, a practice widely adopted yet poorly understood theoretically. This paper provides theoretical foundations for this practice. We analyze how quantization error propagates along the horizon and interacts with statistical sample complexity. We show that behavior cloning with quantized actions and log-loss achieves optimal sample complexity, matching existing lower bounds, and incurs only polynomial horizon dependence on quantization error, provided the dynamics are stable and the policy satisfies a probabilistic smoothness condition. We further characterize when different quantization schemes satisfy or violate these requirements, and propose a model-based augmentation that provably improves the error bound without requiring policy smoothness. Finally, we establish fundamental limits that jointly capture the effects of quantization error and statistical complexity.
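To make the "discretizing actions through quantization" step concrete, here is a minimal sketch of uniform (binning) action quantization, the simplest scheme the paper's analysis would apply to. The function names (`quantize`, `dequantize`), the bin count, and the action range are illustrative assumptions, not the paper's implementation; the round-trip error bound (half a bin width) is the per-step quantization error whose propagation along the horizon the paper studies.

```python
import numpy as np

def quantize(actions, low, high, n_bins):
    """Map continuous actions in [low, high] to discrete bin indices (hypothetical helper)."""
    scaled = (actions - low) / (high - low)            # normalize to [0, 1]
    return np.clip((scaled * n_bins).astype(int), 0, n_bins - 1)

def dequantize(idx, low, high, n_bins):
    """Map bin indices back to bin-center continuous actions."""
    return low + (idx + 0.5) * (high - low) / n_bins

# For uniform binning, the worst-case round-trip error is half a bin width.
a = np.linspace(-1.0, 1.0, 101)
err = np.abs(dequantize(quantize(a, -1.0, 1.0, 256), -1.0, 1.0, 256) - a)
assert err.max() <= (2.0 / 256) / 2 + 1e-12
```

With actions discretized this way, the behavior-cloning objective becomes an ordinary cross-entropy (log-loss) over bin indices, which is what lets autoregressive models trained like language models be applied to continuous control.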