Understanding Behavior Cloning with Action Quantization

Abstract

Behavior cloning is a fundamental paradigm in machine learning, enabling policy learning from expert demonstrations across robotics, autonomous driving, and generative models. Autoregressive models such as transformers have proven remarkably effective, from large language models (LLMs) to vision-language-action systems (VLAs). However, applying autoregressive models to continuous control requires discretizing actions through quantization, a practice that is widely adopted yet poorly understood theoretically. This paper provides theoretical foundations for this practice. We analyze how quantization error propagates along the horizon and how it interacts with statistical sample complexity. We show that, provided the dynamics are stable and the policy satisfies a probabilistic smoothness condition, behavior cloning with quantized actions and log-loss achieves optimal sample complexity, matching existing lower bounds, and incurs only polynomial horizon dependence on quantization error. We further characterize when different quantization schemes satisfy or violate these requirements, and propose a model-based augmentation that provably improves the error bound without requiring policy smoothness. Finally, we establish fundamental limits that jointly capture the effects of quantization error and statistical complexity.
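
The pipeline the abstract refers to (discretize continuous actions, then fit a policy over the discrete tokens by minimizing log-loss) can be illustrated with a minimal sketch. It rests on several assumptions. It uses a scalar action in a known range [low, high] and uniform per-dimension binning, which is only one of the quantization schemes the paper considers and is not presented here as the scheme it recommends. It also uses a tabular policy over discrete states rather than an autoregressive model. The function names quantize, dequantize, fit_tabular_policy, and log_loss are hypothetical. The sketch shows that rounding to a bin center costs at most half a bin width per step, the per-step quantization error whose propagation along the horizon the paper analyzes. It also shows that the per-state empirical bin frequencies minimize the log-loss over tabular policies.

```python
import math
from collections import Counter


def quantize(a, low, high, num_bins):
    """Map a scalar action in [low, high] to a bin index in {0, ..., num_bins - 1}."""
    a = min(max(a, low), high)  # clip out-of-range actions into the valid range
    width = (high - low) / num_bins
    idx = int((a - low) / width)
    return min(idx, num_bins - 1)  # a == high would otherwise land in bin num_bins


def dequantize(idx, low, high, num_bins):
    """Return the center of bin idx; the rounding error is at most width / 2."""
    width = (high - low) / num_bins
    return low + (idx + 0.5) * width


def fit_tabular_policy(demos, low, high, num_bins, alpha=0.0):
    """Fit a categorical policy over bins for each discrete state.

    demos: list of (state, continuous_action) pairs from the expert.
    With alpha = 0 this is the maximum-likelihood (log-loss minimizing)
    tabular policy; alpha > 0 adds optional add-alpha smoothing (an
    illustrative choice, not part of the paper's setup).
    """
    counts = {}
    for s, a in demos:
        counts.setdefault(s, Counter())[quantize(a, low, high, num_bins)] += 1
    policy = {}
    for s, c in counts.items():
        total = sum(c.values()) + alpha * num_bins
        policy[s] = [(c[k] + alpha) / total for k in range(num_bins)]
    return policy


def log_loss(policy, demos, low, high, num_bins):
    """Average negative log-likelihood of the expert's quantized actions."""
    nll = 0.0
    for s, a in demos:
        nll -= math.log(policy[s][quantize(a, low, high, num_bins)])
    return nll / len(demos)


if __name__ == "__main__":
    demos = [(0, 0.1), (0, 0.2), (0, -0.9), (1, 0.9)]
    pi = fit_tabular_policy(demos, low=-1.0, high=1.0, num_bins=4)
    print(pi[0])  # empirical bin frequencies for state 0
    print(log_loss(pi, demos, -1.0, 1.0, 4))
```

With four bins on [-1, 1], each bin has width 0.5, so executing the dequantized bin center instead of the expert's action introduces an error of at most 0.25 per step. Increasing num_bins shrinks this error but spreads the same demonstrations over more categories, which is the statistical side of the trade-off the abstract describes.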
