From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning

Abstract

We introduce Distribution Contractive Reinforcement Learning (DICE-RL), a framework that uses reinforcement learning (RL) as a "distribution contraction" operator to refine pretrained generative robot policies. DICE-RL turns a pretrained behavior prior into a high-performing "pro" policy by amplifying high-success behaviors from online feedback. We pretrain a diffusion- or flow-based policy for broad behavioral coverage, then finetune it with a stable, sample-efficient residual off-policy RL framework that combines selective behavior regularization with value-guided action selection. Extensive experiments and analyses show that DICE-RL reliably improves performance with strong stability and sample efficiency. It enables mastery of complex long-horizon manipulation skills directly from high-dimensional pixel inputs, both in simulation and on a real robot. Project website: https://zhanyisun.github.io/dice.rl.2026/.
