Found-RL: foundation model-enhanced reinforcement learning for autonomous driving

Abstract

Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD). However, RL suffers from sample inefficiency and a lack of semantic interpretability in complex scenarios. Foundation models, particularly Vision-Language Models (VLMs), can mitigate both problems by supplying rich, context-aware knowledge, but their high inference latency hinders their use in high-frequency RL training loops. To bridge this gap, we present Found-RL, a platform designed to efficiently enhance RL for AD with foundation models. A core innovation is an asynchronous batch inference framework that decouples heavy VLM reasoning from the simulation loop, resolving latency bottlenecks to support real-time learning. We introduce supervision mechanisms, Value-Margin Regularization (VMR) and Advantage-Weighted Action Guidance (AWAG), that distill expert-like VLM action suggestions into the RL policy. In addition, we adopt high-throughput CLIP for dense reward shaping. We address CLIP's dynamic blindness with Conditional Contrastive Action Alignment, which conditions prompts on discretized speed and command and yields a normalized, margin-based bonus from context-specific action-anchor scoring. Found-RL provides an end-to-end pipeline for integrating fine-tuned VLMs and shows that a lightweight RL model can achieve performance close to that of billion-parameter VLMs while sustaining real-time inference (approximately 500 FPS). Code, data, and models will be made publicly available at https://github.com/ys-qu/found-rl.
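To make the reward-shaping idea more concrete, the following is a minimal sketch of how a normalized, margin-based bonus could be computed from action-anchor scores. It is not the Found-RL implementation: the anchor list, the speed thresholds, the prompt wording, the softmax temperature, and the function names (`speed_bin`, `build_prompts`, `margin_bonus`) are all assumptions made for illustration. The image-text similarities are passed in as plain numbers so that the example does not depend on any particular CLIP library.

```python
import math
from typing import List, Sequence

# Hypothetical action anchors; the abstract does not list the actual set.
ACTION_ANCHORS = ["accelerate", "keep speed", "brake", "steer left", "steer right"]


def speed_bin(speed_mps: float) -> str:
    """Discretize speed into a coarse bin (thresholds are assumed)."""
    if speed_mps < 0.5:
        return "stopped"
    if speed_mps < 6.0:
        return "slow"
    return "fast"


def build_prompts(speed_mps: float, command: str) -> List[str]:
    """Build one prompt per action anchor, conditioned on speed bin and command."""
    context = f"the car is {speed_bin(speed_mps)} and the command is {command}"
    return [f"{context}; the right action is to {anchor}" for anchor in ACTION_ANCHORS]


def softmax(scores: Sequence[float], temperature: float = 0.1) -> List[float]:
    """Numerically stable softmax over anchor scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def margin_bonus(similarities: Sequence[float], taken_index: int,
                 temperature: float = 0.1) -> float:
    """Margin between the taken anchor and its best competitor, in [-1, 1].

    `similarities` would come from image-text scoring of the prompts above;
    here they are supplied directly (an assumption of this sketch).
    """
    probs = softmax(similarities, temperature)
    best_other = max(p for i, p in enumerate(probs) if i != taken_index)
    return probs[taken_index] - best_other
```

In this sketch the bonus is positive when the anchor matching the agent's action scores highest in the given speed and command context, negative when another anchor scores higher, and zero when all anchors tie. Conditioning the prompts on speed and command is what lets a static image encoder distinguish, for example, braking while stopped from braking at speed.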
