OpenGVL - Benchmarking Visual Temporal Progress for Data Curation

Abstract



Data scarcity remains one of the most limiting factors in driving progress in robotics. However, the amount of available robotics data in the wild is growing exponentially, creating new opportunities for large-scale data utilization. Reliable temporal task completion prediction could help automatically annotate and curate this data at scale. The Generative Value Learning (GVL) approach was recently proposed, leveraging the knowledge embedded in vision-language models (VLMs) to predict task progress from visual observations. Building upon GVL, we propose OpenGVL, a comprehensive benchmark for estimating task progress across diverse challenging manipulation tasks involving both robotic and human embodiments. We evaluate the capabilities of publicly available open-source foundation models, showing that open-source model families significantly underperform closed-source counterparts, achieving only approximately 70% of their performance on temporal progress prediction tasks. Furthermore, we demonstrate how OpenGVL can serve as a practical tool for automated data curation and filtering, enabling efficient quality assessment of large-scale robotics datasets. We release the benchmark along with the complete codebase at github.com/budzianowski/opengvl.
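As an illustration only, the following Python sketch shows how per-frame progress predictions could be turned into an episode-level quality score and used for filtering. It assumes that a VLM has already returned one predicted completion percentage per frame, mapped back to chronological order, and it scores each episode with the Value-Order Correlation (VOC) used in GVL-style evaluation: the Spearman rank correlation between predicted progress and frame index. The function names `value_order_correlation` and `filter_episodes`, the episode IDs, and the 0.5 threshold are hypothetical and are not taken from the OpenGVL codebase.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def value_order_correlation(predicted_progress: Sequence[float]) -> float:
    """Hypothetical VOC helper: Spearman correlation of predictions vs. frame order."""
    n = len(predicted_progress)
    if n < 2:
        raise ValueError("need at least two frames")
    pred_ranks = average_ranks(predicted_progress)
    time_ranks = [float(t + 1) for t in range(n)]
    mean = (n + 1) / 2  # both rank vectors share this mean, even with ties
    cov = sum((p - mean) * (t - mean) for p, t in zip(pred_ranks, time_ranks))
    var_p = sum((p - mean) ** 2 for p in pred_ranks)
    var_t = sum((t - mean) ** 2 for t in time_ranks)
    if var_p == 0:
        return 0.0  # constant predictions carry no ordering information
    return cov / (var_p * var_t) ** 0.5


def filter_episodes(
    predictions: Dict[str, Sequence[float]], threshold: float = 0.5
) -> List[str]:
    """Keep episodes whose VOC reaches the (hypothetical) threshold."""
    return [
        episode_id
        for episode_id, preds in predictions.items()
        if value_order_correlation(preds) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical per-frame completion percentages for two episodes.
    episodes = {
        "ep_clean": [0, 10, 25, 40, 70, 100],
        "ep_stalled": [0, 50, 20, 50, 10, 30],
    }
    print(filter_episodes(episodes))  # prints ['ep_clean']
```

A monotonic prediction scores 1.0, while an episode whose predicted progress jumps back and forth (here about 0.20) falls below the threshold and is dropped. In practice the threshold would need tuning per dataset, and ranking episodes by score may be more useful than a hard cutoff.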
