OpenGVL - Benchmarking Visual Temporal Progress for Data Curation

Abstract

Data scarcity remains one of the most limiting factors in robotics. However, the amount of robotics data available in the wild is growing exponentially, creating new opportunities for large-scale data utilization. Reliable prediction of temporal task completion could help annotate and curate this data automatically at scale. The recently proposed Generative Value Learning (GVL) approach leverages the knowledge embedded in vision-language models (VLMs) to predict task progress from visual observations. Building on GVL, we propose OpenGVL, a comprehensive benchmark for estimating task progress across diverse, challenging manipulation tasks involving both robotic and human embodiments. We evaluate publicly available open-source foundation models and show that open-source model families significantly underperform their closed-source counterparts, achieving only approximately 70% of their performance on temporal progress prediction tasks. Furthermore, we demonstrate how OpenGVL can serve as a practical tool for automated data curation and filtering, enabling efficient quality assessment of large-scale robotics datasets. We release the benchmark along with the complete codebase at github.com/budzianowski/opengvl.
