Vision-Language Models as a Source of Rewards
December 14, 2023
Authors: Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, Luyu Wang, Lei Zhang
cs.AI
Abstract
Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.
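As a rough illustration of the idea, the sketch below shows how a binary reward for a language goal could be derived from an off-the-shelf CLIP model by thresholding the cosine similarity between the embedding of the current observation frame and the embedding of the goal text. It is a minimal sketch only: it assumes the Hugging Face transformers CLIP implementation, and the checkpoint name, threshold value, and helper function are illustrative rather than taken from the paper.

```python
# Minimal sketch: a CLIP-derived binary reward for a language goal.
# Assumes the Hugging Face `transformers` CLIP implementation; the checkpoint
# and threshold below are illustrative, not the ones used in the paper.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def clip_goal_reward(frame: Image.Image, goal: str, threshold: float = 0.3) -> float:
    """Return 1.0 if the frame's CLIP embedding is close enough to the goal text, else 0.0."""
    inputs = processor(text=[goal], images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
    # Cosine similarity between L2-normalized image and text embeddings.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    similarity = (image_emb @ text_emb.T).item()
    return 1.0 if similarity >= threshold else 0.0
```

In a training loop, such a function could replace a hand-written reward: each environment frame is scored against the current language goal, and the thresholded similarity is passed to the RL agent as its reward signal.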