體現式紅隊作戰用於稽核機器人基礎模型
Embodied Red Teaming for Auditing Robotic Foundation Models
November 27, 2024
作者: Sathwik Karnik, Zhang-Wei Hong, Nishant Abhangi, Yen-Chen Lin, Tsun-Hsuan Wang, Christophe Dupuy, Rahul Gupta, Pulkit Agrawal
cs.AI
摘要
語言條件機器人模型具有潛力使機器人能夠根據自然語言指令執行各種任務。然而,評估其安全性和效能仍具挑戰性,因為測試單一任務可能被表達的各種方式是困難的。目前的基準具有兩個主要限制:它們依賴於有限的人類生成指令集,錯過了許多具挑戰性的情況,並且僅關注任務表現,而不評估安全性,例如避免損壞。為了解決這些缺陷,我們引入了具有具挑戰性且多樣化指令的新評估方法-具體化的紅隊行動(Embodied Red Teaming,ERT)。ERT利用自動化的紅隊行動技術與視覺語言模型(VLMs)生成具有情境基礎的困難指令。實驗結果顯示,最先進的語言條件機器人模型在ERT生成的指令上失敗或表現不安全,凸顯了目前基準在評估現實世界表現和安全性方面的缺陷。代碼和影片可在以下網址找到:https://s-karnik.github.io/embodied-red-team-project-page。
English
Language-conditioned robot models have the potential to enable robots to
perform a wide range of tasks based on natural language instructions. However,
assessing their safety and effectiveness remains challenging because it is
difficult to test all the different ways a single task can be phrased. Current
benchmarks have two key limitations: they rely on a limited set of
human-generated instructions, missing many challenging cases, and focus only on
task performance without assessing safety, such as avoiding damage. To address
these gaps, we introduce Embodied Red Teaming (ERT), a new evaluation method
that generates diverse and challenging instructions to test these models. ERT
uses automated red teaming techniques with Vision Language Models (VLMs) to
create contextually grounded, difficult instructions. Experimental results show
that state-of-the-art language-conditioned robot models fail or behave unsafely
on ERT-generated instructions, underscoring the shortcomings of current
benchmarks in evaluating real-world performance and safety. Code and videos are
available at: https://s-karnik.github.io/embodied-red-team-project-page.Summary
AI-Generated Summary