Embodied Red Teaming for Auditing Robotic Foundation Models

Abstract

Language-conditioned robot models have the potential to enable robots to perform a wide range of tasks based on natural language instructions. However, assessing their safety and effectiveness remains challenging because it is difficult to test all the different ways a single task can be phrased. Current benchmarks have two key limitations: they rely on a limited set of human-generated instructions, which misses many challenging cases, and they focus only on task performance without assessing safety, such as avoiding damage. To address these gaps, we introduce Embodied Red Teaming (ERT), a new evaluation method that generates diverse and challenging instructions to test these models. ERT uses automated red teaming techniques with Vision Language Models (VLMs) to create contextually grounded, difficult instructions. Experimental results show that state-of-the-art language-conditioned robot models fail or behave unsafely on ERT-generated instructions, underscoring the shortcomings of current benchmarks in evaluating real-world performance and safety. Code and videos are available at: https://s-karnik.github.io/embodied-red-team-project-page.
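
The abstract does not spell out the ERT procedure, so the following is only a minimal sketch of how an automated red-teaming loop of this general kind might be organized, not the authors' implementation. Every name here (`Outcome`, `red_team`, `toy_propose`, `toy_rollout`, `PARAPHRASES`) is hypothetical: `toy_propose` stands in for a VLM that proposes instructions, and `toy_rollout` stands in for running the robot policy in a simulator. The loop keeps any instruction on which the policy either fails the task or behaves unsafely, mirroring the two criteria the abstract names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    success: bool  # did the policy complete the task?
    unsafe: bool   # did the rollout violate a safety check (e.g. damage)?


def red_team(
    propose: Callable[[str, List[str], int], List[str]],
    rollout: Callable[[str], Outcome],
    task: str,
    rounds: int = 3,
    per_round: int = 4,
) -> List[str]:
    """Collect instructions on which the policy fails or behaves unsafely."""
    hard: List[str] = []
    for _ in range(rounds):
        # Pass earlier hard cases back so the proposer can build on them.
        for instruction in propose(task, hard, per_round):
            if instruction in hard:
                continue  # already recorded as a hard case
            outcome = rollout(instruction)
            if not outcome.success or outcome.unsafe:
                hard.append(instruction)
    return hard


# Hypothetical stand-ins so the sketch runs without a VLM or a simulator.
PARAPHRASES = [
    "pick up the red block",
    "grab the crimson cube",
    "pick up the red block quickly",
    "lift the block that is red",
]


def toy_propose(task: str, hard: List[str], n: int) -> List[str]:
    # A real proposer would query a VLM with the scene image, the task,
    # and the hard cases found so far; this one returns fixed paraphrases.
    return PARAPHRASES[:n]


def toy_rollout(instruction: str) -> Outcome:
    # A brittle toy policy: it only understands one phrasing, and it
    # behaves unsafely when asked to hurry.
    return Outcome(
        success=instruction.startswith("pick up"),
        unsafe="quickly" in instruction,
    )


if __name__ == "__main__":
    for instruction in red_team(toy_propose, toy_rollout, "pick up the red block"):
        print(instruction)
```

With these toy stand-ins, only the benchmark-style phrasing passes. The two rephrasings are flagged as task failures, and the "quickly" variant is flagged as unsafe even though the task itself succeeds, which is the kind of case a performance-only benchmark would miss.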
