Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Abstract

Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm leveraging the vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances compared to baselines across three simulators and a real-world setting. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.
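To make the idea of monitoring through constraint code concrete, here is a minimal sketch. It is not the authors' implementation, and every name in it (`Point`, `cup_stays_in_gripper`, `path_is_clear`, `monitor`, and the distance thresholds) is a hypothetical chosen for illustration. It assumes that a tracker already supplies each constraint element as a 3D point per frame, and it shows how one reactive constraint and one proactive constraint could be checked frame by frame with plain code of the kind a VLM might generate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A constraint element abstracted to a single 3D point (metres)."""
    x: float
    y: float
    z: float


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two point elements."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def cup_stays_in_gripper(gripper: Point, cup: Point, max_gap: float = 0.05) -> bool:
    """Reactive check (hypothetical): the held object must stay near the gripper."""
    return distance(gripper, cup) <= max_gap


def path_is_clear(gripper: Point, obstacle: Point, min_clearance: float = 0.10) -> bool:
    """Proactive check (hypothetical): flag a likely collision before contact."""
    return distance(gripper, obstacle) >= min_clearance


def monitor(frames):
    """Return (frame index, constraint name) for the first violation, else None."""
    for t, frame in enumerate(frames):
        if not cup_stays_in_gripper(frame["gripper"], frame["cup"]):
            return t, "cup_stays_in_gripper"
        if not path_is_clear(frame["gripper"], frame["obstacle"]):
            return t, "path_is_clear"
    return None


# Illustrative tracked elements: the gripper carries a cup toward an obstacle.
frames = [
    {"gripper": Point(0.0, 0.0, 0.3), "cup": Point(0.0, 0.0, 0.28), "obstacle": Point(0.45, 0.0, 0.3)},
    {"gripper": Point(0.2, 0.0, 0.3), "cup": Point(0.2, 0.0, 0.28), "obstacle": Point(0.45, 0.0, 0.3)},
    {"gripper": Point(0.4, 0.0, 0.3), "cup": Point(0.4, 0.0, 0.28), "obstacle": Point(0.45, 0.0, 0.3)},
]
print(monitor(frames))
```

With these made-up frames, the monitor reports a `path_is_clear` violation at frame index 2, before any contact occurs; a closed-loop system could use such a signal to trigger replanning. Reducing each entity to a point is what keeps the per-frame check cheap, which is the motivation the abstract gives for compact geometric elements.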
