LychSim: A Controllable and Interactive Simulation Framework for Vision Research

Abstract

Although self-supervised pretraining has reduced vision systems' reliance on synthetic data, simulation remains indispensable for closed-loop optimization and rigorous out-of-distribution (OOD) evaluation. Modern simulation platforms, however, often present steep technical barriers and require extensive expertise in computer graphics and game development. To bridge this gap, we present LychSim, a highly controllable and interactive simulation framework built on Unreal Engine 5. LychSim is built around three key designs: (1) a streamlined Python API that abstracts away the complexity of the underlying engine; (2) a procedural data pipeline that generates diverse, high-fidelity environments with varying OOD visual challenges, paired with rich 2D and 3D ground truth; and (3) native integration of the Model Context Protocol (MCP), which turns the simulator into a dynamic, closed-loop playground for agentic reasoning LLMs. We further annotate scene-level procedural rules and object-level pose alignments to enable semantically aligned 3D ground truth and automated scene modification. We demonstrate LychSim's capabilities across multiple downstream applications, including serving as a synthetic data engine, powering reinforcement-learning-based adversarial examiners, and facilitating interactive, language-driven scene layout generation. To benefit the broader vision community, LychSim will be made publicly available, including its full source code and various data annotations.
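To make the MCP integration in design (3) more concrete, the following is a minimal sketch of the kind of message an agentic LLM client might send to a simulator exposed as an MCP server. It relies only on the fact that MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The tool name `set_object_pose` and its arguments are hypothetical placeholders chosen for illustration; they are not taken from LychSim, whose actual tool schema is not described here.

```python
import json
from itertools import count

# Monotonically increasing request ids, as JSON-RPC 2.0 expects a unique id per request.
_ids = count(1)


def make_tool_call(tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments (assumptions, not LychSim's real API).
request = make_tool_call(
    "set_object_pose",
    {"object_id": "chair_01", "location": [120.0, -40.0, 0.0], "yaw_deg": 90.0},
)

# Serialize the request as it would be sent over the MCP transport.
print(json.dumps(request, indent=2))
```

In a closed-loop setting, the agent would read the tool result (for example, a rendered image or updated scene state), reason about it, and issue the next call, which is what makes the simulator usable as an interactive playground rather than a one-shot renderer.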
