LychSim: A Controllable and Interactive Simulation Framework for Vision Research
May 12, 2026
Authors: Wufei Ma, Chloe Wang, Siyi Chen, Jiawei Peng, Patrick Li, Alan Yuille
cs.AI
Abstract
While self-supervised pretraining has reduced vision systems' reliance on synthetic data, simulation remains an indispensable tool for closed-loop optimization and rigorous out-of-distribution (OOD) evaluation. However, modern simulation platforms often present steep technical barriers, requiring extensive expertise in computer graphics and game development. In this work, we present LychSim, a highly controllable and interactive simulation framework built upon Unreal Engine 5 to bridge this gap. LychSim is built around three key designs: (1) a streamlined Python API that abstracts away underlying engine complexities; (2) a procedural data pipeline capable of generating diverse, high-fidelity environments with varying OOD visual challenges, paired with rich 2D and 3D ground truths; and (3) a native integration of the Model Context Protocol (MCP) that transforms the simulator into a dynamic, closed-loop playground for agentic LLMs with reasoning capabilities. We further annotate scene-level procedural rules and object-level pose alignments to enable semantically aligned 3D ground truths and automated scene modification. We demonstrate LychSim's capability across multiple downstream applications, including serving as a synthetic data engine, powering reinforcement learning-based adversarial examiners, and facilitating interactive, language-driven scene layout generation. To benefit the broader vision community, LychSim will be made publicly available, including full source code and various data annotations.