LychSim: A Controllable and Interactive Simulation Framework for Vision Research
May 12, 2026
Authors: Wufei Ma, Chloe Wang, Siyi Chen, Jiawei Peng, Patrick Li, Alan Yuille
cs.AI
Abstract
While self-supervised pretraining has reduced vision systems' reliance on synthetic data, simulation remains an indispensable tool for closed-loop optimization and rigorous out-of-distribution (OOD) evaluation. However, modern simulation platforms often present steep technical barriers, requiring extensive expertise in computer graphics and game development. To bridge this gap, we present LychSim, a highly controllable and interactive simulation framework built on Unreal Engine 5. LychSim centers on three key designs: (1) a streamlined Python API that abstracts away the complexity of the underlying engine; (2) a procedural data pipeline that generates diverse, high-fidelity environments with a range of OOD visual challenges, paired with rich 2D and 3D ground truth; and (3) native integration of the Model Context Protocol (MCP), which turns the simulator into a dynamic, closed-loop playground for reasoning, agentic LLMs. We further annotate scene-level procedural rules and object-level pose alignments to enable semantically aligned 3D ground truth and automated scene modification. We demonstrate LychSim's capabilities across multiple downstream applications: serving as a synthetic data engine, powering reinforcement learning-based adversarial examiners, and facilitating interactive, language-driven scene layout generation. To benefit the broader vision community, LychSim will be made publicly available, including full source code and various data annotations.