Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

November 25, 2025
Authors: Inferix Team, Tianyu Feng, Yizeng Han, Jiahao He, Yuanyu He, Xi Lin, Teng Liu, Hanfeng Lu, Jiasheng Tang, Wei Wang, Zhiyuan Wang, Jichao Wu, Mingyang Yang, Yinghao Yu, Zeyu Zhang, Bohan Zhuang
cs.AI

Abstract

World models serve as core simulators for fields such as agentic AI, embodied AI, and gaming, and can generate long, physically realistic, interactive, high-quality videos. Moreover, scaling these models could unlock emergent capabilities in visual perception, understanding, and reasoning, paving the way for a new paradigm beyond current LLM-centric vision foundation models. A key enabler is the semi-autoregressive (block-diffusion) decoding paradigm, which merges the strengths of diffusion and autoregressive methods: video tokens are generated block by block, with diffusion applied within each block while conditioning on previous blocks, resulting in more coherent and stable video sequences. Crucially, it overcomes limitations of standard video diffusion by reintroducing LLM-style KV cache management, enabling efficient, variable-length, high-quality generation. Inferix is therefore designed specifically as a next-generation inference engine that enables immersive world synthesis through optimized semi-autoregressive decoding. This dedicated focus on world simulation sets it apart from systems engineered for high-concurrency scenarios (such as vLLM or SGLang) and from classic video diffusion models (such as xDiTs). Inferix also provides interactive video streaming and profiling, enabling real-time interaction and realistic simulation to accurately model world dynamics. Additionally, it supports efficient benchmarking through seamless integration of LV-Bench, a new fine-grained evaluation benchmark tailored for minute-long video generation scenarios. We hope the community will work together to advance Inferix and foster world model exploration.
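The block-by-block decoding loop described in the abstract can be illustrated with a minimal toy sketch. It is not Inferix's implementation or API: the names `denoise_step` and `generate`, the averaging "denoiser", and the sliding-window cache limit `cache_blocks` are all assumptions made for illustration. The sketch only shows the control flow: each block starts from noise, is refined over several denoising steps while conditioning on cached earlier blocks, and is then appended to the cache so the sequence can grow to any length.

```python
import random
from collections import deque


def denoise_step(block, context):
    """Toy stand-in for one diffusion denoising step (not a real model).

    Pulls every noisy token halfway toward the mean of the cached context.
    A real block-diffusion model would instead attend over cached keys/values.
    """
    flat = [tok for cached in context for tok in cached]
    target = sum(flat) / len(flat) if flat else 0.0
    return [tok + 0.5 * (target - tok) for tok in block]


def generate(num_blocks, block_size=4, num_steps=8, cache_blocks=2, seed=0):
    """Semi-autoregressive loop: diffusion inside a block, autoregression across blocks."""
    rng = random.Random(seed)
    # Hypothetical bounded cache playing the role of the KV cache:
    # only the most recent `cache_blocks` finished blocks are kept.
    cache = deque(maxlen=cache_blocks)
    video = []
    for _ in range(num_blocks):
        # Each new block starts from pure noise.
        block = [rng.gauss(0.0, 1.0) for _ in range(block_size)]
        # Iterative denoising within the block, conditioned on earlier blocks.
        for _ in range(num_steps):
            block = denoise_step(block, cache)
        # The finished block becomes context for the blocks that follow.
        cache.append(block)
        video.append(block)
    return video, cache


if __name__ == "__main__":
    video, cache = generate(num_blocks=5)
    print(len(video), "blocks generated;", len(cache), "blocks kept in cache")
```

Because finished blocks are reused from the cache rather than recomputed, asking for more blocks extends the sequence without touching the earlier ones, which is the property that makes variable-length generation cheap in this scheme.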