SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

May 22, 2026
Authors: Zizhao Tong, Hongfeng Lai, Zeqing Wang, Zhaohu Xing, Kexu Cheng, Haoran Xu, Zhao Pu, Shangwen Zhu, Ruili Feng, Jian Zhao, Yan Zhang, Hao Tang, Yeying Jin, Ling Shao
cs.AI

Abstract

Interactive world models for first-person shooter (FPS) games must resolve high-frequency overlapping control signals at every frame without disrupting unaffected regions. Existing methods inject actions globally and train on single titles, failing under dense FPS inputs. We observe that FPS actions are spatially selective: discrete events such as firing or reloading affect only a localized region around the weapon (the scope), while continuous camera and movement signals govern stable surroundings. We propose SCOPE, which inserts a conditioning module into each transformer block of a pretrained video diffusion model. It reshapes features into per-pixel temporal sequences so that each position computes its action response from local visual content. This separates in-scope effects from out-of-scope generation without segmentation labels. We also introduce CrossFPS, the first multi-game FPS dataset with frame-aligned action telemetry. It comprises 69K clips from 7 titles with 10-DoF controller signals, curated to remove gameplay bias. The model learns general visual-to-action mappings rather than game-specific patterns, enabling zero-shot transfer to unseen scenes. Experiments confirm strong action responsiveness, precise scope separation, and effective cross-game generalization.
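
The per-pixel temporal reshaping described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`to_pixel_sequences`, `action_response`, `condition_on_actions`), the use of plain scaled dot-product attention with no learned projections, and the residual addition are all assumptions made for illustration. It also assumes the action embeddings have already been projected to the same width as the visual features. The point it shows is that once features are arranged as one temporal sequence per spatial position, each position's response to the same action stream depends on its own local content.

```python
import math

def to_pixel_sequences(video):
    """Reshape features laid out as [T][H][W][C] into [H*W][T][C]."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    return [[video[t][y][x] for t in range(T)]
            for y in range(H) for x in range(W)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def action_response(pixel_seq, actions):
    """Attend from one pixel's T feature vectors to the T action embeddings.

    Hypothetical simplification: queries are the raw pixel features and the
    action embeddings serve as both keys and values (no learned weights).
    """
    dim = len(actions[0])
    scale = 1.0 / math.sqrt(dim)
    response = []
    for query in pixel_seq:
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in actions]
        weights = softmax(scores)
        response.append([sum(w * a[c] for w, a in zip(weights, actions))
                         for c in range(dim)])
    return response

def condition_on_actions(video, actions):
    """Add each pixel's action response back onto its features (residual)."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    responses = [action_response(seq, actions)
                 for seq in to_pixel_sequences(video)]
    # Pixel (y, x) sits at index y * W + x in the flattened sequence list.
    return [[[[f + r for f, r in zip(video[t][y][x], responses[y * W + x][t])]
              for x in range(W)] for y in range(H)] for t in range(T)]

# Toy input: 2 frames, a 1x2 feature map, 2 channels per position.
video = [[[[1.0, 0.0], [0.0, 1.0]]],
         [[[1.0, 0.0], [0.0, 1.0]]]]
# One (already embedded) action vector per frame.
actions = [[4.0, 0.0], [0.0, 4.0]]
conditioned = condition_on_actions(video, actions)
```

In this toy input the two positions carry different features, so they weight the same two action embeddings differently and receive different updates. A real conditioning module would use learned query, key, and value projections inside each transformer block.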