KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning

Abstract

Pixel-based reinforcement learning agents often fail under purely visual distribution shift, even when the latent dynamics and rewards are unchanged. Existing benchmarks, however, entangle multiple sources of shift, which hinders systematic analysis. We introduce KAGE-Env, a JAX-native 2D platformer that factorizes the observation process into independently controllable visual axes while keeping the underlying control problem fixed. By construction, varying a visual axis affects performance only through the state-conditional action distribution it induces in a pixel policy, providing a clean abstraction for studying visual generalization. Building on this environment, we define KAGE-Bench, a benchmark of six known-axis suites comprising 34 train-evaluation configuration pairs, each isolating an individual visual shift. Using a standard PPO-CNN baseline, we observe strong axis-dependent failures: background and photometric shifts often collapse success, whereas agent-appearance shifts are comparatively benign. Several shifts preserve forward motion while breaking task completion, showing that return alone can obscure generalization failures. Finally, the fully vectorized JAX implementation reaches up to 33M environment steps per second on a single GPU, allowing fast and reproducible sweeps over visual factors. Code: https://avanturist322.github.io/KAGEBench/.
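The claim that a visual axis can affect performance only through the induced state-conditional action distribution can be made concrete with a toy calculation. Write O_ν(o | s) for the renderer under visual configuration ν and π(a | o) for the pixel policy. The policy that the fixed latent MDP actually experiences is then π_ν(a | s) = Σ_o O_ν(o | s) π(a | o). The sketch below is a hypothetical, hand-built example and not KAGE-Env code or its API: the state names, actions, observation labels, rewards, and probabilities are all invented for illustration. It is also restricted to a single step so that the state distribution stays fixed.

```python
# Toy, hand-built illustration (hypothetical; not KAGE-Env code or its API).
# The latent MDP (states, actions, rewards) is fixed; only the renderer O_nu(o | s) varies.

ACTIONS = ("right", "jump")

# r(s, a): shared by every visual configuration.
REWARD = {
    ("on_ground", "right"): 1.0,
    ("on_ground", "jump"): 0.0,
    ("near_gap", "right"): 0.0,
    ("near_gap", "jump"): 1.0,
}

# pi(a | o): one fixed pixel policy, indexed by hypothetical observation labels.
POLICY = {
    "plain_ground": {"right": 0.9, "jump": 0.1},
    "plain_gap": {"right": 0.1, "jump": 0.9},
    "red_agent_ground": {"right": 0.9, "jump": 0.1},  # new agent skin, same behavior
    "red_agent_gap": {"right": 0.1, "jump": 0.9},
    "noisy_ground": {"right": 0.5, "jump": 0.5},  # unfamiliar background
    "noisy_gap": {"right": 0.8, "jump": 0.2},
}

# O_nu(o | s): each eval renderer differs from "train" along a single visual axis.
RENDERERS = {
    "train": {
        "on_ground": {"plain_ground": 1.0},
        "near_gap": {"plain_gap": 1.0},
    },
    "eval_agent_skin": {
        "on_ground": {"red_agent_ground": 1.0},
        "near_gap": {"red_agent_gap": 1.0},
    },
    "eval_background": {
        "on_ground": {"plain_ground": 0.5, "noisy_ground": 0.5},
        "near_gap": {"noisy_gap": 1.0},
    },
}


def induced_policy(renderer, state):
    """pi_nu(a | s) = sum over o of O_nu(o | s) * pi(a | o)."""
    dist = {a: 0.0 for a in ACTIONS}
    for obs, p_obs in renderer[state].items():
        for a in ACTIONS:
            dist[a] += p_obs * POLICY[obs][a]
    return dist


def expected_reward(renderer, state_dist):
    """One-step performance: the renderer enters only through induced_policy."""
    total = 0.0
    for s, p_s in state_dist.items():
        for a, p_a in induced_policy(renderer, s).items():
            total += p_s * p_a * REWARD[(s, a)]
    return total


if __name__ == "__main__":
    states = {"on_ground": 0.5, "near_gap": 0.5}
    for name, renderer in RENDERERS.items():
        print(f"{name:16s} {expected_reward(renderer, states):.3f}")
```

With a uniform state distribution the three renderers score 0.900, 0.900, and 0.450. The agent-skin renderer produces different observations but the same π_ν, so its score matches training. The background renderer lowers the score only by shifting π_ν. In the full multi-step setting the visited-state distribution also changes with ν, but because the dynamics are fixed it still does so only through π_ν.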