GENIUS: Generative Fluid Intelligence Evaluation Suite
February 11, 2026
Authors: Ruichuan An, Sihan Yang, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li, Renrui Zhang, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang
cs.AI
Abstract
Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet existing benchmarks predominantly assess Crystallized Intelligence, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks Generative Fluid Intelligence (GFI): the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To rigorously assess this capability, we introduce GENIUS (GEN Fluid Intelligence EvalUation Suite). We formalize GFI as a synthesis of three primitives: Inducing Implicit Patterns (e.g., inferring personalized visual preferences), Executing Ad-hoc Constraints (e.g., visualizing abstract metaphors), and Adapting to Contextual Knowledge (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits on these tasks. Crucially, our diagnostic analysis disentangles these failure modes, demonstrating that the deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, GENIUS establishes a rigorous standard for GFI, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released at: https://github.com/arctanxarc/GENIUS.