GENIUS: Generative Fluid Intelligence Evaluation Suite
February 11, 2026
Authors: Ruichuan An, Sihan Yang, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li, Renrui Zhang, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang
cs.AI
Abstract
Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet existing benchmarks predominantly assess Crystallized Intelligence, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks Generative Fluid Intelligence (GFI): the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To rigorously assess this capability, we introduce GENIUS (GENerative Fluid Intelligence EvalUation Suite). We formalize GFI as a synthesis of three primitives: Inducing Implicit Patterns (e.g., inferring personalized visual preferences), Executing Ad-hoc Constraints (e.g., visualizing abstract metaphors), and Adapting to Contextual Knowledge (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits on these tasks. Crucially, our diagnostic analysis disentangles these failure modes and demonstrates that the deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, GENIUS establishes a rigorous standard for GFI, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released at: https://github.com/arctanxarc/GENIUS.