GENIUS: Generative Fluid Intelligence Evaluation Suite

Abstract

Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet existing benchmarks predominantly assess Crystallized Intelligence, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks Generative Fluid Intelligence (GFI): the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To assess this capability rigorously, we introduce GENIUS (GEN Fluid Intelligence EvalUation Suite). We formalize GFI as a synthesis of three primitives: Inducing Implicit Patterns (e.g., inferring personalized visual preferences), Executing Ad-hoc Constraints (e.g., visualizing abstract metaphors), and Adapting to Contextual Knowledge (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits on these tasks. Crucially, our diagnostic analysis disentangles these failure modes and shows that the deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, GENIUS establishes a rigorous standard for GFI, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released at: https://github.com/arctanxarc/GENIUS.
