GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

November 14, 2025
Authors: Jingxuan Wei, Caijun Jia, Xi Bai, Xinglong Xu, Siyuan Li, Linzhuang Sun, Bihui Yu, Conghui He, Lijun Wu, Cheng Tan
cs.AI

Abstract

The advent of Unified Multimodal Models (UMMs) signals a paradigm shift in artificial intelligence, moving from passive perception to active, cross-modal generation. Despite their unprecedented ability to synthesize information, a critical gap persists in evaluation: existing benchmarks primarily assess discriminative understanding or unconstrained image generation separately, failing to measure the integrated cognitive process of generative reasoning. To bridge this gap, we propose that geometric construction provides an ideal testbed as it inherently demands a fusion of language comprehension and precise visual generation. We introduce GGBench, a benchmark designed specifically to evaluate geometric generative reasoning. It provides a comprehensive framework for systematically diagnosing a model's ability to not only understand and reason but to actively construct a solution, thereby setting a more rigorous standard for the next generation of intelligent systems. Project website: https://opendatalab-raiser.github.io/GGBench/.
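The abstract does not describe GGBench's scoring protocol. As a purely hypothetical illustration of the kind of automated check a geometric generative-reasoning benchmark could run, the minimal sketch below verifies whether model-proposed points satisfy a perpendicular-bisector construction; the function names, tolerance, and scoring rule are assumptions for illustration only, not the paper's method.

```python
# Hypothetical sketch (not from the GGBench paper): scoring a geometric
# construction task by checking a defining geometric property.
import math

Point = tuple[float, float]

def dist(p: Point, q: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def on_perpendicular_bisector(a: Point, b: Point, p: Point, tol: float = 1e-2) -> bool:
    """A point lies on the perpendicular bisector of segment AB iff it is
    equidistant from A and B (checked up to an assumed tolerance)."""
    return abs(dist(p, a) - dist(p, b)) <= tol

def score_construction(a: Point, b: Point, proposed: list[Point]) -> float:
    """Fraction of model-proposed points that satisfy the constraint;
    a stand-in for the kind of geometric check a benchmark might apply."""
    if not proposed:
        return 0.0
    hits = sum(on_perpendicular_bisector(a, b, p) for p in proposed)
    return hits / len(proposed)

if __name__ == "__main__":
    A, B = (0.0, 0.0), (4.0, 0.0)
    # Two points on the true bisector x = 2, one point off it.
    candidates = [(2.0, 1.0), (2.0, -3.0), (1.0, 1.0)]
    print(score_construction(A, B, candidates))  # 0.666...
```

In practice, a benchmark of this kind would pair such property checks with parsing of the model's generated construction (text or image) into point coordinates; that parsing step is omitted here.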