
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

November 14, 2025
作者: Jingxuan Wei, Caijun Jia, Xi Bai, Xinglong Xu, Siyuan Li, Linzhuang Sun, Bihui Yu, Conghui He, Lijun Wu, Cheng Tan
cs.AI

Abstract

The advent of Unified Multimodal Models (UMMs) signals a paradigm shift in artificial intelligence, moving from passive perception to active, cross-modal generation. Despite their unprecedented ability to synthesize information, a critical gap persists in evaluation: existing benchmarks primarily assess discriminative understanding or unconstrained image generation separately, failing to measure the integrated cognitive process of generative reasoning. To bridge this gap, we propose geometric construction as an ideal testbed, since it inherently demands a fusion of language comprehension and precise visual generation. We introduce GGBench, a benchmark designed specifically to evaluate geometric generative reasoning. It provides a comprehensive framework for systematically diagnosing a model's ability not only to understand and reason but also to actively construct a solution, thereby setting a more rigorous standard for the next generation of intelligent systems. Project website: https://opendatalab-raiser.github.io/GGBench/.