Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets

Abstract

Text-to-image generation models have evolved rapidly and reshaped visual content creation. Commercial products such as Nano Banana Pro have attracted significant attention, yet their potential as generalist solvers for traditional low-level vision tasks remains largely unexplored. In this study, we ask: is Nano Banana Pro a low-level vision all-rounder? We conduct a comprehensive zero-shot evaluation across 14 distinct low-level tasks spanning 40 diverse datasets, prompting Nano Banana Pro with simple text prompts and no fine-tuning, and benchmark it against state-of-the-art specialist models. Our analysis reveals a distinct performance dichotomy: Nano Banana Pro delivers superior subjective visual quality, often hallucinating plausible high-frequency details beyond what specialist models produce, but it lags behind on traditional reference-based quantitative metrics. We attribute this discrepancy to the inherent stochasticity of generative models, which makes it difficult to maintain the strict pixel-level consistency that conventional metrics require. This report identifies Nano Banana Pro as a capable zero-shot contender for low-level vision tasks, while highlighting that matching the high fidelity of domain specialists remains a significant hurdle.
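
The gap between perceptual quality and reference-based scores can be illustrated with a toy example. The sketch below is not the evaluation protocol of this report; it assumes PSNR as a representative reference-based metric, and the helper names (`psnr`, `checkerboard`) and the 8x8 synthetic images are hypothetical choices made purely for illustration. It compares an oversmoothed output against an output that reproduces the reference texture at full contrast but misaligned by one pixel.

```python
import math

def psnr(reference, output, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale images."""
    squared_errors = [
        (r - o) ** 2
        for ref_row, out_row in zip(reference, output)
        for r, o in zip(ref_row, out_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def checkerboard(size, phase=0, mean=128, amplitude=64):
    """Toy high-frequency texture alternating between mean + amplitude and mean - amplitude."""
    return [
        [mean + amplitude if (row + col + phase) % 2 == 0 else mean - amplitude
         for col in range(size)]
        for row in range(size)
    ]

SIZE = 8  # hypothetical image size, chosen only to keep the example small
reference = checkerboard(SIZE)

# Output A: all texture removed, as an oversmoothing restorer might produce.
oversmoothed = [[128] * SIZE for _ in range(SIZE)]

# Output B: the same texture at full contrast, but shifted by one pixel,
# standing in for plausible detail that is not pixel-aligned with the reference.
shifted_texture = checkerboard(SIZE, phase=1)

psnr_smooth = psnr(reference, oversmoothed)
psnr_shifted = psnr(reference, shifted_texture)
print(f"oversmoothed:    {psnr_smooth:.2f} dB")
print(f"shifted texture: {psnr_shifted:.2f} dB")
```

In this toy setting the textured but misaligned output scores about 6 dB lower than the flat one, because its per-pixel error is twice as large and the squared error therefore four times as large. A pixel-wise metric rewards alignment with the reference rather than the presence of realistic detail, which is consistent with the dichotomy described above.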