Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
October 22, 2025
Authors: Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jialing Tong, Yinfei Yang, Jiasen Lu, Wenze Hu, Zhe Gan
cs.AI
Abstract
Recent advances in multimodal models have demonstrated remarkable text-guided
image editing capabilities, with systems like GPT-4o and Nano-Banana setting
new benchmarks. However, the research community's progress remains constrained
by the absence of large-scale, high-quality, and openly accessible datasets
built from real images. We introduce Pico-Banana-400K, a comprehensive
400K-image dataset for instruction-based image editing. Our dataset is
constructed by leveraging Nano-Banana to generate diverse edit pairs from real
photographs in the OpenImages collection. What distinguishes Pico-Banana-400K
from previous synthetic datasets is our systematic approach to quality and
diversity. We employ a fine-grained image editing taxonomy to ensure
comprehensive coverage of edit types while maintaining precise content
preservation and instruction faithfulness through MLLM-based quality scoring
and careful curation. Beyond single-turn editing, Pico-Banana-400K enables
research into complex editing scenarios. The dataset includes three specialized
subsets: (1) a 72K-example multi-turn collection for studying sequential
editing, reasoning, and planning across consecutive modifications; (2) a
56K-example preference subset for alignment research and reward model training;
and (3) paired long-short editing instructions for developing instruction
rewriting and summarization capabilities. By providing this large-scale,
high-quality, and task-rich resource, Pico-Banana-400K establishes a robust
foundation for training and benchmarking the next generation of text-guided
image editing models.