Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
October 22, 2025
Authors: Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jialing Tong, Yinfei Yang, Jiasen Lu, Wenze Hu, Zhe Gan
cs.AI
Abstract
Recent advances in multimodal models have demonstrated remarkable text-guided
image editing capabilities, with systems like GPT-4o and Nano-Banana setting
new benchmarks. However, the research community's progress remains constrained
by the absence of large-scale, high-quality, and openly accessible datasets
built from real images. We introduce Pico-Banana-400K, a comprehensive
400K-image dataset for instruction-based image editing. Our dataset is
constructed by leveraging Nano-Banana to generate diverse edit pairs from real
photographs in the OpenImages collection. What distinguishes Pico-Banana-400K
from previous synthetic datasets is our systematic approach to quality and
diversity. We employ a fine-grained image editing taxonomy to ensure
comprehensive coverage of edit types while maintaining precise content
preservation and instruction faithfulness through MLLM-based quality scoring
and careful curation. Beyond single-turn editing, Pico-Banana-400K enables
research into complex editing scenarios. The dataset includes three specialized
subsets: (1) a 72K-example multi-turn collection for studying sequential
editing, reasoning, and planning across consecutive modifications; (2) a
56K-example preference subset for alignment research and reward model training;
and (3) paired long-short editing instructions for developing instruction
rewriting and summarization capabilities. By providing this large-scale,
high-quality, and task-rich resource, Pico-Banana-400K establishes a robust
foundation for training and benchmarking the next generation of text-guided
image editing models.
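
To make the curation step concrete, here is a minimal sketch of score-based filtering of generated edits. It is an illustration under stated assumptions, not the authors' pipeline: the `EditCandidate` record, the `curate` function, the `score_fn` callback standing in for an MLLM judge, and the 0.7 threshold are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class EditCandidate:
    """One generated edit (hypothetical record layout)."""
    source_image: str  # path or ID of the real photograph
    instruction: str   # text editing instruction
    edited_image: str  # path or ID of the generated edit


def curate(
    candidates: Iterable[EditCandidate],
    score_fn: Callable[[EditCandidate], float],
    threshold: float = 0.7,  # assumed cutoff, not taken from the paper
) -> Tuple[List[EditCandidate], List[EditCandidate]]:
    """Split candidates into (kept, rejected) by a judge-assigned score.

    score_fn stands in for an MLLM-based quality scorer and is assumed
    to return higher values for edits that follow the instruction and
    preserve unrelated content.
    """
    kept: List[EditCandidate] = []
    rejected: List[EditCandidate] = []
    for candidate in candidates:
        if score_fn(candidate) >= threshold:
            kept.append(candidate)
        else:
            rejected.append(candidate)
    return kept, rejected
```

The function returns rejected candidates instead of discarding them so that a caller can inspect them; the abstract does not say whether or how low-scoring edits are reused.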