

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

October 22, 2025
Authors: Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jialing Tong, Yinfei Yang, Jiasen Lu, Wenze Hu, Zhe Gan
cs.AI

Abstract

Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with systems like GPT-4o and Nano-Banana setting new benchmarks. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images. We introduce Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing. Our dataset is constructed by leveraging Nano-Banana to generate diverse edit pairs from real photographs in the OpenImages collection. What distinguishes Pico-Banana-400K from previous synthetic datasets is our systematic approach to quality and diversity. We employ a fine-grained image editing taxonomy to ensure comprehensive coverage of edit types while maintaining precise content preservation and instruction faithfulness through MLLM-based quality scoring and careful curation. Beyond single-turn editing, Pico-Banana-400K enables research into complex editing scenarios. The dataset includes three specialized subsets: (1) a 72K-example multi-turn collection for studying sequential editing, reasoning, and planning across consecutive modifications; (2) a 56K-example preference subset for alignment research and reward model training; and (3) paired long-short editing instructions for developing instruction rewriting and summarization capabilities. By providing this large-scale, high-quality, and task-rich resource, Pico-Banana-400K establishes a robust foundation for training and benchmarking the next generation of text-guided image editing models.
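To make the data organization described in the abstract more concrete, here is a minimal Python sketch of how records for single-turn edits, the multi-turn collection, the preference subset, and the long-short instruction pairs might be represented, together with a score-threshold filter standing in for the MLLM-based curation step. Every class name, field name, example value, and the 0.7 threshold is a hypothetical assumption for illustration only. The abstract does not specify the released file format, the scoring scale, or the cutoff used.

```python
from dataclasses import dataclass, field
from typing import List

# NOTE: all class names, field names, example values, and the 0.7 threshold
# are hypothetical illustrations, not the released dataset's actual schema.

@dataclass
class EditExample:
    source_image: str     # ID/path of the real OpenImages photo (hypothetical field)
    edited_image: str     # ID/path of the generated edit (hypothetical field)
    instruction: str      # text editing instruction
    edit_type: str        # category from a fine-grained edit taxonomy (assumed label)
    quality_score: float  # MLLM-assigned score, assumed normalized to [0, 1]

@dataclass
class MultiTurnExample:
    source_image: str
    turns: List[EditExample] = field(default_factory=list)  # consecutive edits, in order

@dataclass
class PreferenceExample:
    source_image: str
    instruction: str
    preferred_image: str  # edit judged better
    rejected_image: str   # edit judged worse

@dataclass
class InstructionPair:
    long_instruction: str   # detailed form of the instruction
    short_instruction: str  # concise summary of the same edit

def curate(examples: List[EditExample], threshold: float = 0.7) -> List[EditExample]:
    """Keep only edits whose (assumed) MLLM quality score meets the threshold."""
    return [ex for ex in examples if ex.quality_score >= threshold]

# Usage with made-up records.
pool = [
    EditExample("img_001.jpg", "img_001_edit.jpg", "Make the sky look like a sunset",
                "lighting", 0.91),
    EditExample("img_002.jpg", "img_002_edit.jpg", "Remove the red car",
                "object_removal", 0.42),
]
kept = curate(pool)
print([ex.source_image for ex in kept])  # prints ['img_001.jpg']
```

A simple threshold keeps the sketch short; a real curation pipeline could combine several scored criteria (for example instruction faithfulness and content preservation) rather than a single number.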
October 23, 2025