

Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection

December 18, 2025
Authors: Kaixin Ding, Yang Zhou, Xi Chen, Miao Yang, Jiarong Ou, Rui Chen, Xin Tao, Hengshuang Zhao
cs.AI

Abstract

Recent advances in Text-to-Image (T2I) generative models, such as Imagen, Stable Diffusion, and FLUX, have led to remarkable improvements in visual quality. However, their performance is fundamentally limited by the quality of the training data. Web-crawled and synthetic image datasets often contain low-quality or redundant samples, which lead to degraded visual fidelity, unstable training, and inefficient computation. Effective data selection is therefore crucial for improving data efficiency. Existing approaches to Text-to-Image data filtering rely on costly manual curation or heuristic scoring based on single-dimensional features. Although meta-learning-based methods have been explored for LLMs, they have not been adapted to the image modality. To this end, we propose **Alchemist**, a meta-gradient-based framework that selects a suitable subset from large-scale text-image pairs. Our approach automatically learns to assess the influence of each sample by iteratively optimizing the model from a data-centric perspective. Alchemist consists of two key stages: data rating and data pruning. We train a lightweight rater to estimate each sample's influence from gradient information, enhanced with multi-granularity perception, and then apply the Shift-G sampling strategy to select informative subsets for efficient model training. Alchemist is the first automatic, scalable, meta-gradient-based data selection framework for Text-to-Image model training. Experiments on both synthetic and web-crawled datasets demonstrate that Alchemist consistently improves visual quality and downstream performance: training on an Alchemist-selected 50% of the data can outperform training on the full dataset.
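
To make the two-stage recipe (gradient-based influence rating, then data pruning) concrete, here is a minimal sketch. It assumes one common meta-gradient formulation in which a sample's influence is the alignment between its training gradient and the gradient of a small held-out batch; the `loss_fn(model, batch)` interface, the function names, and the top-k pruning used in place of Alchemist's Shift-G sampling are illustrative assumptions, not the paper's implementation, and the lightweight multi-granularity rater is omitted.

```python
# Minimal sketch (PyTorch) of gradient-alignment influence scoring followed by
# subset selection. Assumptions, not the paper's code: the influence definition
# (dot product with a held-out "meta" gradient), the loss_fn(model, batch)
# interface, and top-k pruning in place of Alchemist's Shift-G sampling.
import torch


def flat_grad(loss, params):
    """Flatten d(loss)/d(params) into a single 1-D vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def influence_scores(model, loss_fn, train_samples, holdout_batch):
    """Score each training sample by how well its gradient aligns with the
    gradient of a small held-out batch: score_i = <g_i, g_meta>."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_meta = flat_grad(loss_fn(model, holdout_batch), params)
    scores = []
    for sample in train_samples:
        g_i = flat_grad(loss_fn(model, sample), params)
        scores.append(torch.dot(g_i, g_meta).item())
    return scores


def select_subset(samples, scores, keep_ratio=0.5):
    """Keep the highest-scoring fraction of the data (a simple stand-in for
    the Shift-G sampling strategy described in the abstract)."""
    k = int(len(samples) * keep_ratio)
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return [samples[i] for i in order[:k]]
```

In practice, recomputing per-sample gradients at web scale is impractical, which is why the abstract describes training a lightweight rater to estimate such influence scores rather than computing them directly for every pair.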