Alchemist: Turning Public Text-to-Image Data into Generative Gold

Abstract

Pre-training equips text-to-image (T2I) models with broad world knowledge, but this alone is often insufficient to achieve high aesthetic quality and alignment. Consequently, supervised fine-tuning (SFT) is crucial for further refinement. However, its effectiveness depends heavily on the quality of the fine-tuning dataset. Existing public SFT datasets frequently target narrow domains (e.g., anime or specific art styles), and creating high-quality, general-purpose SFT datasets remains a significant challenge. Current curation methods are often costly and struggle to identify truly impactful samples. The problem is compounded by the scarcity of public general-purpose datasets: leading models often rely on large, proprietary, and poorly documented internal data, which hinders broader research progress. This paper introduces a novel methodology for creating general-purpose SFT datasets by leveraging a pre-trained generative model as an estimator of high-impact training samples. We apply this methodology to construct and release Alchemist, a compact (3,350 samples) yet highly effective SFT dataset. Experiments demonstrate that Alchemist substantially improves the generative quality of five public T2I models while preserving diversity and style. Additionally, we release the weights of the fine-tuned models to the public.
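
The selection idea in the abstract, using a pre-trained generative model to estimate which samples are high-impact, can be pictured as a score-and-rank filter. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: the abstract does not say how the score is computed, so `score_fn` is a hypothetical placeholder, as are the function name `select_top_k` and the `(image_path, caption)` record layout. Only the target size of 3,350 comes from the text.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

# Hypothetical record layout: (image_path, caption). Not taken from the paper.
Sample = Tuple[str, str]


def select_top_k(
    samples: Iterable[Sample],
    score_fn: Callable[[Sample], float],
    k: int = 3350,  # dataset size reported in the abstract
) -> List[Sample]:
    """Keep the k samples with the highest scores, best first.

    score_fn stands in for whatever signal the pre-trained generative
    model provides; how that signal is computed is an assumption left open.
    Ties are broken in favor of the sample that appeared earlier.
    """
    scored = ((score_fn(s), i, s) for i, s in enumerate(samples))
    top = heapq.nlargest(k, scored, key=lambda t: (t[0], -t[1]))
    return [s for _, _, s in top]


if __name__ == "__main__":
    # Toy scores used purely for illustration; they are not real data.
    toy = [("a.png", "a cat"), ("b.png", "a dog"), ("c.png", "a tree")]
    toy_scores = {"a.png": 0.2, "b.png": 0.8, "c.png": 0.5}
    print(select_top_k(toy, lambda s: toy_scores[s[0]], k=2))
```

Using `heapq.nlargest` avoids sorting the whole candidate pool, which matters when a few thousand samples are being drawn from a much larger collection.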
