OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Abstract

Text-to-image (T2I) models have garnered significant attention for generating high-quality images aligned with text prompts. However, rapid advances in T2I models have exposed the limitations of early benchmarks, which lack comprehensive evaluation of capabilities such as reasoning, text rendering, and style. Notably, recent state-of-the-art models, with their rich knowledge modeling capabilities, show promising results on image generation problems that require strong reasoning ability, yet existing evaluation systems have not adequately addressed this frontier. To systematically address these gaps, we introduce OneIG-Bench, a meticulously designed, comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including prompt-image alignment, text rendering precision, reasoning-generated content, stylization, and diversity. By structuring the evaluation along these dimensions, the benchmark enables in-depth analysis of model performance, helping researchers and practitioners pinpoint strengths and bottlenecks across the full image generation pipeline. OneIG-Bench also supports flexible evaluation: instead of generating images for the entire prompt set, users can generate images only for the prompts associated with a selected dimension and run the corresponding evaluation. Our codebase and dataset are publicly available to facilitate reproducible evaluation studies and cross-model comparisons within the T2I research community.
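The subset-based evaluation described above can be illustrated with a minimal sketch. It assumes each prompt carries a tag naming its evaluation dimension. The `PromptRecord` type, its field names, the dimension labels, and the toy prompts are all hypothetical and chosen for illustration. They are not the actual OneIG-Bench schema or data.

```python
from dataclasses import dataclass


# Hypothetical prompt record; the real OneIG-Bench dataset schema may differ.
@dataclass(frozen=True)
class PromptRecord:
    prompt_id: str
    dimension: str  # assumed labels, e.g. "alignment", "text_rendering", "reasoning"
    text: str


def select_prompts(records, dimension):
    """Return only the prompts tagged with the requested evaluation dimension."""
    return [r for r in records if r.dimension == dimension]


# Toy data for illustration only (not taken from the benchmark).
records = [
    PromptRecord("p1", "alignment", "a red cube on top of a blue sphere"),
    PromptRecord("p2", "text_rendering", "a shop sign that reads 'OPEN'"),
    PromptRecord("p3", "reasoning", "a diagram of the water cycle"),
    PromptRecord("p4", "text_rendering", "a poster with the title 'Hello'"),
]

# Generate and evaluate images only for this subset, not the full prompt set.
subset = select_prompts(records, "text_rendering")
subset_ids = [r.prompt_id for r in subset]
print(subset_ids)
```

Under this assumption, a model under test would be run only on `subset`, and only the metric for that dimension would be computed.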