HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

Abstract

Text-to-image (T2I) diffusion/flow models have drawn considerable attention recently for their ability to deliver flexible visual creations. Still, high-resolution image synthesis remains a formidable challenge because high-resolution content is both scarce and complex. To this end, we present HiFlow, a training-free and model-agnostic framework that unlocks the resolution potential of pre-trained flow models. Specifically, HiFlow establishes a virtual reference flow within the high-resolution space that captures the characteristics of low-resolution flow information, and uses it to guide high-resolution generation in three key aspects: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. By leveraging this flow-aligned guidance, HiFlow substantially improves the quality of high-resolution image synthesis in T2I models and demonstrates versatility across their personalized variants. Extensive experiments show that HiFlow achieves higher high-resolution image quality than current state-of-the-art methods.
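To make "initialization alignment for low-frequency consistency" more concrete, the following is a minimal sketch of one way such an alignment could be realized: the low-frequency band of a high-resolution array is replaced with that of a reference array via a Fourier-domain mask. This is an illustration under stated assumptions, not the implementation used by HiFlow. The function names `low_pass_mask` and `align_low_frequencies`, the hard circular mask, and the `cutoff` value are all hypothetical choices made here.

```python
import numpy as np


def low_pass_mask(height, width, cutoff):
    """Boolean (height, width) mask that is True where the frequency radius,
    in cycles per pixel, is at most `cutoff`. Hypothetical helper."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fy ** 2 + fx ** 2) <= cutoff


def align_low_frequencies(high_res, reference, cutoff=0.1):
    """Return `high_res` with its low-frequency band replaced by the band of
    `reference`. Both arrays must share a shape of (..., height, width).

    Assumption: a hard circular mask is used; this is an illustrative choice,
    not something stated in the abstract.
    """
    if high_res.shape != reference.shape:
        raise ValueError("high_res and reference must have the same shape")
    mask = low_pass_mask(high_res.shape[-2], high_res.shape[-1], cutoff)
    high_spec = np.fft.fft2(high_res)   # transforms the last two axes
    ref_spec = np.fft.fft2(reference)
    # Take low frequencies from the reference, everything else from high_res.
    fused = np.where(mask, ref_spec, high_spec)
    # The mask is symmetric under frequency negation, so for real inputs the
    # fused spectrum stays Hermitian and the imaginary part is rounding noise.
    return np.fft.ifft2(fused).real


# Usage: nearest-neighbour upsampling stands in for a low-resolution reference.
rng = np.random.default_rng(0)
low_res = rng.standard_normal((4, 8, 8))
reference = low_res.repeat(2, axis=-2).repeat(2, axis=-1)   # (4, 16, 16)
high_res = rng.standard_normal((4, 16, 16))
aligned = align_low_frequencies(high_res, reference, cutoff=0.1)
```

In this sketch the coarse layout of `aligned` follows the upsampled reference, while content above the cutoff comes from `high_res`. A smoother mask (for example a Gaussian roll-off) would be an equally plausible choice.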