Self-Adversarial One Step Generation via Condition Shifting

Abstract

The push for efficient text-to-image synthesis has moved the field toward one-step sampling, yet existing methods still face a three-way trade-off among fidelity, inference speed, and training efficiency. Approaches that rely on external discriminators can sharpen one-step performance, but they often introduce training instability, high GPU memory overhead, and slow convergence, which complicates scaling and parameter-efficient tuning. In contrast, regression-based distillation and consistency objectives are easier to optimize, but they typically lose fine details when constrained to a single step. We present APEX, built on a key theoretical insight: adversarial correction signals can be extracted endogenously from a flow model through condition shifting. A transformation creates a shifted condition branch whose velocity field serves as an independent estimator of the model's current generation distribution. The resulting gradient is provably GAN-aligned and replaces the sample-dependent discriminator terms that cause gradient vanishing. This discriminator-free design is architecture-preserving, making APEX a plug-and-play framework compatible with both full-parameter and LoRA-based tuning. Empirically, our 0.6B model surpasses FLUX-Schnell 12B (20× more parameters) in one-step quality. With LoRA tuning on Qwen-Image 20B, APEX reaches a GenEval score of 0.89 at NFE=1 in 6 hours, surpassing the original 50-step teacher (0.87) and providing a 15.33× inference speedup. Code is available at https://github.com/LINs-lab/APEX.
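To make the NFE=1 versus 50-step comparison concrete, the following is a minimal toy sketch of Euler sampling in a flow model, counting the number of function evaluations (NFE). It is not the APEX method or code. The velocity field `toy_velocity`, the point-mass target, and the helper `euler_sample` are hypothetical names introduced only for illustration, and the linear-interpolation convention (t=1 is noise, t=0 is data) is an assumption.

```python
import numpy as np


def toy_velocity(x_t, t, target):
    """Hypothetical stand-in for a trained flow model.

    Assumes a linear-interpolation flow x_t = (1 - t) * data + t * noise whose
    data distribution is a point mass at `target`; the exact velocity is then
    (x_t - target) / t.
    """
    return (x_t - target) / t


def euler_sample(noise, target, num_steps):
    """Integrate from t=1 (noise) to t=0 (data) with `num_steps` Euler steps.

    Returns the final sample and the number of velocity evaluations (NFE).
    """
    x = noise.copy()
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    nfe = 0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = toy_velocity(x, t_cur, target)  # one model call per step
        nfe += 1
        x = x - (t_cur - t_next) * v  # Euler step toward t=0
    return x, nfe


rng = np.random.default_rng(0)
noise = rng.standard_normal(4)
target = np.array([1.0, -2.0, 0.5, 3.0])

x_one, nfe_one = euler_sample(noise, target, num_steps=1)    # one-step sampling
x_many, nfe_many = euler_sample(noise, target, num_steps=50)  # 50-step sampling
print(nfe_one, nfe_many)
```

In this toy setting both samplers land on the target because the velocity field is exact and the trajectories are straight. A learned velocity field generally is not exact, which is why one-step quality is the difficult case the abstract addresses.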
