OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows

Abstract

We present OneFlow, the first non-autoregressive multimodal model that enables variable-length and concurrent mixed-modal generation. Unlike autoregressive models, which enforce a rigid causal ordering between text and image generation, OneFlow combines an insertion-based Edit Flow for discrete text tokens with Flow Matching for image latents. OneFlow enables concurrent text-image synthesis with hierarchical sampling that prioritizes content over grammar. Through controlled experiments across model sizes from 1B to 8B, we demonstrate that OneFlow outperforms autoregressive baselines on both generation and understanding tasks while using up to 50% fewer training FLOPs. OneFlow surpasses both autoregressive and diffusion-based approaches while unlocking new capabilities for concurrent generation, iterative refinement, and natural reasoning-like generation.
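
The combination of insertion-based text generation and Flow Matching for image latents can be pictured with a toy sketch. This is not the OneFlow model: the function name `concurrent_generate`, the oracle that already knows the target tokens and target latent, the straight-line path from `x0` to `x1`, and the even insertion schedule are all assumptions made for illustration. Learned networks would replace the oracle in a real system, and the hierarchical sampling described above is not modeled here.

```python
import random


def concurrent_generate(target_tokens, x0, x1, steps=8, seed=0):
    """Toy concurrent sampler: text grows by insertion while a latent follows a flow.

    Hypothetical illustration only. An oracle stands in for the learned models.
    """
    rng = random.Random(seed)
    pending = list(range(len(target_tokens)))  # token positions not yet inserted
    rng.shuffle(pending)                       # insertion order is not left-to-right
    present = []                               # positions inserted so far
    latent = list(x0)
    dt = 1.0 / steps
    history = []

    for step in range(steps):
        # Image side: one Euler step along the straight path from x0 to x1.
        # On that path the velocity is constant and equal to x1 - x0.
        velocity = [b - a for a, b in zip(x0, x1)]
        latent = [x + dt * v for x, v in zip(latent, velocity)]

        # Text side: insert an even share of the remaining tokens (ceil division),
        # so the sequence length varies from step to step.
        remaining_steps = steps - step
        n_insert = -(-len(pending) // remaining_steps)
        for _ in range(n_insert):
            present.append(pending.pop())
        present.sort()  # inserted tokens keep their relative order
        history.append([target_tokens[i] for i in present])

    return history[-1], latent, history


if __name__ == "__main__":
    tokens = ["a", "cat", "sits", "on", "a", "mat"]
    final, latent, history = concurrent_generate(tokens, [0.0, 0.0], [1.0, -2.0])
    for partial in history:
        print(" ".join(partial))
    print(latent)
```

Each loop iteration advances both modalities, so neither has to finish before the other starts. The intermediate text states are partial sequences with gaps that later insertions fill in, which is the sense in which generation is variable-length and not bound to a left-to-right order.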
