OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows

Abstract

We present OneFlow, the first non-autoregressive multimodal model that enables variable-length and concurrent mixed-modal generation. Unlike autoregressive models that enforce rigid causal ordering between text and image generation, OneFlow combines an insertion-based Edit Flow for discrete text tokens with Flow Matching for image latents. OneFlow enables concurrent text-image synthesis with hierarchical sampling that prioritizes content over grammar. Through controlled experiments across model sizes from 1B to 8B, we demonstrate that OneFlow outperforms autoregressive baselines on both generation and understanding tasks while using up to 50% fewer training FLOPs. OneFlow surpasses both autoregressive and diffusion-based approaches while unlocking new capabilities for concurrent generation, iterative refinement, and natural reasoning-like generation.

