Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity

Abstract

Text-to-image (T2I) models excel on single-entity prompts but struggle with multi-subject descriptions, often showing attribute leakage, identity entanglement, and subject omissions. We introduce the first theoretical framework with a principled, optimizable objective for steering sampling dynamics toward multi-subject fidelity. Viewing flow matching (FM) through the lens of stochastic optimal control (SOC), we formulate subject disentanglement as control over a trained FM sampler. This yields two architecture-agnostic algorithms: (i) a training-free test-time controller that perturbs the base velocity with a single-pass update, and (ii) Adjoint Matching, a lightweight fine-tuning rule that regresses a control network to a backward adjoint signal while preserving base-model capabilities. The same formulation unifies prior attention heuristics, extends to diffusion models via a flow-diffusion correspondence, and provides the first fine-tuning route explicitly designed for multi-subject fidelity. Empirically, on Stable Diffusion 3.5, FLUX, and Stable Diffusion XL, both algorithms consistently improve multi-subject alignment while maintaining base-model style. Test-time control runs efficiently on commodity GPUs, and fine-tuned controllers trained on limited prompts generalize to unseen ones. We further highlight FOCUS (Flow Optimal Control for Unentangled Subjects), which achieves state-of-the-art multi-subject fidelity across models.
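To make the idea of test-time control concrete, the following is a minimal toy sketch of perturbing a base velocity with a control term during Euler sampling. It is not the paper's method: the constant `base_velocity`, the quadratic `cost`, the gradient-based control `u = -strength * cost_grad(x)`, and every name here are assumptions chosen for illustration. The paper's controller targets a multi-subject objective on a trained FM model, which the abstract does not specify in detail.

```python
import numpy as np

# Hypothetical endpoint that the toy cost prefers; stands in for a
# multi-subject fidelity objective, which is not specified here.
TARGET = np.array([1.0, 1.0])


def base_velocity(x, t):
    """Toy stand-in for a trained FM velocity field (assumption):
    a constant drift that carries (0, 0) to (1, 0) over t in [0, 1]."""
    return np.array([1.0, 0.0])


def cost(x):
    """Toy quadratic cost: squared distance to TARGET (assumption)."""
    return 0.5 * float(np.sum((x - TARGET) ** 2))


def cost_grad(x):
    """Gradient of the toy cost with respect to x."""
    return x - TARGET


def sample(x0, steps=50, strength=0.0):
    """Euler-integrate dx/dt = v(x, t) + u(x), with u = -strength * grad cost.

    strength = 0 recovers the uncontrolled base sampler.
    """
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        u = -strength * cost_grad(x)  # control perturbation of the velocity
        x = x + dt * (base_velocity(x, t) + u)
    return x


x_base = sample([0.0, 0.0], strength=0.0)
x_ctrl = sample([0.0, 0.0], strength=2.0)
print("uncontrolled endpoint:", x_base, "cost:", cost(x_base))
print("controlled endpoint:  ", x_ctrl, "cost:", cost(x_ctrl))
```

In this toy setting the controlled trajectory ends at a lower cost than the uncontrolled one, while `strength` trades off how far sampling departs from the base dynamics. In a stochastic optimal control formulation that trade-off is typically expressed as a penalty on the control magnitude.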
