Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity

October 2, 2025
Authors: Eric Tillmann Bill, Enis Simsar, Thomas Hofmann
cs.AI

Abstract

Text-to-image (T2I) models excel on single-entity prompts but struggle with multi-subject descriptions, often showing attribute leakage, identity entanglement, and subject omissions. We introduce the first theoretical framework with a principled, optimizable objective for steering sampling dynamics toward multi-subject fidelity. Viewing flow matching (FM) through stochastic optimal control (SOC), we formulate subject disentanglement as control over a trained FM sampler. This yields two architecture-agnostic algorithms: (i) a training-free test-time controller that perturbs the base velocity with a single-pass update, and (ii) Adjoint Matching, a lightweight fine-tuning rule that regresses a control network to a backward adjoint signal while preserving base-model capabilities. The same formulation unifies prior attention heuristics, extends to diffusion models via a flow-diffusion correspondence, and provides the first fine-tuning route explicitly designed for multi-subject fidelity. Empirically, on Stable Diffusion 3.5, FLUX, and Stable Diffusion XL, both algorithms consistently improve multi-subject alignment while maintaining base-model style. Test-time control runs efficiently on commodity GPUs, and fine-tuned controllers trained on limited prompts generalize to unseen ones. We further highlight FOCUS (Flow Optimal Control for Unentangled Subjects), which achieves state-of-the-art multi-subject fidelity across models.
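
The test-time controller described in the abstract can be pictured with a toy example. The sketch below is not the paper's FOCUS objective or implementation; it only illustrates the general idea of adding a control term to a base velocity field during Euler sampling, with one cost-gradient evaluation per step. The names `base_velocity`, `toy_cost`, `cost_grad`, `sample`, `MU`, `GOAL`, and the strength `lam` are all hypothetical, and the quadratic cost stands in for whatever multi-subject fidelity objective the actual method optimizes.

```python
import numpy as np

# Hypothetical 2-D toy setup; none of these values come from the paper.
MU = np.array([1.0, 0.0])    # point the stand-in base flow drifts toward
GOAL = np.array([0.0, 1.0])  # point the toy cost prefers


def base_velocity(x, t):
    # Stand-in for a trained flow-matching velocity field v(x, t).
    return MU - x


def toy_cost(x):
    # Stand-in for a fidelity objective: 0.5 * ||x - GOAL||^2.
    return 0.5 * float(np.sum((x - GOAL) ** 2))


def cost_grad(x):
    # Analytic gradient of toy_cost with respect to x.
    return x - GOAL


def sample(x0, lam, n_steps=50):
    # Euler integration of dx/dt = v(x, t) + u(x, t) on t in [0, 1],
    # where u = -lam * grad(cost) is the control perturbation.
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        u = -lam * cost_grad(x)  # one gradient evaluation per step
        x = x + dt * (base_velocity(x, t) + u)
    return x


x_base = sample([0.0, 0.0], lam=0.0)  # uncontrolled sampler
x_ctrl = sample([0.0, 0.0], lam=1.0)  # controlled sampler
print(toy_cost(x_base), toy_cost(x_ctrl))
```

With `lam=0.0` the sampler follows the base flow unchanged; a positive `lam` trades some of the base dynamics for a lower value of the toy cost. In a real T2I setting the velocity field would be a neural network and the cost would presumably be computed from model internals, which this sketch does not attempt to model.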