FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

Abstract

Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator (the depth predictor that defines the constraint) is available during both training and inference. Existing approaches generally fall into two categories: supervised models that treat the conditioning signal as a static cue and ignore alignment information at inference, and guidance-based methods that consult this information through hand-tuned linear updates, typically trading fidelity to the condition against the plausibility of the generated sample. We argue that the fundamental gap in both paradigms is that the model is never trained to use its own alignment error. We introduce FlowBender, a closed-loop framework that treats this error as a first-class input, training the network to learn a correction policy conditioned on inference-time feedback. At each step, an unguided look-ahead pass estimates the clean signal, a task-specific deviation is computed via the forward operator, and a refinement pass consumes this deviation to produce a corrected velocity. We propose several variants of FlowBender, including a gradient-based formulation for differentiable operators and a zero-order variant for non-differentiable settings such as JPEG compression. For efficient sampling, we introduce a prior-step shortcut that enables closed-loop correction at minimal additional computational cost. Across image-to-image translation, restoration, and 3D mesh texturing, FlowBender consistently outperforms standard supervised baselines, alignment-loss-augmented training, and state-of-the-art inference-time guidance, improving fidelity and plausibility simultaneously rather than trading one against the other. Project page: https://flow-bender.github.io/
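The look-ahead, deviation and refinement steps described above can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the FlowBender implementation. It assumes the rectified-flow convention x_t = (1 - t)·noise + t·data with Euler steps, a hypothetical 4-D signal whose forward operator A averages neighbouring pairs, and a hand-written `ToyVelocityNet` standing in for the trained feedback-conditioned network. Its `gain=2.0` turns the correction into an exact projection onto A(x) = y only because A Aᵀ = I/2 for this particular operator; a real network would have to learn such a policy. `gradient_feedback` plays the role of the gradient-based variant. `zero_order_feedback` swaps in generic central finite differences that query the operator only as a black box; this is one possible zero-order choice, not necessarily the paper's. The prior-step shortcut is not modeled, and the toy open-loop baseline ignores y entirely, so it serves only as a reference point.

```python
import math

# Hypothetical toy operator: average neighbouring pairs of a 4-D signal.
def forward_op(x):
    return [(x[0] + x[1]) / 2, (x[2] + x[3]) / 2]

def forward_op_adjoint(r):
    # A^T for the pair-averaging operator above.
    return [r[0] / 2, r[0] / 2, r[1] / 2, r[1] / 2]

def residual_norm(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(forward_op(x), y)))

def gradient_feedback(x_hat, y):
    # Differentiable variant: gradient of 0.5 * ||A(x_hat) - y||^2 w.r.t. x_hat.
    residual = [a - b for a, b in zip(forward_op(x_hat), y)]
    return forward_op_adjoint(residual)

def zero_order_feedback(x_hat, y, h=1e-4):
    # Assumed black-box variant: central finite differences of the same loss,
    # calling forward_op only as an opaque function (no adjoint, no autodiff).
    def loss(z):
        return 0.5 * sum((a - b) ** 2 for a, b in zip(forward_op(z), y))
    grad = []
    for i in range(len(x_hat)):
        plus, minus = list(x_hat), list(x_hat)
        plus[i] += h
        minus[i] -= h
        grad.append((loss(plus) - loss(minus)) / (2 * h))
    return grad

class ToyVelocityNet:
    """Hand-written stand-in for a trained feedback-conditioned network."""

    def __init__(self, gain=2.0):
        self.gain = gain
        self.calls = 0  # counts network evaluations

    def __call__(self, x, t, feedback=None):
        self.calls += 1
        # Toy prior that ignores the condition y: predict a shrunken clean signal.
        x_hat = [0.5 * xi for xi in x]
        if feedback is not None:
            # Stand-in for the learned correction policy.
            x_hat = [xh - self.gain * f for xh, f in zip(x_hat, feedback)]
        # With x_t = (1 - t) * noise + t * data, v = (x_hat - x_t) / (1 - t).
        return [(xh - xi) / (1 - t) for xh, xi in zip(x_hat, x)]

def sample(net, noise, y, steps=10, feedback_fn=None):
    """Euler sampler; feedback_fn=None gives the open-loop baseline."""
    x, dt = list(noise), 1.0 / steps
    for k in range(steps):
        t = k * dt                                   # t < 1, so 1 - t > 0
        if feedback_fn is None:
            v = net(x, t)                            # open loop: one call per step
        else:
            v0 = net(x, t)                           # unguided look-ahead pass
            clean = [xi + (1 - t) * vi for xi, vi in zip(x, v0)]
            fb = feedback_fn(clean, y)               # deviation via forward operator
            v = net(x, t, feedback=fb)               # refinement pass
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

y = [1.0, -1.0]
noise = [0.3, -1.2, 0.8, 0.5]  # fixed "noise" so the example is deterministic
for name, fn in [("open loop", None), ("gradient", gradient_feedback),
                 ("zero-order", zero_order_feedback)]:
    net = ToyVelocityNet()
    out = sample(net, noise, y, feedback_fn=fn)
    print(f"{name:10s} residual={residual_norm(out, y):.2e} net calls={net.calls}")
```

In this toy, the open-loop sampler leaves a sizeable residual ||A(x) - y||, while both closed-loop variants drive it down to numerical precision. They pay for this with two network calls per step instead of one, which is the overhead the prior-step shortcut is meant to reduce.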