FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

Abstract

Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator (the depth predictor that defines the constraint) is available during both training and inference. Existing approaches generally fall into two categories: supervised models that treat the conditioning signal as a static cue and ignore alignment information at inference, and guidance-based methods that consult it through hand-tuned linear updates, typically trading fidelity to the condition against the plausibility of the generated sample. We argue that the fundamental gap in both paradigms is that the model is never trained to utilize its own alignment error. We introduce FlowBender, a closed-loop framework that treats this error as a first-class input, training the network to learn a correction policy conditioned on inference-time feedback. At each step, an unguided look-ahead pass estimates the clean signal, a task-specific deviation is computed via the forward operator, and a refinement pass consumes this deviation to produce a corrected velocity. We propose several variants of FlowBender, including a gradient-based formulation for differentiable operators and a zero-order variant for non-differentiable settings such as JPEG compression. For efficient sampling, we introduce a prior-step shortcut that enables closed-loop correction at minimal additional computational cost. Across image-to-image translation, restoration, and 3D mesh texturing, FlowBender consistently outperforms standard supervised baselines, alignment-loss-augmented training, and state-of-the-art inference-time guidance, improving fidelity and plausibility simultaneously rather than trading them against each other. Project page: https://flow-bender.github.io/
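
The three-stage step described above (look-ahead, deviation, refinement) can be illustrated with a minimal sketch of a zero-order-style loop, in which the raw residual is passed back as feedback without differentiating through the operator. This is not the authors' implementation. Everything here is an assumption made for illustration: the linear-interpolation flow with Euler integration, the 1-D signal, the 2x average-pooling `forward_operator`, and above all `toy_velocity`, a hand-written stand-in for a trained network that carries a deliberate constant bias so that the open-loop sampler misses the constraint.

```python
import numpy as np

def forward_operator(x):
    """Hypothetical forward operator: 2x average pooling of a 1-D signal."""
    return x.reshape(-1, 2).mean(axis=1)

def toy_velocity(x_t, t, y, feedback=None):
    """Hand-written stand-in for a trained velocity network (an assumption).

    Without feedback it aims at a biased target, imitating a model whose
    output does not satisfy the constraint. With feedback it shifts the
    target by the reported deviation, imitating a learned correction.
    """
    target = np.repeat(y, 2) + 0.3          # constant bias = simulated misalignment
    if feedback is not None:
        target = target - np.repeat(feedback, 2)
    # Velocity of a linear-interpolation flow pointing at `target`.
    return (target - x_t) / max(1.0 - t, 1e-3)

def sample(y, n_steps=10, closed_loop=True, seed=0):
    """Euler sampler; optionally runs the look-ahead/deviation/refinement loop."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(2 * y.size)     # start from Gaussian noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        feedback = None
        if closed_loop:
            # 1) Unguided look-ahead pass: estimate the clean signal.
            v_look = toy_velocity(x, t, y)
            x1_hat = x + (1.0 - t) * v_look
            # 2) Task-specific deviation via the forward operator.
            feedback = forward_operator(x1_hat) - y
        # 3) Refinement pass: consume the deviation, emit the velocity used.
        v = toy_velocity(x, t, y, feedback)
        x = x + dt * v
    return x

y = np.array([0.0, 1.0, -1.0, 0.5])         # conditioning signal (toy data)
err_open = np.abs(forward_operator(sample(y, closed_loop=False)) - y).max()
err_closed = np.abs(forward_operator(sample(y, closed_loop=True)) - y).max()
print(f"open-loop residual:   {err_open:.3f}")
print(f"closed-loop residual: {err_closed:.3f}")
```

In this toy setting the open-loop residual stays at the injected bias of 0.3, while the closed-loop residual falls to numerical zero. The correction is exact only because the stand-in network was written to use the feedback perfectly; a trained network would have to learn that behavior. Note also that the loop calls the network twice per step, which is presumably the overhead the prior-step shortcut is meant to reduce; the shortcut itself is not sketched here because the abstract does not specify it.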