
Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching

February 12, 2026
Authors: Onkar Susladkar, Tushar Prakash, Gayatri Deshmukh, Kiet A. Nguyen, Jiaxun Zhang, Adheesh Juvekar, Tianshu Bao, Lin Chai, Sparsh Mittal, Inderjit S. Dhillon, Ismini Lourentzou
cs.AI

Abstract

We propose UniDFlow, a unified discrete flow-matching framework for multimodal understanding, generation, and editing. It decouples understanding and generation via task-specific low-rank adapters, avoiding objective interference and representation entanglement, while a novel reference-based multimodal preference alignment optimizes relative outcomes under identical conditioning, improving faithfulness and controllability without large-scale retraining. UniDFlow achieves state-of-the-art performance across eight benchmarks and exhibits strong zero-shot generalization to tasks including inpainting, in-context image generation, reference-based editing, and compositional generation, despite no explicit task-specific training.
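
The abstract does not spell out how the adapter decoupling is implemented; as a minimal illustrative sketch, assuming a standard LoRA setup in which a frozen shared backbone carries one low-rank adapter per task (all class and parameter names below are hypothetical, not taken from the paper):

```python
import torch.nn as nn

class TaskLoRALinear(nn.Module):
    """Frozen shared linear layer plus one low-rank (LoRA) residual per task,
    so understanding and generation gradients never update the same adapter."""

    def __init__(self, dim: int, rank: int = 16,
                 tasks=("understanding", "generation")):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.requires_grad_(False)  # shared backbone weights stay frozen
        self.adapters = nn.ModuleDict({
            t: nn.Sequential(
                nn.Linear(dim, rank, bias=False),  # down-projection
                nn.Linear(rank, dim, bias=False),  # up-projection
            )
            for t in tasks
        })

    def forward(self, x, task: str):
        # Only the adapter selected by `task` contributes, which is one way
        # to avoid the objective interference the abstract describes.
        return self.base(x) + self.adapters[task](x)
```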
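
Likewise, the exact form of the reference-based preference alignment is not given here; one common way to "optimize relative outcomes under identical conditioning" is a DPO-style objective against a frozen reference model, sketched below as an assumption rather than the paper's actual loss:

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_chosen: torch.Tensor,
                    logp_rejected: torch.Tensor,
                    ref_logp_chosen: torch.Tensor,
                    ref_logp_rejected: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss over a preferred/rejected output pair produced under
    the same conditioning, measured relative to a frozen reference model.
    Each argument is a per-sample sum of token log-probabilities."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Widen the gap between chosen and rejected outputs while staying close
    # to the reference, i.e. improving alignment without full retraining.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```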