PairUni: Pairwise Training for Unified Multimodal Language Models
October 29, 2025
Authors: Jiani Zheng, Zhiyang Teng, Xiangtai Li, Anran Wang, Yu Tian, Kunpeng Qiu, Ye Tian, Haochen Wang, Zhuochen Wang
cs.AI
Abstract
Unified vision-language models (UVLMs) must perform both understanding and
generation within a single architecture, but these tasks rely on heterogeneous
data and supervision, making it difficult to balance them during reinforcement
learning (RL). We propose PairUni, a unified framework that reorganizes data
into understanding-generation (UG) pairs and aligns optimization accordingly.
We first use GPT-o3 to augment single-task data, generating captions for
understanding samples and question-answer (QA) pairs for generation samples,
forming aligned pairs from the same instance. Additionally, for each generation
sample, we retrieve a semantically related understanding example to form a
retrieved pair, linking different but related data points. These paired
structures expose cross-task semantic correspondences and support consistent
policy learning. To leverage this structure, we present Pair-GPRO, a pair-aware
variant based on Group Relative Policy Optimization. It assigns a similarity
score to each pair to modulate the advantage, strengthening learning from
well-aligned examples and reducing task interference. We curate a high-quality
dataset of 16K UG pairs named PairUG for RL fine-tuning and evaluate PairUni on
the powerful Janus-Pro UVLMs. Our approach achieves balanced improvements on
various UVLMs, outperforming strong UVLM RL baselines. Code:
https://github.com/Haochen-Wang409/PairUni
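The abstract describes Pair-GPRO as modulating the group-relative advantage with a per-pair similarity score. The paper's exact formulation is not given here, so the sketch below only illustrates the general idea under assumed choices: standard GRPO normalization (reward minus group mean, divided by group std) followed by a multiplicative similarity weight in [0, 1]. The function names and the multiplicative form are hypothetical, not taken from the paper.

```python
import statistics

def grpo_advantages(rewards):
    """GRPO-style group-relative advantages: normalize each rollout's
    reward against the mean and std of its own group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

def pair_modulated_advantages(rewards, pair_similarity):
    """Hypothetical pair-aware modulation: scale each group-relative
    advantage by the UG-pair similarity score (assumed to lie in [0, 1]),
    so well-aligned understanding-generation pairs contribute more
    strongly to the policy update and poorly aligned ones are damped."""
    return [pair_similarity * a for a in grpo_advantages(rewards)]

# Example: the same rollout rewards under a well-aligned vs. a weakly
# aligned UG pair; the former yields larger-magnitude advantages.
rewards = [1.0, 0.5, 0.0, 1.0]
strong = pair_modulated_advantages(rewards, pair_similarity=0.9)
weak = pair_modulated_advantages(rewards, pair_similarity=0.2)
```

This keeps the zero-mean property of GRPO advantages within a group (a uniform scale factor cannot shift the mean), which is one plausible reason to modulate the advantage rather than the raw reward.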