
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning

August 26, 2025
Authors: Shaojin Wu, Mengqi Huang, Yufeng Cheng, Wenxu Wu, Jiahe Tian, Yiming Luo, Fei Ding, Qian He
cs.AI

Abstract

Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter insists on subject consistency, resulting in an apparent antagonism. We argue that both objectives can be unified under a single framework because they ultimately concern the disentanglement and re-composition of content and style, a long-standing theme in style-driven research. To this end, we present USO, a Unified Style-Subject Optimized customization model. First, we construct a large-scale triplet dataset consisting of content images, style images, and their corresponding stylized content images. Second, we introduce a disentangled learning scheme that simultaneously aligns style features and disentangles content from style through two complementary objectives, style-alignment training and content-style disentanglement training. Third, we incorporate a style reward-learning paradigm denoted as SRL to further enhance the model's performance. Finally, we release USO-Bench, the first benchmark that jointly evaluates style similarity and subject fidelity across multiple metrics. Extensive experiments demonstrate that USO achieves state-of-the-art performance among open-source models along both dimensions of subject consistency and style similarity. Code and model: https://github.com/bytedance/USO
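
As a rough illustration of the triplet dataset described in the abstract, the sketch below shows one way a (content, style, stylized content) record could be represented. The class name, field names, file names, and the choice to treat the stylized image as the training target are assumptions made for illustration only; they are not taken from the USO code base.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Triplet:
    # All field names are hypothetical; the abstract only says each triplet
    # holds a content image, a style image, and the stylized content image.
    content_path: str   # image that supplies the subject/content
    style_path: str     # image that supplies the style
    stylized_path: str  # the content rendered in that style


def build_triplets(rows: List[Tuple[str, str, str]]) -> List[Triplet]:
    """Wrap raw (content, style, stylized) path tuples in Triplet records."""
    return [Triplet(*row) for row in rows]


def conditioning_and_target(t: Triplet) -> Tuple[Tuple[str, str], str]:
    """Split a triplet into (reference images, target image).

    Assumption: the stylized image is the target and the other two images
    are the references. The abstract does not state this explicitly.
    """
    return (t.content_path, t.style_path), t.stylized_path


if __name__ == "__main__":
    triplets = build_triplets([("cat.png", "ink.png", "cat_ink.png")])
    refs, target = conditioning_and_target(triplets[0])
    print(refs, target)
```

Keeping the three images in one record makes the "disentangle, then recombine" framing concrete: the two references carry content and style separately, and the third image shows the two combined.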