ORES: Open-vocabulary Responsible Visual Synthesis

August 26, 2023
Authors: Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan
cs.AI

Abstract

Avoiding synthesizing specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concepts that need to be avoided for responsible visual synthesis tend to be diverse, depending on the region, context, and usage scenario. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content. To address this problem, we present a Two-stage Intervention (TIN) framework. By introducing 1) rewriting with learnable instruction through a large-scale language model (LLM) and 2) synthesizing with prompt intervention on a diffusion synthesis model, it can effectively synthesize images that avoid any forbidden concepts while following the user's query as closely as possible. To evaluate ORES, we provide a publicly available dataset, baseline models, and a benchmark. Experimental results demonstrate the effectiveness of our method in reducing the risks of image generation. Our work highlights the potential of LLMs in responsible visual synthesis. Our code and dataset are publicly available.
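
The abstract describes TIN as two stages: LLM-based rewriting of the user's query, followed by diffusion synthesis with prompt intervention. The sketch below illustrates only that high-level flow under stated assumptions: the function names (`rewrite_query`, `synthesize`), the instruction text, the placeholder LLM callable, and the use of Hugging Face diffusers with Stable Diffusion are illustrative choices, not the authors' released implementation, and the paper's prompt intervention acts during the diffusion sampling process rather than by simply swapping the prompt.

```python
# Minimal sketch of the two-stage idea: (1) an LLM rewrites the query so a
# forbidden concept is removed, (2) a diffusion model synthesizes from the
# rewritten prompt. Names and prompts here are assumptions for illustration.

import torch
from diffusers import StableDiffusionPipeline


def rewrite_query(llm_chat, user_query: str, forbidden_concept: str) -> str:
    """Stage 1: ask an LLM to rewrite the query so it no longer evokes the
    forbidden concept while preserving as much of the user's intent as possible.
    `llm_chat` is any callable mapping an instruction string to a completion."""
    instruction = (
        "Rewrite the following image prompt so that it does NOT depict "
        f"'{forbidden_concept}', while keeping the rest of the request intact.\n"
        f"Prompt: {user_query}\nRewritten prompt:"
    )
    return llm_chat(instruction).strip()


def synthesize(prompt: str, device: str = "cuda"):
    """Stage 2: generate an image from the de-risked prompt with a standard
    text-to-image diffusion pipeline. (The paper's intervention operates inside
    the sampling loop; this simplified version only uses the rewritten prompt.)"""
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to(device)
    return pipe(prompt).images[0]


if __name__ == "__main__":
    # `my_llm` stands in for any chat-completion backend available to the reader.
    my_llm = lambda text: "a quiet city street at night, no people visible"
    safe_prompt = rewrite_query(my_llm, "a crowded street fight at night", "violence")
    image = synthesize(safe_prompt)
    image.save("ores_sketch.png")
```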