

ORES: Open-vocabulary Responsible Visual Synthesis

August 26, 2023
Authors: Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan
cs.AI

Abstract

Avoiding the synthesis of specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concepts that need to be avoided vary with region, context, and usage scenario. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), in which the synthesis model must avoid forbidden visual concepts while allowing users to input any desired content. To address this problem, we present a Two-stage Intervention (TIN) framework. By introducing 1) rewriting with learnable instructions via a large-scale language model (LLM) and 2) synthesis with prompt intervention on a diffusion model, TIN effectively synthesizes images that avoid the forbidden concepts while following the user's query as closely as possible. To enable evaluation on ORES, we provide a publicly available dataset, baseline models, and a benchmark. Experimental results demonstrate the effectiveness of our method in reducing the risks of image generation. Our work highlights the potential of LLMs in responsible visual synthesis. Our code and dataset are publicly available.
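To make the two-stage idea concrete, below is a minimal sketch of how an LLM rewrite followed by a prompt intervention could be wired together. This is not the authors' released implementation: it assumes the Hugging Face diffusers StableDiffusionPipeline and its callback_on_step_end hook, and the LLM client, `switch_step`, and the prompt strings are hypothetical placeholders.

```python
# Minimal sketch of a "rewrite, then intervene" pipeline in the spirit of TIN.
# Not the paper's code: the LLM client, switch step, and prompts are placeholders.
import torch
from diffusers import StableDiffusionPipeline


def rewrite_query(llm, user_query: str, forbidden_concept: str) -> str:
    """Stage 1: ask any chat-style LLM to rewrite the query so the forbidden
    concept disappears while the rest of the intent is preserved."""
    instruction = (
        f"Rewrite this image prompt so it no longer depicts '{forbidden_concept}', "
        f"keeping as much of the original intent as possible.\n"
        f"Prompt: {user_query}\nRewritten prompt:"
    )
    return llm(instruction).strip()  # `llm` is a placeholder callable


pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

user_prompt = "an example user query"                 # placeholder
safe_prompt = "the LLM-rewritten, concept-free query" # rewrite_query(...) in practice
switch_step = 10                                      # hypothetical intervention point

# Pre-encode the rewritten prompt (conditional embeddings only).
safe_embeds, _ = pipe.encode_prompt(
    safe_prompt,
    device=pipe.device,
    num_images_per_prompt=1,
    do_classifier_free_guidance=False,
)


def intervene(pipeline, step, timestep, callback_kwargs):
    # Stage 2: after `switch_step` denoising steps, swap the conditional half of the
    # classifier-free-guidance embeddings for the rewritten prompt's embeddings, so
    # early steps follow the user's query and later steps follow the safe rewrite.
    if step == switch_step:
        embeds = callback_kwargs["prompt_embeds"]  # [uncond; cond] when CFG is active
        uncond, _ = embeds.chunk(2)
        callback_kwargs["prompt_embeds"] = torch.cat(
            [uncond, safe_embeds.to(embeds.dtype)]
        )
    return callback_kwargs


image = pipe(
    user_prompt,
    num_inference_steps=50,
    callback_on_step_end=intervene,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
).images[0]
image.save("ores_sketch.png")
```

The intuition behind switching prompts mid-denoising is that the early steps, driven by the user's query, fix the overall layout and intent, while the later steps, driven by the rewritten prompt, steer the image away from the forbidden concept; the actual intervention strategy and instruction learning in the paper may differ.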