ORES: Open-vocabulary Responsible Visual Synthesis

Abstract

Avoiding the synthesis of specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concepts that need to be avoided vary with region, context, and usage scenario. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), in which the synthesis model must avoid forbidden visual concepts while still allowing users to input any desired content. To address this problem, we present a Two-stage Intervention (TIN) framework. By 1) rewriting the user's query with a learnable instruction through a large-scale language model (LLM) and 2) synthesizing with prompt intervention on a diffusion synthesis model, it can effectively synthesize images that avoid any specified forbidden concept while following the user's query as closely as possible. To support evaluation on ORES, we provide a publicly available dataset, baseline models, and a benchmark. Experimental results demonstrate the effectiveness of our method in reducing the risks of image generation. Our work highlights the potential of LLMs in responsible visual synthesis. Our code and dataset are publicly available.
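The two stages named in the abstract can be illustrated with a small sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the function names `rewrite_query` and `prompt_schedule`, the instruction text, the `toy_llm` stand-in, and in particular the reading of "prompt intervention" as conditioning the first few denoising steps on the user's query and the remaining steps on the rewritten query. In the actual framework the instruction is learned; here it is a fixed string.

```python
from typing import Callable, List

# Hypothetical instruction text; in TIN the instruction is learnable.
INSTRUCTION = "Rewrite the query so that the image contains no trace of the concept."


def rewrite_query(query: str, concept: str, instruction: str,
                  llm: Callable[[str], str]) -> str:
    """Stage 1 (assumed form): ask an LLM to rewrite the query without the concept."""
    prompt = (
        f"{instruction}\n"
        f"Forbidden concept: {concept}\n"
        f"User query: {query}\n"
        "Rewritten query:"
    )
    return llm(prompt).strip()


def prompt_schedule(user_query: str, safe_query: str,
                    total_steps: int, switch_step: int) -> List[str]:
    """Stage 2 (assumed form): pick the prompt that conditions each denoising step.

    Steps before `switch_step` use the user's query; later steps use the
    rewritten query.
    """
    if not 0 <= switch_step <= total_steps:
        raise ValueError("switch_step must lie between 0 and total_steps")
    return [user_query if t < switch_step else safe_query
            for t in range(total_steps)]


def toy_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: drops the forbidden word from the query."""
    fields = dict(line.split(": ", 1)
                  for line in prompt.splitlines() if ": " in line)
    concept = fields["Forbidden concept"].lower()
    kept = [w for w in fields["User query"].split() if w.lower() != concept]
    return " ".join(kept)


if __name__ == "__main__":
    query = "a red wine glass on a table"
    safe = rewrite_query(query, "wine", INSTRUCTION, toy_llm)
    for step, text in enumerate(prompt_schedule(query, safe, 5, 2)):
        print(step, text)
```

In a real system, `toy_llm` would be replaced by a call to an actual language model, and each entry of the schedule would be encoded and passed to the diffusion model at the corresponding denoising step. The switch point trades off fidelity to the user's query against how thoroughly the forbidden concept is suppressed.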
