Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Abstract

Recent work shows that reinforcement learning (RL) with quality rewards can improve the quality of images produced by text-to-image (T2I) generation. However, naively aggregating multiple rewards can over-optimize some metrics while degrading others, and finding the optimal reward weights by hand is difficult. An effective strategy for jointly optimizing multiple rewards in RL for T2I generation is therefore highly desirable. This paper introduces Parrot, a novel multi-reward RL framework for T2I generation. Using batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among the rewards during RL optimization of the T2I model. Parrot also jointly optimizes the T2I model and a prompt expansion network, which encourages quality-aware text prompts and further improves final image quality. To counteract potential catastrophic forgetting of the original user prompt caused by prompt expansion, we introduce original-prompt-centered guidance at inference time, which keeps the generated image faithful to the user input. Extensive experiments and a user study demonstrate that Parrot outperforms several baseline methods across quality criteria including aesthetics, human preference, image sentiment, and text-image alignment.
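The abstract does not spell out how batch-wise Pareto-optimal selection is computed. As a minimal sketch, assuming it means keeping the samples in a batch whose reward vectors are not dominated by any other sample (higher is better on every reward), the selection step might look like the following. The function names `dominates` and `pareto_front_indices` and the example reward values are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if reward vector `a` dominates `b`.

    `a` dominates `b` when it is at least as good on every reward
    and strictly better on at least one (higher is better).
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front_indices(rewards: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of the non-dominated samples in one batch.

    `rewards[i]` is the reward vector of sample i, for example
    (aesthetics, human preference, sentiment, alignment).
    """
    front = []
    for i, candidate in enumerate(rewards):
        dominated = any(
            dominates(other, candidate)
            for j, other in enumerate(rewards)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical batch of four samples scored by two rewards.
batch_rewards = [
    (0.9, 0.2),  # best on reward 0
    (0.5, 0.5),  # balanced
    (0.4, 0.4),  # dominated by the balanced sample
    (0.1, 0.8),  # best on reward 1
]
selected = pareto_front_indices(batch_rewards)
```

Under this assumption, only the selected samples would contribute to the RL update, so no manual reward weights are needed to decide which samples count as good. This pairwise check costs O(n²) comparisons per batch, which is usually acceptable for typical batch sizes.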
