TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
March 30, 2025
Authors: Nikai Du, Zhennan Chen, Zhizhou Chen, Shan Gao, Xi Chen, Zhengkai Jiang, Jian Yang, Ying Tai
cs.AI
Abstract
This paper explores the task of Complex Visual Text Generation (CVTG), which
centers on generating intricate textual content distributed across diverse
regions within visual images. In CVTG, image generation models often render
distorted and blurred visual text or omit some of it. To tackle these
challenges, we propose TextCrafter, a novel multi-visual text rendering method.
TextCrafter employs a progressive strategy to decompose complex visual text
into distinct components while ensuring robust alignment between textual
content and its visual carrier. Additionally, it incorporates a token focus
enhancement mechanism to amplify the prominence of visual text during the
generation process. TextCrafter effectively addresses key challenges in CVTG
tasks, such as text confusion, omissions, and blurriness. Moreover, we present
a new benchmark dataset, CVTG-2K, tailored to rigorously evaluate the
performance of generative models on CVTG tasks. Extensive experiments
demonstrate that our method surpasses state-of-the-art approaches.