

Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation

September 12, 2025
作者: Sung-Lin Tsai, Bo-Lun Huang, Yu Ting Shen, Cheng Yu Yeo, Chiang Tseng, Bo-Kai Ruan, Wen-Sheng Lien, Hong-Han Shuai
cs.AI

Abstract

Accurate color alignment in text-to-image (T2I) generation is critical for applications such as fashion, product visualization, and interior design, yet current diffusion models struggle with nuanced and compound color terms (e.g., Tiffany blue, lime green, hot pink), often producing images that are misaligned with human intent. Existing approaches rely on cross-attention manipulation, reference images, or fine-tuning, but fail to systematically resolve ambiguous color descriptions. To render colors precisely under prompt ambiguity, we propose a training-free framework that enhances color fidelity by leveraging a large language model (LLM) to disambiguate color-related prompts and by guiding color blending operations directly in the text embedding space. Our method first employs the LLM to resolve ambiguous color terms in the text prompt, and then refines the text embeddings based on the spatial relationships of the resulting color terms in the CIELAB color space. Unlike prior methods, our approach improves color accuracy without requiring additional training or external reference images. Experimental results demonstrate that our framework improves color alignment without compromising image quality, bridging the gap between text semantics and visual generation.
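The abstract does not give the paper's embedding-refinement details, but the color-space side it relies on is standard colorimetry. As a rough, illustrative sketch (not the paper's implementation), the following converts sRGB hex colors to CIELAB via the standard sRGB gamma and D65 matrices, and linearly interpolates between two colors in that space; the Tiffany-blue hex `#0ABAB5` is an assumed approximation used only for illustration.

```python
def srgb_to_lab(hex_color):
    """Convert an sRGB hex string (e.g. '#0ABAB5') to CIELAB (D65 reference white)."""
    r, g, b = (int(hex_color.lstrip('#')[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    # Undo the sRGB gamma (transfer function) to get linear RGB.
    lin = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
           for c in (r, g, b)]
    # Linear RGB -> XYZ using the standard sRGB/D65 matrix.
    x = 0.4124 * lin[0] + 0.3576 * lin[1] + 0.1805 * lin[2]
    y = 0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2]
    z = 0.0193 * lin[0] + 0.1192 * lin[1] + 0.9505 * lin[2]

    # XYZ -> CIELAB with the piecewise cube-root companding.
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def blend_lab(lab1, lab2, alpha):
    """Linearly interpolate between two CIELAB colors (alpha=0 gives lab1)."""
    return tuple((1 - alpha) * c1 + alpha * c2 for c1, c2 in zip(lab1, lab2))


# Example: blend an assumed Tiffany-blue approximation toward white.
tiffany = srgb_to_lab('#0ABAB5')   # hex value is an assumption, not from the paper
white = srgb_to_lab('#FFFFFF')     # roughly L*=100, a*=0, b*=0
halfway = blend_lab(tiffany, white, 0.5)
```

Because Euclidean distance in CIELAB tracks perceived color difference far better than distance in raw RGB, interpolating and measuring proximity there is a natural choice for relating named colors, which is presumably why the paper anchors its embedding guidance in this space.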