
RAISE: Requirement-Adaptive Evolutionary Refinement for Training-Free Text-to-Image Alignment

February 28, 2026
Authors: Liyao Jiang, Ruichen Chen, Chao Gao, Di Niu
cs.AI

Abstract

Recent text-to-image (T2I) diffusion models achieve remarkable realism, yet faithful prompt-image alignment remains challenging, particularly for complex prompts with multiple objects, relations, and fine-grained attributes. Existing training-free inference-time scaling methods rely on fixed iteration budgets that cannot adapt to prompt difficulty, while reflection-tuned models require carefully curated reflection datasets and extensive joint fine-tuning of diffusion and vision-language models, often overfitting to reflection-path data and lacking transferability across models. We introduce RAISE (Requirement-Adaptive Self-Improving Evolution), a training-free, requirement-driven evolutionary framework for adaptive T2I generation. RAISE formulates image generation as a requirement-driven adaptive scaling process, evolving a population of candidates at inference time through a diverse set of refinement actions, including prompt rewriting, noise resampling, and instructional editing. Each generation is verified against a structured checklist of requirements, enabling the system to dynamically identify unsatisfied items and allocate further computation only where needed. This achieves adaptive test-time scaling that aligns computational effort with semantic query complexity. On GenEval and DrawBench, RAISE attains state-of-the-art alignment (0.94 overall GenEval) while requiring 30-40% fewer generated samples and 80% fewer VLM calls than prior scaling and reflection-tuned baselines, demonstrating efficient, generalizable, and model-agnostic multi-round self-improvement. Code is available at https://github.com/LiyaoJiang1998/RAISE.
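The verify-then-refine loop described above can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the authors' implementation: `generate_image` and `verify_checklist` are hypothetical stubs standing in for the diffusion model and the VLM checklist verifier, and the three refinement actions only mimic prompt rewriting, noise resampling, and instruction-based editing at the level of control flow. The key ideas it shows are the structured requirement checklist, refinement targeted at unmet items, and early termination (adaptive compute) once every item is satisfied.

```python
def generate_image(prompt, seed=0):
    # Stub: a real system would sample an image from a T2I diffusion model.
    return {"prompt": prompt, "seed": seed}

def verify_checklist(image, requirements):
    # Stub verifier: treats a requirement as met if it appears in the prompt.
    # A real system would query a VLM to check each checklist item on the image.
    return {r for r in requirements if r in image["prompt"]}

# Refinement actions (illustrative): each takes the best candidate so far and
# the set of unmet requirements, and proposes a revised candidate.
def rewrite_prompt(cand, unmet):
    return generate_image(cand["prompt"] + ", " + ", ".join(sorted(unmet)), cand["seed"])

def resample_noise(cand, unmet):
    # Keep the prompt, change the seed (stand-in for re-sampling initial noise).
    return generate_image(cand["prompt"], cand["seed"] + 1)

def edit_instruction(cand, unmet):
    return generate_image(cand["prompt"] + " (add: " + ", ".join(sorted(unmet)) + ")", cand["seed"])

ACTIONS = [rewrite_prompt, resample_noise, edit_instruction]

def raise_loop(prompt, requirements, max_rounds=5):
    """Evolve candidates until the requirement checklist is fully satisfied,
    stopping early so compute scales with prompt difficulty."""
    population = [generate_image(prompt)]
    best, unmet = population[0], set(requirements)
    for _ in range(max_rounds):
        scored = [(c, verify_checklist(c, requirements)) for c in population]
        best, met = max(scored, key=lambda cm: len(cm[1]))
        unmet = requirements - met
        if not unmet:                      # all checklist items met: early exit
            return best, unmet
        # Spend further compute only on the unmet items.
        population = [act(best, unmet) for act in ACTIONS]
    return best, unmet
```

A simple prompt satisfies its checklist in the first round and triggers no refinement, while a harder prompt keeps evolving until its unmet items are resolved or the budget runs out.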
March 4, 2026