
RAISE: Requirement-Adaptive Evolutionary Refinement for Training-Free Text-to-Image Alignment

February 28, 2026
作者: Liyao Jiang, Ruichen Chen, Chao Gao, Di Niu
cs.AI

Abstract

Recent text-to-image (T2I) diffusion models achieve remarkable realism, yet faithful prompt-image alignment remains challenging, particularly for complex prompts involving multiple objects, relations, and fine-grained attributes. Existing training-free inference-time scaling methods rely on fixed iteration budgets that cannot adapt to prompt difficulty, while reflection-tuned models require carefully curated reflection datasets and extensive joint fine-tuning of diffusion and vision-language models, often overfitting to reflection-path data and lacking transferability across models. We introduce RAISE (Requirement-Adaptive Self-Improving Evolution), a training-free, requirement-driven evolutionary framework for adaptive T2I generation. RAISE formulates image generation as a requirement-driven adaptive scaling process, evolving a population of candidates at inference time through a diverse set of refinement actions, including prompt rewriting, noise resampling, and instructional editing. Each generation is verified against a structured checklist of requirements, enabling the system to dynamically identify unsatisfied items and allocate further computation only where needed. This achieves adaptive test-time scaling that aligns computational effort with semantic query complexity. On GenEval and DrawBench, RAISE attains state-of-the-art alignment (0.94 overall on GenEval) while requiring 30-40% fewer generated samples and 80% fewer VLM calls than prior scaling and reflection-tuned baselines, demonstrating efficient, generalizable, and model-agnostic multi-round self-improvement. Code is available at https://github.com/LiyaoJiang1998/RAISE.
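The abstract's core loop, verifying a candidate population against a requirement checklist and spending further refinement only on unsatisfied items, can be sketched in miniature. This is a hypothetical toy illustration, not the paper's implementation: `extract_checklist`, `verify`, and `refine` are stand-ins for the VLM-based requirement extraction, VLM verification, and diffusion-side refinement actions (prompt rewriting, noise resampling, instructional editing) that RAISE actually uses.

```python
import random

def extract_checklist(prompt):
    """Toy requirement extraction: treat comma-separated clauses as atomic requirements."""
    return [clause.strip() for clause in prompt.split(",") if clause.strip()]

def verify(candidate, checklist):
    """Toy verifier: return the requirements this candidate fails to satisfy.
    (In RAISE, a VLM checks the generated image against each checklist item.)"""
    return [req for req in checklist if req not in candidate["satisfied"]]

def refine(candidate, failed):
    """Toy refinement action: fix one failed requirement per round.
    (In RAISE, this would be a prompt rewrite, noise resample, or edit.)"""
    return {"satisfied": set(candidate["satisfied"]) | {failed[0]}}

def raise_loop(prompt, population_size=3, max_rounds=10, seed=0):
    """Evolve a candidate population until the best one meets every requirement,
    stopping early so compute scales with prompt difficulty."""
    rng = random.Random(seed)
    checklist = extract_checklist(prompt)
    # Initial population: each candidate satisfies a random subset of requirements.
    population = [
        {"satisfied": set(rng.sample(checklist, rng.randrange(len(checklist))))}
        for _ in range(population_size)
    ]
    for round_idx in range(max_rounds):
        scored = [(verify(c, checklist), c) for c in population]
        scored.sort(key=lambda pair: len(pair[0]))  # fewest failures first
        best_failed, best = scored[0]
        if not best_failed:
            return best, round_idx  # all requirements met: adaptive early stop
        # Allocate refinement only to candidates with unsatisfied items.
        population = [refine(c, f) if f else c for f, c in scored]
    return scored[0][1], max_rounds
```

Because the loop exits as soon as the checklist is fully satisfied, easy prompts terminate in few rounds while hard prompts consume more of the budget, which is the adaptive test-time scaling behavior the abstract describes.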
PDF · March 4, 2026