

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

March 26, 2026
作者: Zhekai Chen, Yuqing Wang, Manyuan Zhang, Xihui Liu
cs.AI

Abstract

Generating images conditioned on multiple visual references is critical for real-world applications such as multi-subject composition, narrative illustration, and novel view synthesis, yet current models suffer from severe performance degradation as the number of input references grows. We identify the root cause as a fundamental data bottleneck: existing datasets are dominated by single- or few-reference pairs and lack the structured, long-context supervision needed to learn dense inter-reference dependencies. To address this, we introduce MacroData, a large-scale dataset of 400K samples, each containing up to 10 reference images, systematically organized across four complementary dimensions -- Customization, Illustration, Spatial reasoning, and Temporal dynamics -- to provide comprehensive coverage of the multi-reference generation space. Recognizing the concurrent absence of standardized evaluation protocols, we further propose MacroBench, a benchmark of 4,000 samples that assesses generative coherence across graded task dimensions and input scales. Extensive experiments show that fine-tuning on MacroData yields substantial improvements in multi-reference generation, and ablation studies further reveal synergistic benefits of cross-task co-training and effective strategies for handling long-context complexity. The dataset and benchmark will be publicly released.