
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

March 26, 2026
Authors: Zhekai Chen, Yuqing Wang, Manyuan Zhang, Xihui Liu
cs.AI

Abstract

Generating images conditioned on multiple visual references is critical for real-world applications such as multi-subject composition, narrative illustration, and novel view synthesis, yet current models suffer from severe performance degradation as the number of input references grows. We identify the root cause as a fundamental data bottleneck: existing datasets are dominated by single- or few-reference pairs and lack the structured, long-context supervision needed to learn dense inter-reference dependencies. To address this, we introduce MacroData, a large-scale dataset of 400K samples, each containing up to 10 reference images, systematically organized across four complementary dimensions -- Customization, Illustration, Spatial reasoning, and Temporal dynamics -- to provide comprehensive coverage of the multi-reference generation space. Recognizing the concurrent absence of standardized evaluation protocols, we further propose MacroBench, a benchmark of 4,000 samples that assesses generative coherence across graded task dimensions and input scales. Extensive experiments show that fine-tuning on MacroData yields substantial improvements in multi-reference generation, and ablation studies further reveal synergistic benefits of cross-task co-training and effective strategies for handling long-context complexity. The dataset and benchmark will be publicly released.