

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

January 17, 2026
Authors: Honglin Lin, Chonghan Qin, Zheng Liu, Qizhi Pei, Yu Li, Zhanping Zhong, Xin Gao, Yanfeng Wang, Conghui He, Lijun Wu
cs.AI

Abstract

While synthetic data has proven effective for improving scientific reasoning in the text domain, multimodal reasoning remains constrained by the difficulty of synthesizing scientifically rigorous images. Existing Text-to-Image (T2I) models often produce outputs that are visually plausible yet scientifically incorrect, resulting in a persistent visual-logic divergence that limits their value for downstream reasoning. Motivated by recent advances in next-generation T2I models, we conduct a systematic study of scientific image synthesis across generation paradigms, evaluation, and downstream use. We analyze both direct pixel-based generation and programmatic synthesis, and propose ImgCoder, a logic-driven framework that follows an explicit "understand - plan - code" workflow to improve structural precision. To rigorously assess scientific correctness, we introduce SciGenBench, which evaluates generated images based on information utility and logical validity. Our evaluation reveals systematic failure modes in pixel-based models and highlights a fundamental expressiveness-precision trade-off. Finally, we show that fine-tuning Large Multimodal Models (LMMs) on rigorously verified synthetic scientific images yields consistent reasoning gains, with potential scaling trends analogous to the text domain, validating high-fidelity scientific synthesis as a viable path to unlocking massive multimodal reasoning capabilities.
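To make the contrast between the two generation paradigms concrete, the following is a minimal, illustrative sketch of programmatic synthesis in the spirit of the "understand - plan - code" workflow. The spec format, planner, and SVG emitter here are hypothetical stand-ins, not the paper's actual ImgCoder pipeline; the point is that each stage produces exact, verifiable structure, whereas pixel-based generation samples an image directly and cannot guarantee such constraints.

```python
def understand(description: str) -> dict:
    """Parse a toy textual spec like 'bar chart: A=3, B=5' into structured facts."""
    kind, _, body = description.partition(":")
    values = {}
    for part in body.split(","):
        label, _, num = part.partition("=")
        values[label.strip()] = float(num)
    return {"kind": kind.strip(), "values": values}

def plan(spec: dict) -> list:
    """Lay out one rectangle per bar (x, y, w, h) on a 100x100 canvas,
    scaling heights so the tallest bar spans 80 units."""
    bars = []
    max_v = max(spec["values"].values())
    for i, (label, v) in enumerate(spec["values"].items()):
        h = 80.0 * v / max_v
        bars.append({"label": label, "x": 10 + i * 30, "y": 90 - h, "w": 20, "h": h})
    return bars

def code(bars: list) -> str:
    """Emit deterministic SVG -- every coordinate is exact by construction,
    so the figure's logic can be checked symbolically, not just visually."""
    rects = "".join(
        f'<rect x="{b["x"]}" y="{b["y"]:.1f}" width="{b["w"]}" height="{b["h"]:.1f}"/>'
        for b in bars
    )
    return f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">{rects}</svg>'

svg = code(plan(understand("bar chart: A=3, B=5")))
```

Because the output is code rather than pixels, logical validity (e.g., bar heights proportional to the stated values) can be verified before the image is ever rendered, which is what makes rigorous filtering of synthetic training images tractable.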