
UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation

January 16, 2026
Authors: Ruiheng Zhang, Jingfeng Yao, Huangxuan Zhao, Hao Yan, Xiao He, Lei Chen, Zhou Wei, Yong Luo, Zengmao Wang, Lefei Zhang, Dacheng Tao, Bo Du
cs.AI

Abstract

Despite recent progress, medical foundation models still struggle to unify visual understanding and generation, as these tasks have inherently conflicting goals: semantic abstraction versus pixel-level reconstruction. Existing approaches, typically based on parameter-shared autoregressive architectures, frequently lead to compromised performance in one or both tasks. To address this, we present UniX, a next-generation unified medical foundation model for chest X-ray understanding and generation. UniX decouples the two tasks into an autoregressive branch for understanding and a diffusion branch for high-fidelity generation. Crucially, a cross-modal self-attention mechanism is introduced to dynamically guide the generation process with understanding features. Coupled with a rigorous data cleaning pipeline and a multi-stage training strategy, this architecture enables synergistic collaboration between tasks while leveraging the strengths of diffusion models for superior generation. On two representative benchmarks, UniX achieves a 46.1% improvement in understanding performance (Micro-F1) and a 24.2% gain in generation quality (FD-RadDino), using only a quarter of the parameters of LLM-CXR. By achieving performance on par with task-specific models, our work establishes a scalable paradigm for synergistic medical image understanding and generation. Codes and models are available at https://github.com/ZrH42/UniX.
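The abstract describes a cross-modal self-attention mechanism that lets understanding features from the autoregressive branch guide the diffusion branch's generation. The paper's exact formulation is not given here; the following is an illustrative sketch only (all function and variable names are hypothetical), in which generation queries attend over a context formed by concatenating the generation tokens with the understanding features:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(gen_tokens, und_feats, Wq, Wk, Wv):
    """Sketch of cross-modal self-attention (single head).

    Queries come from the diffusion branch's generation tokens;
    keys/values come from generation tokens concatenated with the
    understanding branch's features, so generation is dynamically
    conditioned on understanding context.
    """
    ctx = np.concatenate([gen_tokens, und_feats], axis=0)
    q = gen_tokens @ Wq                      # (n_gen, d)
    k = ctx @ Wk                             # (n_gen + n_und, d)
    v = ctx @ Wv                             # (n_gen + n_und, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product
    return softmax(scores, axis=-1) @ v      # (n_gen, d)

rng = np.random.default_rng(0)
d = 8
gen = rng.standard_normal((4, d))   # e.g. noisy image latents
und = rng.standard_normal((6, d))   # e.g. report/understanding features
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = cross_modal_attention(gen, und, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one conditioned output per generation token
```

The design point the abstract emphasizes is decoupling: the autoregressive branch keeps its own semantic representation, and the diffusion branch reads from it through attention rather than sharing parameters, avoiding the abstraction-versus-reconstruction conflict.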