UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation
January 16, 2026
Authors: Ruiheng Zhang, Jingfeng Yao, Huangxuan Zhao, Hao Yan, Xiao He, Lei Chen, Zhou Wei, Yong Luo, Zengmao Wang, Lefei Zhang, Dacheng Tao, Bo Du
cs.AI
Abstract
Despite recent progress, medical foundation models still struggle to unify visual understanding and generation, as these tasks have inherently conflicting goals: semantic abstraction versus pixel-level reconstruction. Existing approaches, typically based on parameter-shared autoregressive architectures, frequently compromise performance in one or both tasks. To address this, we present UniX, a next-generation unified medical foundation model for chest X-ray understanding and generation. UniX decouples the two tasks into an autoregressive branch for understanding and a diffusion branch for high-fidelity generation. Crucially, a cross-modal self-attention mechanism is introduced to dynamically guide the generation process with understanding features. Coupled with a rigorous data cleaning pipeline and a multi-stage training strategy, this architecture enables synergistic collaboration between tasks while leveraging the strengths of diffusion models for superior generation. On two representative benchmarks, UniX achieves a 46.1% improvement in understanding performance (Micro-F1) and a 24.2% gain in generation quality (FD-RadDino), using only a quarter of the parameters of LLM-CXR. By achieving performance on par with task-specific models, our work establishes a scalable paradigm for synergistic medical image understanding and generation. Code and models are available at https://github.com/ZrH42/UniX.
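The cross-modal guidance described above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration of the general idea, not the paper's actual implementation: diffusion-branch tokens and understanding-branch features are fused into one sequence, self-attention is computed jointly over the fused sequence, and the attended generation positions are kept. All names, shapes, and the residual/normalization layout are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalGuidance(nn.Module):
    """Hypothetical sketch of cross-modal self-attention: generation
    (diffusion) tokens attend jointly with understanding (AR) features,
    so understanding features can guide the generation process.
    Module name and structure are illustrative assumptions."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, gen_tokens: torch.Tensor, und_feats: torch.Tensor) -> torch.Tensor:
        # Fuse both modalities into one sequence so attention is
        # "self-attention" over the concatenation (cross-modal mixing).
        fused = torch.cat([und_feats, gen_tokens], dim=1)
        out, _ = self.attn(fused, fused, fused)
        # Keep only the generation positions; residual + norm.
        n_und = und_feats.shape[1]
        return self.norm(gen_tokens + out[:, n_und:])

# Toy usage with made-up shapes: batch 2, 16 generation tokens,
# 32 understanding features, hidden size 64.
guidance = CrossModalGuidance(dim=64)
gen = torch.randn(2, 16, 64)
und = torch.randn(2, 32, 64)
print(guidance(gen, und).shape)  # torch.Size([2, 16, 64])
```

The output keeps the generation sequence's shape, so such a block could drop into a diffusion backbone between existing layers.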