X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

May 6, 2025
Authors: Qianchu Liu, Sheng Zhang, Guanghui Qin, Timothy Ossowski, Yu Gu, Ying Jin, Sid Kiblawi, Sam Preston, Mu Wei, Paul Vozila, Tristan Naumann, Hoifung Poon
cs.AI

Abstract

Recent proprietary models (e.g., o3) have begun to demonstrate strong multimodal reasoning capabilities. Yet most existing open-source research concentrates on training text-only reasoning models, with evaluations limited mainly to mathematical and general-domain tasks. It therefore remains unclear how to effectively extend reasoning capabilities beyond text input and general domains. This paper explores a fundamental research question: Is reasoning generalizable across modalities and domains? Our findings support an affirmative answer: general-domain text-based post-training can enable such strong generalizable reasoning. Leveraging this finding, we introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning. It uses a two-stage approach: an initial supervised fine-tuning phase with distilled long chains of thought, followed by reinforcement learning with verifiable rewards. Experiments show that X-Reasoner successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing state-of-the-art models trained with in-domain and multimodal data across various general and medical benchmarks (Figure 1). Additionally, we find that X-Reasoner's performance in specialized domains can be further enhanced through continued training on domain-specific text-only data. Building on this, we introduce X-Reasoner-Med, a medical-specialized variant that achieves a new state of the art on numerous text-only and multimodal medical benchmarks.
