
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

May 6, 2025
Authors: Qianchu Liu, Sheng Zhang, Guanghui Qin, Timothy Ossowski, Yu Gu, Ying Jin, Sid Kiblawi, Sam Preston, Mu Wei, Paul Vozila, Tristan Naumann, Hoifung Poon
cs.AI

Abstract

Recent proprietary models (e.g., o3) have begun to demonstrate strong multimodal reasoning capabilities. Yet, most existing open-source research concentrates on training text-only reasoning models, with evaluations limited mainly to mathematical and general-domain tasks. Therefore, it remains unclear how to effectively extend reasoning capabilities beyond text input and general domains. This paper explores a fundamental research question: Is reasoning generalizable across modalities and domains? Our findings support an affirmative answer: General-domain text-based post-training can enable such strong generalizable reasoning. Leveraging this finding, we introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chains of thought, followed by reinforcement learning with verifiable rewards. Experiments show that X-Reasoner successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing state-of-the-art models trained with in-domain and multimodal data across various general and medical benchmarks (Figure 1). Additionally, we find that X-Reasoner's performance in specialized domains can be further enhanced through continued training on domain-specific text-only data. Building upon this, we introduce X-Reasoner-Med, a medical-specialized variant that achieves a new state of the art on numerous text-only and multimodal medical benchmarks.
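
To make "verifiable rewards" concrete, here is a minimal sketch of what such a reward might look like for text-only questions that have a single checkable answer. The abstract does not describe the reward actually used, so everything here is an illustrative assumption: the `Answer:` marker convention, the exact-match comparison, and the function names `extract_final_answer` and `verifiable_reward` are all hypothetical.

```python
import re
from typing import Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is an assumption made for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None


def verifiable_reward(response: str, gold: str) -> float:
    """Score a response 1.0 if its final answer matches the gold label, else 0.0.

    The reward is "verifiable" in the sense that it is computed by a
    deterministic check against a known answer, not by a learned judge.
    """
    predicted = extract_final_answer(response)
    if predicted is None:
        return 0.0  # no parseable final answer
    return 1.0 if predicted.lower() == gold.strip().lower() else 0.0


if __name__ == "__main__":
    sample = "The sum of 2 and 2 is 4.\nAnswer: 4"
    print(verifiable_reward(sample, "4"))
```

A reinforcement learning loop would call a function like this on each sampled response and use the returned score as the training signal. Real setups typically need more forgiving answer normalization than the exact match shown here.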
