Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
March 9, 2025
Authors: Yuqi Liu, Bohao Peng, Zhisheng Zhong, Zihao Yue, Fanbin Lu, Bei Yu, Jiaya Jia
cs.AI
Abstract
Traditional methods for reasoning segmentation rely on supervised fine-tuning
with categorical labels and simple descriptions, which limits their
out-of-domain generalization and yields no explicit reasoning process. To address these
limitations, we propose Seg-Zero, a novel framework that demonstrates
remarkable generalizability and derives explicit chain-of-thought reasoning
through cognitive reinforcement. Seg-Zero introduces a decoupled architecture
consisting of a reasoning model and a segmentation model. The reasoning model
interprets user intentions, generates explicit reasoning chains, and produces
positional prompts, which are subsequently used by the segmentation model to
generate precise pixel-level masks. We design a sophisticated reward mechanism
that integrates both format and accuracy rewards to effectively guide
optimization directions. Trained exclusively via reinforcement learning with
GRPO and without explicit reasoning data, Seg-Zero achieves robust zero-shot
generalization and exhibits emergent test-time reasoning capabilities.
Experiments show that Seg-Zero-7B achieves a zero-shot performance of 57.5 on
the ReasonSeg benchmark, surpassing the prior LISA-7B by 18%. This significant
improvement highlights Seg-Zero's ability to generalize across domains while
presenting an explicit reasoning process. Code is available at
https://github.com/dvlab-research/Seg-Zero.
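
The abstract describes a reward that combines a format term with an accuracy term, optimized with GRPO. The following Python sketch illustrates one way such a signal could be computed. It is not the authors' implementation: the `<think>`/`<answer>` output template, the bounding-box answer, the IoU threshold of 0.5, the equal weighting of the two terms, and all function names are assumptions made for illustration. The group-relative advantage follows the commonly described GRPO normalization (reward minus group mean, divided by group standard deviation).

```python
import re
from statistics import mean, pstdev

# Hypothetical output template; the abstract does not specify the exact format.
PATTERN = re.compile(
    r"^<think>.+?</think>\s*<answer>\s*"
    r"\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\s*</answer>$",
    re.DOTALL,
)


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def reward(completion, gt_box, iou_threshold=0.5):
    """Format reward plus accuracy reward (assumed equal weights)."""
    match = PATTERN.match(completion.strip())
    if match is None:
        return 0.0  # malformed output earns neither term
    pred_box = tuple(int(g) for g in match.groups())
    format_reward = 1.0
    accuracy_reward = 1.0 if box_iou(pred_box, gt_box) >= iou_threshold else 0.0
    return format_reward + accuracy_reward


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group, as in GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled completions for one query (made-up strings for illustration).
completions = [
    "<think>The mug is on the left.</think><answer>[10, 10, 50, 50]</answer>",
    "<think>Maybe the right side.</think><answer>[60, 60, 90, 90]</answer>",
    "no tags at all",
]
gt_box = (10, 10, 50, 50)
rewards = [reward(c, gt_box) for c in completions]
advantages = group_relative_advantages(rewards)
```

In this toy group, the well-formed and correctly localized completion receives a positive advantage, the well-formed but mislocalized one sits at the group mean, and the malformed one is penalized, which is the kind of pressure that could encourage explicit reasoning in a fixed format without any annotated reasoning data.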