Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

Abstract

Traditional methods for reasoning segmentation rely on supervised fine-tuning with categorical labels and simple descriptions, which limits their out-of-domain generalization and leaves them without an explicit reasoning process. To address these limitations, we propose Seg-Zero, a novel framework that demonstrates remarkable generalizability and derives explicit chain-of-thought reasoning through cognitive reinforcement. Seg-Zero introduces a decoupled architecture consisting of a reasoning model and a segmentation model. The reasoning model interprets user intentions, generates explicit reasoning chains, and produces positional prompts, which the segmentation model then uses to generate precise pixel-level masks. We design a sophisticated reward mechanism that integrates both format and accuracy rewards to effectively guide the optimization direction. Trained exclusively via reinforcement learning with GRPO and without explicit reasoning data, Seg-Zero achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities. Experiments show that Seg-Zero-7B achieves a zero-shot performance of 57.5 on the ReasonSeg benchmark, surpassing the prior LISA-7B by 18%. This significant improvement highlights Seg-Zero's ability to generalize across domains while presenting an explicit reasoning process. Code is available at https://github.com/dvlab-research/Seg-Zero.
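
The abstract names a reward that integrates format and accuracy terms and is optimized with GRPO, but it does not give the details. The following is a minimal sketch of how such a reward could be assembled, not the authors' implementation. The `<think>`/`<answer>` tag layout, the bounding-box form of the positional prompt, the IoU threshold of 0.5, and all function names are assumptions made for illustration. The last function shows the group-relative reward normalization commonly associated with GRPO.

```python
import re
from statistics import mean, pstdev

def format_reward(response: str) -> float:
    """Return 1.0 if the response is wrapped as <think>...</think><answer>...</answer>.

    The tag names are an assumption; the abstract only says a format reward exists.
    """
    pattern = r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*"
    return 1.0 if re.fullmatch(pattern, response, flags=re.DOTALL) else 0.0

def box_iou(a, b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def accuracy_reward(pred_box, gt_box, threshold: float = 0.5) -> float:
    """Return 1.0 if the predicted box overlaps the target enough (threshold is hypothetical)."""
    return 1.0 if box_iou(pred_box, gt_box) > threshold else 0.0

def total_reward(response: str, pred_box, gt_box) -> float:
    """Sum of the format and accuracy terms (equal weighting is an assumption)."""
    return format_reward(response) + accuracy_reward(pred_box, gt_box)

def group_relative_advantages(rewards):
    """Normalize the rewards of one sampled group to zero mean and unit variance."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)  # all samples scored the same: no learning signal
    return [(r - mu) / sigma for r in rewards]
```

In this sketch, several responses would be sampled for the same image and query, each scored with `total_reward`, and the scores passed to `group_relative_advantages` so that each response is judged relative to the others in its group rather than by a separate value model.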
