αDepth: Learning Single-Pass Soft Boundary Decomposition for Stereo Conversion

Abstract

Accurately modeling soft boundaries, such as hair and defocus blur, is a fundamental challenge in stereo conversion because foreground and background blend ambiguously at these pixels. Existing depth models primarily predict a single depth layer, which leaves depth correspondence ambiguous at soft boundaries. Matting techniques can capture opacity for layered modeling, but they often struggle in complex scenes with multiple targets and usually require user intervention. This paper introduces αDepth, a layered representation that decomposes soft boundaries for high-fidelity stereo conversion. Specifically, we first resolve the ambiguity of mixed color and depth by estimating layered color and depth values at soft boundaries. To handle complex multi-target scenes, we design a Circular Alpha Representation (CAR) that shifts the paradigm from global target extraction to local boundary decomposition. Unlike prior matting methods, which are restricted to a single foreground/background pair, CAR enables efficient scene-level inference without manual guidance. Extensive evaluations demonstrate that αDepth achieves state-of-the-art performance in stereo conversion, eliminating background bleeding and structural distortions at soft boundaries.
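
To make the single-layer ambiguity concrete, the following is a minimal sketch of layered warping on a 1D grayscale scanline. It is not the αDepth or CAR implementation. It assumes the standard matting equation I = αF + (1 − α)B, integer per-layer disparities, and edge replication for revealed background. All function names (`composite`, `shift_row`, `render_layered_view`) and all values are hypothetical and chosen only for illustration.

```python
def composite(alpha, fg, bg):
    """Standard matting equation: I = alpha * F + (1 - alpha) * B."""
    return alpha * fg + (1.0 - alpha) * bg


def shift_row(row, disparity, fill):
    """Shift a 1D scanline right by an integer disparity, padding with `fill`."""
    n = len(row)
    out = [fill] * n
    for x, value in enumerate(row):
        nx = x + disparity
        if 0 <= nx < n:
            out[nx] = value
    return out


def render_layered_view(alpha, fg, bg, fg_disp, bg_disp):
    """Warp each layer by its own disparity, then re-composite.

    Assumption: revealed background is filled by replicating bg[0];
    revealed foreground gets alpha 0, so its color fill is irrelevant.
    """
    a = shift_row(alpha, fg_disp, 0.0)
    f = shift_row(fg, fg_disp, 0.0)
    b = shift_row(bg, bg_disp, bg[0])
    return [composite(ai, fi, bi) for ai, fi, bi in zip(a, f, b)]


# Hypothetical scanline: a gray strand (color 0.5) with soft edges
# over a background that steps from dark (0.0) to bright (1.0).
alpha = [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
fg = [0.5] * 6
bg = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

# The observed image, in which boundary pixels are already mixed.
observed = [composite(a, f, b) for a, f, b in zip(alpha, fg, bg)]

# Layered warp: the foreground moves by 1 pixel, the background stays put.
layered = render_layered_view(alpha, fg, bg, fg_disp=1, bg_disp=0)

# Single-layer warp: mixed pixels move as a unit with the foreground disparity.
naive = shift_row(observed, 1, observed[0])

print(observed)  # [0.0, 0.25, 0.5, 0.25, 1.0, 1.0]
print(layered)   # [0.0, 0.0, 0.25, 0.5, 0.75, 1.0]
print(naive)     # [0.0, 0.0, 0.25, 0.5, 0.25, 1.0]
```

At index 4 the single-layer warp keeps the value 0.25, which still contains the dark background the strand was originally blended with. The layered warp re-blends the strand with the bright background now beneath it and yields 0.75. This mismatch is a toy instance of the background bleeding described above.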