Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching

June 23, 2026
Authors: Junpeng Jing, Ronglai Zuo, Zhelun Shen, Shangchen Zhou, Rolandos Alexandros Potamias, Stefanos Zafeiriou, Krystian Mikolajczyk, Jiankang Deng
cs.AI

Abstract

Recent advances in stereo matching have achieved remarkable accuracy, but often rely on large models, heavy computation, or additional foundation-model priors, making them difficult to deploy on resource-constrained platforms. In contrast, efficient stereo models offer faster inference but are commonly considered less capable of strong zero-shot generalization. In this paper, we challenge this assumption by introducing Lite Any Stereo V2 (LAS2), an ultra-fast model series designed for efficient zero-shot stereo matching. LAS2 is developed from both architecture and training perspectives. Architecturally, we revisit efficient stereo design under practical deployment settings and propose a 2D-only cost aggregation framework, optimized for real inference latency rather than theoretical MACs alone. For training, we develop a three-stage strategy that combines synthetic supervision, self-distillation, and real-world knowledge distillation. To improve the reliability of real-world pseudo supervision, we further introduce pseudo-label filtering and an error-clamping operation, enabling smoother synthetic-to-real transfer. We instantiate LAS2 as a family of models, including feed-forward variants for different efficiency budgets and an iterative variant for higher accuracy. Extensive experiments show that LAS2 achieves state-of-the-art accuracy among efficient stereo methods while maintaining significantly lower latency. Specifically, LAS2-H achieves stronger overall zero-shot performance than the iterative method Fast-FoundationStereo, with 1.8x and 2.7x faster inference on H200 and Orin, respectively. The project page, demos, and code are available at https://tomtomtommi.github.io/LiteAnyStereoV2/.
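
The abstract names pseudo-label filtering and error clamping but does not define either operation. As a rough illustration only, the sketch below assumes that filtering means discarding pixels whose teacher confidence falls below a threshold, and that clamping means capping each pixel's absolute disparity error before averaging. The function name, the confidence input, and both default values are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def clamped_pseudo_label_loss(
    pred: Sequence[float],
    pseudo: Sequence[float],
    confidence: Sequence[float],
    conf_threshold: float = 0.5,  # hypothetical cut-off, not from the paper
    max_error: float = 5.0,       # hypothetical cap in disparity pixels
) -> Optional[float]:
    """Mean clamped L1 error over pixels whose pseudo-label is kept.

    Returns None when every pixel is filtered out.
    """
    if not (len(pred) == len(pseudo) == len(confidence)):
        raise ValueError("pred, pseudo and confidence must have equal length")

    total = 0.0
    kept = 0
    for p, t, c in zip(pred, pseudo, confidence):
        if c < conf_threshold:
            continue  # assumed filtering rule: skip low-confidence labels
        # assumed clamping rule: one bad label cannot dominate the loss
        total += min(abs(p - t), max_error)
        kept += 1
    return total / kept if kept else None


if __name__ == "__main__":
    student = [10.0, 20.0, 30.0, 40.0]   # student disparities
    teacher = [11.0, 50.0, 30.0, 0.0]    # teacher pseudo-labels
    conf = [0.9, 0.9, 0.9, 0.1]          # last pixel is filtered out
    print(clamped_pseudo_label_loss(student, teacher, conf))
```

In this toy input, the fourth pixel is dropped by the filter and the second pixel's error of 30 is capped at 5, so a single unreliable pseudo-label contributes a bounded amount to the average. A real training pipeline would more likely express the same idea with tensor operations over full disparity maps.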