Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
December 11, 2025
Authors: Bowen Wen, Shaurya Dewan, Stan Birchfield
cs.AI
Abstract
Stereo foundation models achieve strong zero-shot generalization but remain computationally prohibitive for real-time applications. Efficient stereo architectures, on the other hand, sacrifice robustness for speed and require costly per-domain fine-tuning. To bridge this gap, we present Fast-FoundationStereo, a family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rates. We employ a divide-and-conquer acceleration strategy with three components: (1) knowledge distillation to compress the hybrid backbone into a single efficient student; (2) blockwise neural architecture search for automatically discovering optimal cost filtering designs under latency budgets, reducing search complexity exponentially; and (3) structured pruning for eliminating redundancy in the iterative refinement module. Furthermore, we introduce an automatic pseudo-labeling pipeline that curates 1.4M in-the-wild stereo pairs to supplement synthetic training data and facilitate knowledge distillation. The resulting model runs more than 10x faster than FoundationStereo while closely matching its zero-shot accuracy, thus establishing a new state-of-the-art among real-time methods. Project page: https://nvlabs.github.io/Fast-FoundationStereo/
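To make the "exponential" reduction in search complexity concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' method: it assumes the cost-filtering network splits into B blocks, each with K candidate designs that can be scored independently (for example by a per-block proxy error), and that latency and error add across blocks, which is a simplification. The function names (`joint_search_size`, `blockwise_search_size`, `select_under_budget`) and all candidate latencies and errors are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical candidate = (latency_ms, proxy_error), both integers for exact arithmetic.
Candidate = Tuple[int, int]


def joint_search_size(num_blocks: int, num_candidates: int) -> int:
    """Architectures to evaluate if all block combinations are searched jointly."""
    return num_candidates ** num_blocks


def blockwise_search_size(num_blocks: int, num_candidates: int) -> int:
    """Evaluations needed if each block's candidates are scored independently."""
    return num_blocks * num_candidates


def select_under_budget(
    blocks: List[List[Candidate]], budget: int
) -> Optional[Tuple[int, List[int]]]:
    """Pick one candidate per block, minimising summed proxy error subject to
    summed latency <= budget. Assumes additive latency and error across blocks.
    Returns (total_error, chosen_indices), or None if no choice fits the budget."""
    # best[t] = lowest (error, choices) reaching total latency exactly t.
    best: Dict[int, Tuple[int, List[int]]] = {0: (0, [])}
    for candidates in blocks:
        nxt: Dict[int, Tuple[int, List[int]]] = {}
        for total_lat, (total_err, choices) in best.items():
            for idx, (lat, err) in enumerate(candidates):
                new_lat = total_lat + lat
                if new_lat > budget:
                    continue  # this extension already exceeds the latency budget
                new_err = total_err + err
                if new_lat not in nxt or new_err < nxt[new_lat][0]:
                    nxt[new_lat] = (new_err, choices + [idx])
        if not nxt:
            return None  # every partial selection exceeded the budget
        best = nxt
    return min(best.values(), key=lambda state: state[0])


# Made-up search space: 3 blocks x 3 candidate designs each.
blocks = [
    [(2, 30), (4, 20), (6, 15)],
    [(1, 50), (3, 25), (5, 20)],
    [(2, 40), (3, 35), (7, 10)],
]
print(joint_search_size(3, 3), blockwise_search_size(3, 3))  # 27 9
print(select_under_budget(blocks, budget=10))  # (80, [1, 1, 1])
```

With B blocks and K candidates per block, a joint search must evaluate K^B full architectures (10^8 for 8 blocks with 10 candidates each), while blockwise scoring needs only B·K evaluations (80 in the same case). Under these assumptions, combining the per-block scores under a latency budget becomes a small multiple-choice knapsack, solved here by dynamic programming over integer total latency rather than by training every combination.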