

From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios

June 25, 2025
Authors: Changliang Xia, Chengyou Jia, Zhuohang Dang, Minnan Luo
cs.AI

Abstract

Dense prediction tasks hold significant importance in computer vision, aiming to learn pixel-wise annotated labels for an input image. Despite advances in this field, existing methods primarily focus on idealized conditions, generalize poorly to real-world scenarios, and face a challenging scarcity of real-world data. To study this problem systematically, we first introduce DenseWorld, a benchmark spanning 25 dense prediction tasks that correspond to urgent real-world applications, with unified evaluation across tasks. We then propose DenseDiT, which maximally exploits generative models' visual priors to perform diverse real-world dense prediction tasks through a unified strategy. DenseDiT combines a parameter-reuse mechanism with two lightweight branches that adaptively integrate multi-scale context, adding fewer than 0.1% extra parameters. Evaluations on DenseWorld reveal significant performance drops in existing general and specialized baselines, highlighting their limited real-world generalization. In contrast, DenseDiT achieves superior results using less than 0.01% of the baselines' training data, underscoring its practical value for real-world deployment. Our data, checkpoints, and code are available at https://xcltql666.github.io/DenseDiTProj
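The abstract describes DenseDiT's design only at a high level. As a rough illustration of the general idea it names (a large generative backbone whose parameters are reused frozen, plus two lightweight trainable branches that integrate multi-scale context), the sketch below is a hypothetical PyTorch mock-up, not the authors' implementation; the module names, channel widths, pooling scales, and backbone stand-in are all assumptions. The official code is linked above.

```python
# Hypothetical sketch of a "frozen backbone + lightweight multi-scale context branch" setup.
# All sizes and names are illustrative assumptions, not the DenseDiT release.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightContextBranch(nn.Module):
    """Pools features at several scales and fuses them back via a 1x1 projection."""
    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.proj = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        pooled = [
            F.interpolate(
                F.adaptive_avg_pool2d(x, (max(h // s, 1), max(w // s, 1))),
                size=(h, w), mode="bilinear", align_corners=False,
            )
            for s in self.scales
        ]
        # Residual injection of the aggregated multi-scale context.
        return x + self.proj(torch.cat(pooled, dim=1))

# Stand-in for a large frozen generative backbone (parameter reuse: nothing here is trained).
backbone = nn.Sequential(*[nn.Conv2d(256, 256, 3, padding=1) for _ in range(50)])
for p in backbone.parameters():
    p.requires_grad = False

# Two trainable lightweight branches, as the abstract describes.
branches = nn.ModuleList([LightweightContextBranch(256), LightweightContextBranch(256)])

trainable = sum(p.numel() for p in branches.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
# With a real DiT-scale backbone (billions of parameters) this fraction would be far smaller.
print(f"trainable branch parameters: {100 * trainable / total:.3f}% of the model")
```

The printed fraction depends entirely on the toy backbone used here; the point of the sketch is only the structural pattern of freezing the backbone and training small context branches on top.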