

Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions

April 13, 2026
Authors: Seongyu Kim, Seungwoo Lee, Hyeonggon Ryu, Joon Son Chung, Arda Senocak
cs.AI

Abstract

We address the problem of tactile localization, where the goal is to identify image regions that share the same material properties as a tactile input. Existing visuo-tactile methods rely on global alignment and thus fail to capture the fine-grained local correspondences required for this task. The challenge is amplified by existing datasets, which predominantly contain close-up, low-diversity images. We propose a model that learns local visuo-tactile alignment via dense cross-modal feature interactions, producing tactile saliency maps for touch-conditioned material segmentation. To overcome dataset constraints, we introduce: (i) in-the-wild multi-material scene images that expand visual diversity, and (ii) a material-diversity pairing strategy that aligns each tactile sample with visually varied yet tactilely consistent images, improving contextual localization and robustness to weak signals. We also construct two new tactile-grounded material segmentation datasets for quantitative evaluation. Experiments on both new and existing benchmarks show that our approach substantially outperforms prior visuo-tactile methods in tactile localization.
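The abstract does not give implementation details, but the core idea of producing a tactile saliency map from dense cross-modal feature interactions can be illustrated concretely. The following is a minimal PyTorch sketch, not the paper's actual architecture: the function name, feature shapes, temperature value, and the choice of cosine similarity plus a sigmoid are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def tactile_saliency_map(visual_feats: torch.Tensor,
                         tactile_feat: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """Minimal sketch of dense visuo-tactile similarity as a saliency map.

    visual_feats: (B, C, H, W) patch/pixel-level visual features (assumed).
    tactile_feat: (B, C) global embedding of the tactile input (assumed).
    Returns a (B, H, W) map; higher values mark image regions whose
    material is predicted to match the touch signal.
    """
    # L2-normalize both modalities so the dot product is a cosine similarity.
    v = F.normalize(visual_feats, dim=1)   # (B, C, H, W)
    t = F.normalize(tactile_feat, dim=1)   # (B, C)

    # Dense cross-modal interaction: one similarity score per spatial location.
    sim = torch.einsum("bchw,bc->bhw", v, t) / temperature

    # Squash to [0, 1]; thresholding this map would give a
    # touch-conditioned material segmentation mask.
    return sim.sigmoid()


if __name__ == "__main__":
    vis = torch.randn(2, 256, 32, 32)   # dummy dense visual features
    tac = torch.randn(2, 256)           # dummy tactile embedding
    print(tactile_saliency_map(vis, tac).shape)  # torch.Size([2, 32, 32])
```

This only sketches the localization step; how the visual and tactile encoders are trained (e.g., with the material-diversity pairing strategy described above) is not specified here.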