Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions

Abstract

We address tactile localization: identifying the image regions that share the material properties of a tactile input. Existing visuo-tactile methods rely on global alignment and therefore fail to capture the fine-grained local correspondences this task requires. Existing datasets compound the problem, since they consist predominantly of close-up, low-diversity images. We propose a model that learns local visuo-tactile alignment through dense cross-modal feature interactions and produces tactile saliency maps for touch-conditioned material segmentation. To overcome the dataset limitations, we introduce (i) in-the-wild multi-material scene images that expand visual diversity, and (ii) a material-diversity pairing strategy that aligns each tactile sample with visually varied yet tactilely consistent images, improving contextual localization and robustness to weak signals. We also construct two new tactile-grounded material segmentation datasets for quantitative evaluation. Experiments on both the new and existing benchmarks show that our approach substantially outperforms prior visuo-tactile methods in tactile localization.
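To make the idea of a tactile saliency map concrete, the following is a minimal sketch and not the authors' architecture. It assumes that an image has already been encoded into a grid of per-patch feature vectors and that a touch sample has been encoded into a single vector of the same dimension. Plain cosine similarity stands in for the dense cross-modal feature interaction, which the abstract does not specify. The function names, the toy features, and the 0.5 threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def tactile_saliency(patch_features, tactile_embedding):
    """Score every image patch against one tactile embedding.

    patch_features: H x W grid (nested lists) of D-dimensional visual features.
    tactile_embedding: one D-dimensional vector for the touch sample.
    Returns an H x W grid of similarities in [-1, 1].
    """
    return [[cosine(f, tactile_embedding) for f in row] for row in patch_features]


def segment(saliency, threshold=0.5):
    """Binarize the saliency map into a mask (the threshold is an assumption)."""
    return [[1 if s >= threshold else 0 for s in row] for row in saliency]


# Toy 2 x 2 grid of 2-dimensional patch features (made-up values).
patches = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
touch = [1.0, 0.0]

saliency = tactile_saliency(patches, touch)
mask = segment(saliency)
```

Because every patch is scored separately, the map can highlight one material region in a multi-material scene, which a single global image-to-touch similarity cannot do.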

