On the rankability of visual embeddings

Abstract

We study whether visual embedding models capture continuous, ordinal attributes along linear directions, which we term _rank axes_. We define a model as _rankable_ for an attribute if projecting embeddings onto such an axis preserves the attribute's order. Across 7 popular encoders and 9 datasets with attributes such as age, crowd count, head pose, aesthetics, and recency, we find that many embeddings are inherently rankable. Surprisingly, a small number of samples, or even just two extreme examples, often suffice to recover meaningful rank axes, without full-scale supervision. These findings open up new use cases for image ranking in vector databases and motivate further study into the structure and learning of rankable embeddings. Our code is available at https://github.com/aktsonthalia/rankable-vision-embeddings.
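To make the notion of a rank axis concrete, the following is a minimal sketch and not the authors' implementation. It assumes synthetic embeddings in which a hypothetical scalar attribute varies along one hidden direction plus Gaussian noise. The function names `rank_axis_from_extremes` and `spearman`, the dimensions, and the noise scale are all illustrative assumptions. The sketch estimates an axis from the two extreme examples only, projects every embedding onto it, and measures how well the projections preserve the attribute order using Spearman rank correlation.

```python
import numpy as np

def rank_axis_from_extremes(low_emb, high_emb):
    """Unit vector pointing from the lowest-attribute to the highest-attribute example."""
    axis = high_emb - low_emb
    return axis / np.linalg.norm(axis)

def spearman(a, b):
    """Spearman rank correlation; assumes no ties (true for continuous values)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Synthetic stand-in for encoder outputs (assumption: no real encoder or dataset is used).
rng = np.random.default_rng(0)
dim, n = 64, 200
hidden_direction = rng.normal(size=dim)
hidden_direction /= np.linalg.norm(hidden_direction)
attribute = rng.uniform(0.0, 100.0, size=n)  # hypothetical ordinal attribute, e.g. age
embeddings = 0.1 * attribute[:, None] * hidden_direction + rng.normal(scale=0.3, size=(n, dim))

# Recover a rank axis from just the two extreme examples.
low = embeddings[np.argmin(attribute)]
high = embeddings[np.argmax(attribute)]
axis = rank_axis_from_extremes(low, high)

# Project all embeddings onto the axis and check order preservation.
scores = embeddings @ axis
rho = spearman(scores, attribute)
print(f"Spearman correlation between projections and attribute: {rho:.3f}")
```

Under this reading, an embedding would count as rankable for the attribute when the correlation is high. In practice the embeddings would come from a pretrained encoder, and the axis could instead be fit by regression on a small labeled sample.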
