On the rankability of visual embeddings
July 4, 2025
Authors: Ankit Sonthalia, Arnas Uselis, Seong Joon Oh
cs.AI
Abstract
We study whether visual embedding models capture continuous, ordinal
attributes along linear directions, which we term _rank axes_. We define a
model as _rankable_ for an attribute if projecting embeddings onto such an axis
preserves the attribute's order. Across 7 popular encoders and 9 datasets with
attributes like age, crowd count, head pose, aesthetics, and recency, we find
that many embeddings are inherently rankable. Surprisingly, a small number of
samples, or even just two extreme examples, often suffice to recover meaningful
rank axes, without full-scale supervision. These findings open up new use cases
for image ranking in vector databases and motivate further study into the
structure and learning of rankable embeddings. Our code is available at
https://github.com/aktsonthalia/rankable-vision-embeddings.