On the rankability of visual embeddings

July 4, 2025
Authors: Ankit Sonthalia, Arnas Uselis, Seong Joon Oh
cs.AI

Abstract

We study whether visual embedding models capture continuous, ordinal attributes along linear directions, which we term _rank axes_. We define a model as _rankable_ for an attribute if projecting embeddings onto such an axis preserves the attribute's order. Across 7 popular encoders and 9 datasets with attributes like age, crowd count, head pose, aesthetics, and recency, we find that many embeddings are inherently rankable. Surprisingly, a small number of samples, or even just two extreme examples, often suffice to recover meaningful rank axes, without full-scale supervision. These findings open up new use cases for image ranking in vector databases and motivate further study into the structure and learning of rankable embeddings. Our code is available at https://github.com/aktsonthalia/rankable-vision-embeddings.