On the rankability of visual embeddings

Abstract

We study whether visual embedding models capture continuous, ordinal attributes along linear directions, which we term _rank axes_. We define a model as _rankable_ for an attribute if projecting embeddings onto such an axis preserves the attribute's order. Across 7 popular encoders and 9 datasets with attributes like age, crowd count, head pose, aesthetics, and recency, we find that many embeddings are inherently rankable. Surprisingly, a small number of samples, or even just two extreme examples, often suffice to recover meaningful rank axes, without full-scale supervision. These findings open up new use cases for image ranking in vector databases and motivate further study into the structure and learning of rankable embeddings. Our code is available at https://github.com/aktsonthalia/rankable-vision-embeddings.
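
The two-extreme-example idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes synthetic embeddings in place of a real encoder, takes the rank axis to be the normalized difference between the embeddings of the lowest- and highest-attribute samples, and uses Spearman rank correlation as the measure of order preservation. The function names `rank_axis_from_extremes`, `project`, and `spearman` are hypothetical, as are the synthetic data parameters.

```python
import numpy as np

def rank_axis_from_extremes(low_embedding, high_embedding):
    """Unit vector pointing from the low-attribute example to the high-attribute one."""
    direction = np.asarray(high_embedding, dtype=float) - np.asarray(low_embedding, dtype=float)
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("the two extreme embeddings must differ")
    return direction / norm

def project(embeddings, axis):
    """Scalar score per embedding: its coordinate along the rank axis."""
    return np.asarray(embeddings, dtype=float) @ axis

def spearman(a, b):
    """Spearman rank correlation via double argsort (no tie correction)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Synthetic stand-in for encoder outputs (assumption): the attribute is
# encoded linearly along one hidden direction, plus isotropic noise.
rng = np.random.default_rng(0)
dim, n = 64, 200
true_direction = rng.normal(size=dim)
true_direction /= np.linalg.norm(true_direction)
attribute = rng.uniform(0.0, 80.0, size=n)  # e.g. an age-like value
embeddings = attribute[:, None] * true_direction + rng.normal(size=(n, dim))

# Recover a rank axis from only the two most extreme samples.
low_idx, high_idx = np.argmin(attribute), np.argmax(attribute)
axis = rank_axis_from_extremes(embeddings[low_idx], embeddings[high_idx])

# Rank all samples by their projection and compare with the true order.
scores = project(embeddings, axis)
rho = spearman(scores, attribute)
print(f"Spearman correlation between projection and attribute: {rho:.3f}")
```

With real data, `embeddings` would come from a pretrained vision encoder and `attribute` from dataset labels. A correlation near 1 would indicate that the embedding is rankable for that attribute in the sense defined above.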
