MC-RFM: Geometry-Aware Few-Shot Adaptation via Mixed-Curvature Riemannian Flow Matching

Abstract

Parameter-efficient adaptation of pretrained vision models is commonly performed through linear probes, prompts, low-rank updates, or lightweight residual modules. While effective, these methods usually treat adaptation as a discrete Euclidean perturbation of frozen representations, without explicitly modeling the geometry of the task-induced feature displacement. We propose MC-RFM, a mixed-curvature Riemannian flow-matching framework for few-shot adaptation of frozen visual backbones. The key idea is to represent adapted features on a product manifold combining a hyperbolic factor, which captures hierarchy-sensitive semantic structure, and a Euclidean factor, which preserves locally discriminative visual variation. Adaptation is formulated as a task-conditioned continuous transport from frozen features to support-set prototypes, trained with a flow-matching objective and coupled to a hybrid prototype-linear classifier. The method is lightweight, backbone-agnostic, and operates entirely on cached frozen features. Across seven visual recognition benchmarks, five frozen backbones, and 1/4/16-shot regimes, MC-RFM is the best-performing method in a majority of evaluated settings, with the strongest gains on Transformer backbones and fine-grained datasets. Ablations show that the mixed-curvature head, task conditioning, adaptive branch gating, prototype shrinkage, and discriminative supervision each contribute to performance. These results suggest that few-shot adaptation benefits not only from deciding which parameters to update, but also from modeling how representations should move through a geometry matched to the structure of the downstream task.
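To make the geometric ingredients concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give formulas, so everything here is an assumption chosen for illustration: the hyperbolic factor is modeled as a unit Poincaré ball with curvature -1, the product-manifold distance combines the two factor distances in an l2 fashion, the flow-matching regression target is shown only for the Euclidean factor along a straight-line path, and the names `poincare_distance`, `product_distance`, `nearest_prototype`, and `flow_matching_target` are hypothetical.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance on the unit Poincare ball (assumed curvature -1)."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    nx = sum(a * a for a in x)  # squared norm of x, must be < 1
    ny = sum(b * b for b in y)  # squared norm of y, must be < 1
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - nx) * (1.0 - ny))
    return math.acosh(max(1.0, arg))  # clamp guards against rounding below 1


def euclidean_distance(x, y):
    """Ordinary l2 distance on the Euclidean factor."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def product_distance(p, q):
    """Distance on the product manifold; p and q are
    (hyperbolic_part, euclidean_part) pairs. The l2 combination of the
    factor distances is an assumption of this sketch."""
    d_h = poincare_distance(p[0], q[0])
    d_e = euclidean_distance(p[1], q[1])
    return math.sqrt(d_h ** 2 + d_e ** 2)


def nearest_prototype(feature, prototypes):
    """Return the label whose prototype is closest in product distance."""
    return min(prototypes, key=lambda label: product_distance(feature, prototypes[label]))


def flow_matching_target(x0, x1, t):
    """Straight-line conditional path on the Euclidean factor (assumed):
    x_t = (1 - t) * x0 + t * x1, with constant target velocity x1 - x0.
    A learned velocity field would be regressed onto this target."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity
```

In this sketch, `x0` plays the role of a cached frozen feature and `x1` the role of a support-set prototype. Transport on the hyperbolic factor would follow geodesics of the ball rather than straight lines, which is omitted here for brevity.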