Implicit Neural Representation Facilitates Unified Universal Vision Encoding
January 20, 2026
Authors: Matthew Gwilliam, Xiao Wang, Xuefeng Hu, Zhenheng Yang
cs.AI
Abstract
Models for image representation learning are typically designed for either recognition or generation. Various forms of contrastive learning help models learn to convert images to embeddings that are useful for classification, detection, and segmentation. On the other hand, models can be trained to reconstruct images with pixel-wise, perceptual, and adversarial losses in order to learn a latent space that is useful for image generation. We seek to unify these two directions with a first-of-its-kind model that learns representations which are simultaneously useful for recognition and generation. We train our model as a hyper-network for implicit neural representations (INRs), which learns to map images to model weights for fast, accurate reconstruction. We further integrate our INR hyper-network with knowledge distillation to improve its generalization and performance. Beyond the novel training design, the model also learns an unprecedented compressed embedding space with outstanding performance on a variety of visual tasks. The complete model achieves results competitive with the state of the art in image representation learning, while also enabling generative capabilities with its high-quality tiny embeddings. The code is available at https://github.com/tiktok/huvr.
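To make the hyper-network idea concrete, here is a minimal, illustrative sketch (not the paper's actual architecture; all names, sizes, and the linear "encoder" are assumptions). An encoder maps an image to a flat weight vector, and that vector parameterizes a tiny coordinate MLP, the INR, which maps (x, y) pixel coordinates to RGB values to reconstruct the image:

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 8                      # toy image size
HID = 16                       # hidden width of the coordinate MLP
# weight layout of the coordinate MLP: (2->HID) weights + bias, (HID->3) weights + bias
N_W = 2 * HID + HID + HID * 3 + 3

def encoder(image, proj):
    """Stand-in 'hyper-network': a fixed linear map from pixels to INR weights.
    In the real model this would be a learned image encoder."""
    return image.reshape(-1) @ proj   # (N_W,)

def inr_forward(weights, coords):
    """Run the coordinate MLP defined by `weights` on (x, y) coords in [-1, 1]."""
    i = 0
    W1 = weights[i:i + 2 * HID].reshape(2, HID); i += 2 * HID
    b1 = weights[i:i + HID];                     i += HID
    W2 = weights[i:i + HID * 3].reshape(HID, 3); i += HID * 3
    b2 = weights[i:i + 3]
    h = np.sin(coords @ W1 + b1)      # sine activation, SIREN-style
    return h @ W2 + b2                # (num_pixels, 3) RGB prediction

# toy image and a fixed random projection standing in for the encoder
image = rng.random((H, W, 3))
proj = rng.normal(0.0, 0.02, size=(H * W * 3, N_W))

# pixel coordinate grid, normalized to [-1, 1]
ys, xs = np.mgrid[0:H, 0:W]
coords = np.stack([xs, ys], axis=-1).reshape(-1, 2) / (H - 1) * 2 - 1

weights = encoder(image, proj)        # image -> INR weights
recon = inr_forward(weights, coords).reshape(H, W, 3)
print(recon.shape)                    # (8, 8, 3)
```

In training, both the encoder and the weight decoding would be learned end-to-end with reconstruction losses, so the intermediate weight vector doubles as a compact image embedding.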