Towards flexible perception with visual memory
August 15, 2024
Authors: Robert Geirhos, Priyank Jaini, Austin Stone, Sourabh Medapati, Xi Yi, George Toderici, Abhijit Ogale, Jonathon Shlens
cs.AI
Abstract
Training a neural network is a monolithic endeavor, akin to carving knowledge
into stone: once the process is completed, editing the knowledge in a network
is nearly impossible, since all information is distributed across the network's
weights. We here explore a simple, compelling alternative by marrying the
representational power of deep neural networks with the flexibility of a
database. Decomposing the task of image classification into image similarity
(from a pre-trained embedding) and search (via fast nearest neighbor retrieval
from a knowledge database), we build a simple and flexible visual memory that
has the following key capabilities: (1.) The ability to flexibly add data
across scales: from individual samples all the way to entire classes and
billion-scale data; (2.) The ability to remove data through unlearning and
memory pruning; (3.) An interpretable decision-mechanism on which we can
intervene to control its behavior. Taken together, these capabilities
comprehensively demonstrate the benefits of an explicit visual memory. We hope
that it might contribute to a conversation on how knowledge should be
represented in deep vision models -- beyond carving it in "stone" weights.
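The decomposition described in the abstract -- image similarity from a pre-trained embedding, plus fast nearest-neighbor search over an editable database -- can be illustrated with a minimal sketch. This is not the paper's implementation; the embedding model is assumed to exist upstream, the `VisualMemory` class and its method names are hypothetical, and a brute-force search stands in for the fast retrieval a billion-scale system would need:

```python
import numpy as np

class VisualMemory:
    """Toy visual memory: a growable database of (embedding, label) pairs.

    Classification = similarity (dot product of L2-normalized embeddings)
    + search (top-k nearest neighbors) + plurality vote over neighbor labels.
    """

    def __init__(self):
        self.embeddings = []  # list of 1-D unit vectors
        self.labels = []

    def add(self, embedding, label):
        """Add one sample -- works for single images up to entire new classes."""
        v = np.asarray(embedding, dtype=np.float64)
        self.embeddings.append(v / np.linalg.norm(v))
        self.labels.append(label)

    def unlearn(self, label):
        """Delete every sample of a label: editing knowledge without retraining."""
        keep = [i for i, lab in enumerate(self.labels) if lab != label]
        self.embeddings = [self.embeddings[i] for i in keep]
        self.labels = [self.labels[i] for i in keep]

    def classify(self, embedding, k=3):
        """Return the plurality label among the k most similar stored samples."""
        q = np.asarray(embedding, dtype=np.float64)
        q = q / np.linalg.norm(q)
        sims = np.array([e @ q for e in self.embeddings])
        topk = np.argsort(-sims)[:k]  # indices of the k nearest neighbors
        votes = {}
        for i in topk:
            votes[self.labels[i]] = votes.get(self.labels[i], 0) + 1
        return max(votes, key=votes.get)
```

Because the decision is an explicit vote over retrieved neighbors, it is inspectable (which stored samples drove the prediction) and editable (adding or unlearning samples changes behavior immediately), in contrast to knowledge distributed across network weights.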