Towards flexible perception with visual memory
August 15, 2024
Authors: Robert Geirhos, Priyank Jaini, Austin Stone, Sourabh Medapati, Xi Yi, George Toderici, Abhijit Ogale, Jonathon Shlens
cs.AI
Abstract
Training a neural network is a monolithic endeavor, akin to carving knowledge
into stone: once the process is completed, editing the knowledge in a network
is nearly impossible, since all information is distributed across the network's
weights. We here explore a simple, compelling alternative by marrying the
representational power of deep neural networks with the flexibility of a
database. Decomposing the task of image classification into image similarity
(from a pre-trained embedding) and search (via fast nearest neighbor retrieval
from a knowledge database), we build a simple and flexible visual memory that
has the following key capabilities: (1.) The ability to flexibly add data
across scales: from individual samples all the way to entire classes and
billion-scale data; (2.) The ability to remove data through unlearning and
memory pruning; (3.) An interpretable decision-mechanism on which we can
intervene to control its behavior. Taken together, these capabilities
comprehensively demonstrate the benefits of an explicit visual memory. We hope
that it might contribute to a conversation on how knowledge should be
represented in deep vision models -- beyond carving it in "stone" weights.
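The decomposition described in the abstract -- image similarity from a pre-trained embedding, plus fast nearest-neighbor search over an editable database -- can be illustrated with a minimal sketch. This is not the paper's implementation; the embedding model is assumed to exist upstream, the `VisualMemory` class and its method names are hypothetical, and a brute-force search stands in for the fast retrieval a billion-scale system would need:

```python
import numpy as np

class VisualMemory:
    """Toy visual memory: a growable database of (embedding, label) pairs.

    Classification = similarity (dot product of L2-normalized embeddings)
    + search (top-k nearest neighbors) + plurality vote over neighbor labels.
    """

    def __init__(self):
        self.embeddings = []  # list of 1-D unit vectors
        self.labels = []

    def add(self, embedding, label):
        """Add one sample -- works for single images up to entire new classes."""
        v = np.asarray(embedding, dtype=np.float64)
        self.embeddings.append(v / np.linalg.norm(v))
        self.labels.append(label)

    def unlearn(self, label):
        """Delete every sample of a label: editing knowledge without retraining."""
        keep = [i for i, lab in enumerate(self.labels) if lab != label]
        self.embeddings = [self.embeddings[i] for i in keep]
        self.labels = [self.labels[i] for i in keep]

    def classify(self, embedding, k=3):
        """Return the plurality label among the k most similar stored samples."""
        q = np.asarray(embedding, dtype=np.float64)
        q = q / np.linalg.norm(q)
        sims = np.array([e @ q for e in self.embeddings])
        topk = np.argsort(-sims)[:k]  # indices of the k nearest neighbors
        votes = {}
        for i in topk:
            votes[self.labels[i]] = votes.get(self.labels[i], 0) + 1
        return max(votes, key=votes.get)
```

Because the decision is an explicit vote over retrieved neighbors, it is inspectable (which stored samples drove the prediction) and editable (adding or unlearning samples changes behavior immediately), in contrast to knowledge distributed across network weights.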