Towards flexible perception with visual memory
August 15, 2024
Authors: Robert Geirhos, Priyank Jaini, Austin Stone, Sourabh Medapati, Xi Yi, George Toderici, Abhijit Ogale, Jonathon Shlens
cs.AI
Abstract
Training a neural network is a monolithic endeavor, akin to carving knowledge
into stone: once the process is completed, editing the knowledge in a network
is nearly impossible, since all information is distributed across the network's
weights. We here explore a simple, compelling alternative by marrying the
representational power of deep neural networks with the flexibility of a
database. Decomposing the task of image classification into image similarity
(from a pre-trained embedding) and search (via fast nearest neighbor retrieval
from a knowledge database), we build a simple and flexible visual memory that
has the following key capabilities: (1.) The ability to flexibly add data
across scales: from individual samples all the way to entire classes and
billion-scale data; (2.) The ability to remove data through unlearning and
memory pruning; (3.) An interpretable decision mechanism on which we can
intervene to control its behavior. Taken together, these capabilities
comprehensively demonstrate the benefits of an explicit visual memory. We hope
that it might contribute to a conversation on how knowledge should be
represented in deep vision models -- beyond carving it in "stone" weights.
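The decomposition described above (a fixed pre-trained embedding plus a searchable database of labeled examples) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, brute-force cosine-similarity search, and majority-vote aggregation are assumptions chosen for brevity; at billion scale one would use an approximate nearest-neighbor index instead.

```python
import numpy as np

class VisualMemory:
    """Illustrative sketch of an explicit visual memory: a database of
    (embedding, label) pairs queried via nearest-neighbor search.
    Embeddings are assumed to come from a frozen pre-trained encoder."""

    def __init__(self):
        self.embeddings = []  # list of 1-D feature vectors
        self.labels = []      # parallel list of class labels

    def add(self, embedding, label):
        # Adding knowledge is just an append -- no retraining needed.
        self.embeddings.append(np.asarray(embedding, dtype=float))
        self.labels.append(label)

    def remove_class(self, label):
        # "Unlearning" a class: delete its entries from the memory.
        keep = [i for i, y in enumerate(self.labels) if y != label]
        self.embeddings = [self.embeddings[i] for i in keep]
        self.labels = [self.labels[i] for i in keep]

    def classify(self, query, k=3):
        # Brute-force cosine-similarity search, then a majority vote
        # over the k nearest neighbors (an inspectable decision trace).
        q = np.asarray(query, dtype=float)
        mat = np.stack(self.embeddings)
        sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q))
        top = np.argsort(-sims)[:k]
        votes = [self.labels[i] for i in top]
        return max(set(votes), key=votes.count)
```

Usage follows the abstract's three capabilities: `add` covers flexible insertion of single samples or whole classes, `remove_class` covers unlearning, and the neighbor list inside `classify` is the interpretable decision mechanism one can intervene on.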