Towards flexible perception with visual memory

August 15, 2024
Authors: Robert Geirhos, Priyank Jaini, Austin Stone, Sourabh Medapati, Xi Yi, George Toderici, Abhijit Ogale, Jonathon Shlens
cs.AI

Abstract

Training a neural network is a monolithic endeavor, akin to carving knowledge into stone: once the process is completed, editing the knowledge in a network is nearly impossible, since all information is distributed across the network's weights. We here explore a simple, compelling alternative by marrying the representational power of deep neural networks with the flexibility of a database. Decomposing the task of image classification into image similarity (from a pre-trained embedding) and search (via fast nearest neighbor retrieval from a knowledge database), we build a simple and flexible visual memory that has the following key capabilities: (1) the ability to flexibly add data across scales, from individual samples all the way to entire classes and billion-scale data; (2) the ability to remove data through unlearning and memory pruning; (3) an interpretable decision mechanism on which we can intervene to control its behavior. Taken together, these capabilities comprehensively demonstrate the benefits of an explicit visual memory. We hope that it might contribute to a conversation on how knowledge should be represented in deep vision models, beyond carving it in "stone" weights.
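The abstract itself contains no code, but the embed-then-retrieve design it describes is easy to illustrate. Below is a minimal sketch, assuming cosine similarity over unit-normalized embeddings and brute-force NumPy search; the paper's billion-scale setting would instead rely on a fast (approximate) nearest-neighbor index, and the embeddings would come from a pre-trained vision model rather than the random vectors used here. The `VisualMemory` class and all names are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

class VisualMemory:
    """Toy visual memory: stored (embedding, label) pairs plus k-NN lookup.

    Illustrative sketch only -- not the paper's implementation.
    """

    def __init__(self, dim: int):
        self.dim = dim
        self.embeddings = np.empty((0, dim), dtype=np.float32)
        self.labels: list[str] = []

    def add(self, embedding: np.ndarray, label: str) -> None:
        # Capability (1): flexibly add data, one sample at a time.
        emb = embedding / np.linalg.norm(embedding)  # unit norm -> cosine similarity
        self.embeddings = np.vstack([self.embeddings, emb.astype(np.float32)])
        self.labels.append(label)

    def remove_class(self, label: str) -> None:
        # Capability (2): "unlearn" a class by deleting its memory entries;
        # no retraining is needed because no weights encode the class.
        keep = [i for i, lbl in enumerate(self.labels) if lbl != label]
        self.embeddings = self.embeddings[keep]
        self.labels = [self.labels[i] for i in keep]

    def classify(self, embedding: np.ndarray, k: int = 5) -> str:
        # Capability (3): an interpretable decision -- a majority vote over
        # the k nearest stored neighbors, each of which can be inspected.
        query = embedding / np.linalg.norm(embedding)
        sims = self.embeddings @ query            # cosine similarity per entry
        top_k = np.argsort(sims)[::-1][:k]        # indices of k most similar
        votes = [self.labels[i] for i in top_k]
        return max(set(votes), key=votes.count)

# Toy usage with random vectors standing in for pre-trained embeddings:
rng = np.random.default_rng(0)
memory = VisualMemory(dim=64)
for _ in range(10):
    memory.add(rng.normal(size=64), "cat")
    memory.add(rng.normal(size=64), "dog")
print(memory.classify(rng.normal(size=64), k=5))
memory.remove_class("dog")  # after unlearning, "dog" can no longer be predicted
```

The key design point the sketch captures is that all class knowledge lives in the database, not in trained weights, so adding, removing, or inspecting knowledge is an ordinary data operation rather than a retraining problem.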
