Towards flexible perception with visual memory
August 15, 2024
Authors: Robert Geirhos, Priyank Jaini, Austin Stone, Sourabh Medapati, Xi Yi, George Toderici, Abhijit Ogale, Jonathon Shlens
cs.AI
Abstract
Training a neural network is a monolithic endeavor, akin to carving knowledge
into stone: once the process is completed, editing the knowledge in a network
is nearly impossible, since all information is distributed across the network's
weights. We here explore a simple, compelling alternative by marrying the
representational power of deep neural networks with the flexibility of a
database. Decomposing the task of image classification into image similarity
(from a pre-trained embedding) and search (via fast nearest neighbor retrieval
from a knowledge database), we build a simple and flexible visual memory that
has the following key capabilities: (1.) The ability to flexibly add data
across scales: from individual samples all the way to entire classes and
billion-scale data; (2.) The ability to remove data through unlearning and
memory pruning; (3.) An interpretable decision mechanism on which we can
intervene to control its behavior. Taken together, these capabilities
comprehensively demonstrate the benefits of an explicit visual memory. We hope
that it might contribute to a conversation on how knowledge should be
represented in deep vision models -- beyond carving it in "stone" weights.
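The decomposition described above (a fixed pre-trained embedding plus a searchable database of labeled examples) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, brute-force cosine-similarity search, and majority-vote aggregation are assumptions chosen for brevity; at billion scale one would use an approximate nearest-neighbor index instead.

```python
import numpy as np

class VisualMemory:
    """Illustrative sketch of an explicit visual memory: a database of
    (embedding, label) pairs queried via nearest-neighbor search.
    Embeddings are assumed to come from a frozen pre-trained encoder."""

    def __init__(self):
        self.embeddings = []  # list of 1-D feature vectors
        self.labels = []      # parallel list of class labels

    def add(self, embedding, label):
        # Adding knowledge is just an append -- no retraining needed.
        self.embeddings.append(np.asarray(embedding, dtype=float))
        self.labels.append(label)

    def remove_class(self, label):
        # "Unlearning" a class: delete its entries from the memory.
        keep = [i for i, y in enumerate(self.labels) if y != label]
        self.embeddings = [self.embeddings[i] for i in keep]
        self.labels = [self.labels[i] for i in keep]

    def classify(self, query, k=3):
        # Brute-force cosine-similarity search, then a majority vote
        # over the k nearest neighbors (an inspectable decision trace).
        q = np.asarray(query, dtype=float)
        mat = np.stack(self.embeddings)
        sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q))
        top = np.argsort(-sims)[:k]
        votes = [self.labels[i] for i in top]
        return max(set(votes), key=votes.count)
```

Usage follows the abstract's three capabilities: `add` covers flexible insertion of single samples or whole classes, `remove_class` covers unlearning, and the neighbor list inside `classify` is the interpretable decision mechanism one can intervene on.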