Towards flexible perception with visual memory

Abstract

Training a neural network is a monolithic endeavor, akin to carving knowledge into stone: once training is complete, editing the knowledge in the network is nearly impossible, because all information is distributed across the network's weights. Here we explore a simple, compelling alternative that marries the representational power of deep neural networks with the flexibility of a database. Decomposing image classification into image similarity (from a pre-trained embedding) and search (via fast nearest-neighbor retrieval from a knowledge database), we build a simple and flexible visual memory with the following key capabilities: (1) the ability to flexibly add data across scales, from individual samples all the way to entire classes and billion-scale data; (2) the ability to remove data through unlearning and memory pruning; and (3) an interpretable decision mechanism on which we can intervene to control its behavior. Taken together, these capabilities comprehensively demonstrate the benefits of an explicit visual memory. We hope that this might contribute to a conversation on how knowledge should be represented in deep vision models, beyond carving it into "stone" weights.
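
As a rough illustration of the decomposition described in the abstract, the sketch below stores (embedding, label) pairs in a small in-memory database and classifies a query by retrieving its nearest neighbors. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `VisualMemory` and all of its methods are hypothetical, embeddings are assumed to be precomputed by some pre-trained model, a brute-force cosine-similarity scan stands in for fast nearest-neighbor retrieval, and a plain majority vote is assumed as the aggregation rule.

```python
import math
from collections import Counter


def _normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class VisualMemory:
    """Toy memory of (embedding, label) pairs; all names here are hypothetical."""

    def __init__(self):
        self._entries = {}  # entry id -> (unit-length embedding, label)
        self._next_id = 0

    def add(self, embedding, label):
        """Store one sample and return its id. No retraining is involved."""
        entry_id = self._next_id
        self._entries[entry_id] = (_normalize(embedding), label)
        self._next_id += 1
        return entry_id

    def remove(self, entry_id):
        """Forget a single sample by deleting its entry."""
        del self._entries[entry_id]

    def remove_class(self, label):
        """Forget an entire class by deleting every entry with that label."""
        self._entries = {i: e for i, e in self._entries.items() if e[1] != label}

    def neighbors(self, embedding, k=3):
        """Return the k most similar entries as (similarity, id, label) tuples."""
        query = _normalize(embedding)
        scored = [
            (sum(q * v for q, v in zip(query, vec)), entry_id, label)
            for entry_id, (vec, label) in self._entries.items()
        ]
        # Brute-force scan; an approximate index would replace this at scale.
        return sorted(scored, key=lambda s: s[0], reverse=True)[:k]

    def classify(self, embedding, k=3):
        """Predict a label by majority vote over the k nearest neighbors."""
        votes = Counter(label for _, _, label in self.neighbors(embedding, k))
        if not votes:
            raise ValueError("memory is empty")
        return votes.most_common(1)[0][0]
```

Because every prediction traces back to the entries returned by `neighbors`, the decision can be inspected directly, and calling `add`, `remove`, or `remove_class` changes the classifier's behavior without touching any model weights.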
