Towards flexible perception with visual memory

Abstract

Training a neural network is a monolithic endeavor, akin to carving knowledge into stone: once training is complete, editing the knowledge in a network is nearly impossible, since all information is distributed across the network's weights. Here we explore a simple, compelling alternative that marries the representational power of deep neural networks with the flexibility of a database. Decomposing the task of image classification into image similarity (from a pre-trained embedding) and search (via fast nearest-neighbor retrieval from a knowledge database), we build a simple and flexible visual memory with the following key capabilities: (1) the ability to flexibly add data across scales, from individual samples all the way to entire classes and billion-scale data; (2) the ability to remove data through unlearning and memory pruning; and (3) an interpretable decision mechanism on which we can intervene to control its behavior. Taken together, these capabilities comprehensively demonstrate the benefits of an explicit visual memory. We hope this work contributes to a conversation on how knowledge should be represented in deep vision models, beyond carving it into "stone" weights.
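
The decomposition described above (similarity from an embedding, plus retrieval from an editable database) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the abstract: embeddings are taken as already computed by some pre-trained encoder (not shown), brute-force cosine similarity stands in for fast nearest-neighbor search, and the prediction is a plain majority vote over the retrieved neighbors. The class name `VisualMemory` and its methods are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VisualMemory:
    """Hypothetical explicit memory: a table of (embedding, label) entries."""

    def __init__(self):
        self.entries = {}  # key -> (embedding, label)
        self._next_key = 0

    def add(self, embedding, label):
        """Add one sample; no retraining is involved. Returns its key."""
        key = self._next_key
        self.entries[key] = (list(embedding), label)
        self._next_key += 1
        return key

    def remove(self, key):
        """Remove one sample, e.g. for unlearning or pruning."""
        del self.entries[key]

    def neighbors(self, query, k=3):
        """Return (key, label) of the k most similar entries, nearest first."""
        ranked = sorted(
            self.entries.items(),
            key=lambda item: cosine(query, item[1][0]),
            reverse=True,
        )
        return [(key, label) for key, (_, label) in ranked[:k]]

    def classify(self, query, k=3):
        """Majority vote over the k nearest neighbors (assumed voting rule)."""
        votes = Counter(label for _, label in self.neighbors(query, k))
        return votes.most_common(1)[0][0]


# Toy 2-D "embeddings"; real ones would come from a pre-trained model.
memory = VisualMemory()
cat_a = memory.add([1.0, 0.1], "cat")
cat_b = memory.add([0.9, 0.2], "cat")
dog_a = memory.add([0.1, 1.0], "dog")

print(memory.classify([1.0, 0.0], k=1))  # nearest entry is a "cat" sample
print(memory.neighbors([1.0, 0.0], k=2))  # the retrieved evidence is inspectable
memory.remove(cat_a)  # deleting knowledge is a database operation
```

Because every prediction traces back to specific retrieved entries, adding, removing, or inspecting knowledge in this sketch never touches model weights, which is the property the abstract emphasizes. A system at the scale the abstract mentions would need an approximate nearest-neighbor index rather than the linear scan used here.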
