Are We Done with Object-Centric Learning?

Abstract

Object-centric learning (OCL) seeks to learn representations that encode only an object, isolated from other objects and background cues in a scene. This approach underpins various aims, including out-of-distribution (OOD) generalization, sample-efficient composition, and modeling of structured environments. Most research has focused on developing unsupervised mechanisms that separate objects into discrete slots in the representation space, evaluated using unsupervised object discovery. However, with recent sample-efficient segmentation models, we can separate objects in the pixel space and encode them independently. This approach achieves remarkable zero-shot performance on OOD object discovery benchmarks, scales to foundation models, and handles a variable number of slots out of the box. Hence, the goal of OCL methods, obtaining object-centric representations, has been largely achieved. Despite this progress, a key question remains: how does the ability to separate objects within a scene contribute to broader OCL objectives, such as OOD generalization? We address this question by investigating, through the lens of OCL, the OOD generalization challenge caused by spurious background cues. We propose a novel, training-free probe called Object-Centric Classification with Applied Masks (OCCAM) and demonstrate that segmentation-based encoding of individual objects significantly outperforms slot-based OCL methods. However, challenges in real-world applications remain. We provide the toolbox for the OCL community to use scalable object-centric representations, and focus on practical applications and fundamental questions, such as understanding object perception in human cognition. Our code is available at https://github.com/AlexanderRubinstein/OCCAM.
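The idea of separating objects in pixel space and encoding them independently can be illustrated with a minimal sketch. This is not the code from the OCCAM repository: the names `apply_mask`, `encode_objects`, and `toy_encoder` are hypothetical, the masks are written by hand rather than produced by a segmentation model, and the encoder is a toy stand-in for a pretrained image encoder. The sketch only shows that, once the background is masked out, each object's encoding no longer depends on the background pixels.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities
Mask = List[List[bool]]    # True where the pixel belongs to the object


def apply_mask(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Keep pixels inside the mask and replace all others with `fill`."""
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def toy_encoder(image: Image) -> List[float]:
    """Hypothetical stand-in for a pretrained encoder.

    Returns [sum of intensities, number of non-zero pixels].
    """
    pixels = [p for row in image for p in row]
    return [sum(pixels), float(sum(1 for p in pixels if p != 0.0))]


def encode_objects(
    image: Image,
    masks: List[Mask],
    encoder: Callable[[Image], List[float]],
) -> List[List[float]]:
    """Encode every masked object separately from the rest of the scene."""
    return [encoder(apply_mask(image, mask)) for mask in masks]


# Two single-pixel "objects" (intensities 1.0 and 2.0) on a 0.5 background.
image = [
    [0.5, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 2.0],
]
# Hand-written masks; in practice these would come from a segmentation model.
masks = [
    [[False, False, False], [False, True, False], [False, False, False]],
    [[False, False, False], [False, False, False], [False, False, True]],
]
codes = encode_objects(image, masks, toy_encoder)
```

A downstream classifier would then operate on each entry of `codes` rather than on an encoding of the full scene, which is where a spurious background cue would otherwise enter.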
