Point Transformer V3: Simpler, Faster, Stronger
December 15, 2023
Authors: Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao
cs.AI
Abstract
This paper does not seek innovation within the attention mechanism. Instead,
it focuses on overcoming the existing trade-off between accuracy and
efficiency in point cloud processing by leveraging the power of scale.
Drawing inspiration from recent advances in 3D large-scale representation
learning, we recognize that model performance is influenced more by scale
than by intricate design. We therefore present Point Transformer V3 (PTv3),
which prioritizes simplicity and efficiency over the accuracy of certain
mechanisms whose contribution to overall performance becomes minor after
scaling, for example by replacing the precise neighbor search of KNN with an
efficient serialized neighbor mapping over point clouds organized in specific
patterns. This principle enables significant scaling, expanding the receptive
field from 16 to 1024 points while remaining efficient (a 3x increase in
processing speed and a 10x improvement in memory efficiency compared with its
predecessor, PTv2). PTv3 attains state-of-the-art results on over 20
downstream tasks spanning both indoor and outdoor scenarios. Further enhanced
with multi-dataset joint training, PTv3 pushes these results to a higher
level.
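The serialized neighbor mapping mentioned in the abstract can be illustrated with a minimal sketch: quantize each point onto a grid, sort the points along a space-filling curve, and treat a contiguous window in the sorted sequence as each point's neighborhood instead of running an exact KNN search. The Z-order (Morton) curve, the `morton_code` and `serialized_neighbors` helpers, and the window size below are illustrative assumptions, not PTv3's actual implementation (the paper's method supports several serialization patterns).

```python
import numpy as np

def morton_code(coords, bits=10):
    # Interleave the low `bits` bits of quantized x, y, z into a single
    # Z-order (Morton) key. This is one possible serialization pattern;
    # the choice of Z-order here is an assumption for illustration.
    code = np.zeros(len(coords), dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            code |= ((coords[:, axis] >> b) & 1) << (3 * b + axis)
    return code

def serialized_neighbors(points, window=4, bits=10):
    # Quantize continuous coordinates onto a 2^bits grid.
    mins, maxs = points.min(0), points.max(0)
    grid = ((points - mins) / (maxs - mins + 1e-9) * (2**bits - 1)).astype(np.int64)
    # Serialize: sort points by their space-filling-curve key (O(n log n)),
    # replacing a per-point exact neighbor search.
    order = np.argsort(morton_code(grid, bits))
    # Each point's "neighbors" are the `window` predecessors and successors
    # in serialized order; indices at the ends are clipped to the boundary.
    n = len(points)
    idx = np.arange(n)
    offsets = [d for d in range(-window, window + 1) if d != 0]
    neigh = np.stack([np.clip(idx + d, 0, n - 1) for d in offsets], axis=1)
    return order, neigh  # neigh indexes into the serialized sequence
```

Because points adjacent on a space-filling curve tend to be spatially close, this trades exact nearest neighbors for a cheap, sort-based approximation, which is the kind of simplification the abstract argues matters little once the model is scaled up.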