Transferable and Principled Efficiency for Open-Vocabulary Segmentation
April 11, 2024
Authors: Jingxuan Xu, Wuyang Chen, Yao Zhao, Yunchao Wei
cs.AI
Abstract
Recent success of pre-trained foundation vision-language models makes
Open-Vocabulary Segmentation (OVS) possible. Despite the promising performance,
this approach introduces heavy computational overhead arising from two challenges: 1)
the large model size of the backbone; 2) the expensive cost of fine-tuning.
These challenges hinder this OVS strategy from being widely applicable and
affordable in real-world scenarios. Although traditional methods such as model
compression and efficient fine-tuning can address these challenges, they often
rely on heuristics. This means that their solutions cannot be easily
transferred and necessitate re-training on different models, which comes at a
cost. In the context of efficient OVS, we target achieving performance that is
comparable to or even better than prior OVS works based on large
vision-language foundation models, by utilizing smaller models that incur lower
training costs. The core strategy is to make our efficiency principled and thus
seamlessly transferable from one OVS framework to others without further
customization. Comprehensive experiments on diverse OVS benchmarks demonstrate
our superior trade-off between segmentation accuracy and computation cost over
previous works. Our code is available at https://github.com/Xujxyang/OpenTrans.
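The transferable-efficiency idea in the abstract can be pictured with a generic magnitude-pruning sketch: compute binary masks over a shared backbone's weights once, then reuse those masks in any framework built on the same backbone instead of re-running compression per model. This is only an illustrative sketch; the function names and toy NumPy weights below are invented for this example and are not the paper's actual method.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return binary masks that zero out the smallest-magnitude
    entries globally across all layers, at the given sparsity."""
    flat = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    k = int(len(flat) * sparsity)
    # threshold = k-th smallest absolute value; entries below it are pruned
    threshold = np.partition(flat, k)[k] if k > 0 else -np.inf
    return {name: (np.abs(w) >= threshold).astype(w.dtype)
            for name, w in weights.items()}

def apply_masks(weights, masks):
    """Reuse precomputed masks on any model sharing this backbone shape."""
    return {name: w * masks[name] for name, w in weights.items()}

# Toy stand-in for a shared backbone's weight tensors.
rng = np.random.default_rng(0)
backbone = {"layer1": rng.standard_normal((8, 8)),
            "layer2": rng.standard_normal((8, 8))}

masks = magnitude_prune(backbone, sparsity=0.5)   # computed once
pruned = apply_masks(backbone, masks)             # transferable to other frameworks
```

Because the masks depend only on the backbone weights, not on any downstream segmentation head, the same `masks` dictionary can be applied to every OVS framework that uses that backbone, which is the sense in which such efficiency is "principled and transferable" rather than heuristic and per-model.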