A Tokenizer Guide for Galaxies: Benchmarking Scientific Foundation Models

Abstract

Tokenization is central to adapting scientific data for transformer-based foundation models, yet its impact on learned representations remains poorly understood. We compare four tokenization strategies (Affine, AIM, JetFormer, and VQ-VAE) within a unified transformer framework for astronomical imaging. Using 640,000 galaxy images from the DESI Legacy Survey and a shared AstroPT backbone, we evaluate each method on reconstruction fidelity and on prediction of physical properties. Our results reveal trade-offs across approaches. The flow-based JetFormer achieves higher reconstruction quality, while VQ-VAE yields strong probe performance for galaxy physical properties. Affine and AIM better preserve localized morphological information. We find that reconstruction quality and representation quality are decoupled, and that no single method consistently performs best across the tasks considered here. By grounding our evaluation in independently measured physical quantities, we hope this study highlights the potential of scientific data as a basis for constructing interpretable benchmarks for foundation models.