Scalable Inference-Time Annealing with Surrogate Likelihood Estimators

Abstract

A long-standing challenge in computational chemistry and biophysics is efficiently sampling the Boltzmann distribution of molecules. Advances in generative modeling have been proposed to address the limitations of conventional sampling techniques by eliminating the computational cost of simulation. A promising direction is to iteratively fine-tune diffusion models along a temperature ladder, with training data generated via importance sampling during inference-time annealing. Unfortunately, these methods require computing the divergence of the score field to estimate importance weights, which makes them intractable for larger systems. Here we present scalable inference-time annealing (SITA), which retrains flow-based models to generate samples at progressively lower temperatures, using an energy-based model to provide fast surrogate likelihoods. We demonstrate state-of-the-art performance on both alanine dipeptide and alanine tripeptide while avoiding costly divergence terms. Our code is available at https://github.com/countrsignal/sita.git
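
To make the importance-sampling step along a temperature ladder concrete, the following is a minimal sketch and not the SITA implementation. It assumes a toy one-dimensional harmonic energy U(x) = x^2 / 2 and exact Boltzmann samples at the higher temperature, so the weights depend only on the energy difference between the two inverse temperatures. In the setting described above, the samples come from a learned generative model, so the weights additionally require the model likelihood, which is where the divergence term or a surrogate likelihood enters. The function name `anneal_weights` and all parameter values are hypothetical.

```python
import numpy as np

def anneal_weights(energies, beta_src, beta_tgt):
    """Self-normalized importance weights for reweighting Boltzmann samples
    drawn at inverse temperature beta_src to inverse temperature beta_tgt."""
    # log w(x) = -(beta_tgt - beta_src) * U(x), up to an additive constant
    log_w = -(beta_tgt - beta_src) * energies
    log_w -= log_w.max()  # subtract the maximum for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

rng = np.random.default_rng(0)
beta_src, beta_tgt = 1.0, 2.0  # assumed ladder step: higher to lower temperature

# Toy energy U(x) = x^2 / 2, whose Boltzmann distribution at beta is N(0, 1/beta).
x = rng.normal(0.0, 1.0 / np.sqrt(beta_src), size=200_000)
energies = 0.5 * x**2

w = anneal_weights(energies, beta_src, beta_tgt)

# Reweighted second moment; the exact value at beta_tgt is 1 / beta_tgt.
var_tgt = np.sum(w * x**2)

# Effective sample size, a common diagnostic for weight degeneracy.
ess = 1.0 / np.sum(w**2)

print(f"reweighted variance: {var_tgt:.3f}, ESS fraction: {ess / len(x):.3f}")
```

In this toy case the weights are cheap because the source density is known in closed form. The effective sample size shrinks as the gap between the two temperatures grows, which is one reason to anneal through a ladder of small steps rather than a single large one.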