Scalable Inference-Time Annealing with Surrogate Likelihood Estimators

Abstract

A long-standing challenge in computational chemistry and biophysics is efficiently sampling the Boltzmann distribution of molecules. Advances in generative modeling have been proposed to overcome the limitations of conventional sampling techniques by eliminating the computational cost of simulation. A promising direction is to iteratively fine-tune diffusion models along a temperature ladder, where training data are generated via importance sampling during inference-time annealing. Unfortunately, these methods require computing a divergence over the score field to estimate importance weights, rendering them intractable for larger systems. Here we present scalable inference-time annealing (SITA), which retrains flow-based models to generate samples at progressively lower temperatures, using an energy-based model to provide fast surrogate likelihoods. We demonstrate state-of-the-art performance on both alanine dipeptide and alanine tripeptide while avoiding costly divergence terms. Our code is available at https://github.com/countrsignal/sita.git
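
To make the importance-sampling step concrete, the following is a minimal sketch of self-normalized reweighting from a higher temperature to a lower one. It is not taken from the SITA repository: the function names, the 1D harmonic toy potential, and the temperatures are all illustrative assumptions. In the toy, the proposal log-density `log_q` is known exactly; in the setting described in the abstract, that term would presumably come from a surrogate likelihood rather than from a divergence computation.

```python
import numpy as np

def self_normalized_weights(energies, log_q, kT_target):
    """Weights proportional to exp(-U(x)/kT_target) / q(x), normalized to sum to 1.

    energies  : U(x) for each sample (hypothetical toy potential below)
    log_q     : log-density of the proposal at each sample (exact here;
                assumed to be a surrogate estimate in practice)
    kT_target : thermal energy of the target Boltzmann distribution
    """
    log_w = -np.asarray(energies) / kT_target - np.asarray(log_q)
    log_w -= log_w.max()          # subtract the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

def effective_sample_size(weights):
    """Kish effective sample size for normalized weights."""
    return 1.0 / np.sum(weights ** 2)

# Toy setup (assumption): U(x) = x^2 / 2, so the Boltzmann distribution at
# thermal energy kT is a zero-mean Gaussian with variance kT.
rng = np.random.default_rng(0)
kT_high, kT_low = 2.0, 1.0
x = rng.normal(0.0, np.sqrt(kT_high), size=50_000)   # samples at the higher temperature
energies = 0.5 * x ** 2
log_q = -energies / kT_high - 0.5 * np.log(2.0 * np.pi * kT_high)

w = self_normalized_weights(energies, log_q, kT_low)
var_est = np.sum(w * x ** 2)      # reweighted estimate of <x^2> at kT_low
ess = effective_sample_size(w)
```

In this toy the reweighted second moment `var_est` should land close to `kT_low`, and `ess` indicates how many of the high-temperature samples effectively contribute at the lower temperature. A small temperature gap keeps the effective sample size high, which is one motivation for stepping down a temperature ladder gradually.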