Scalable Inference-Time Annealing with Surrogate Likelihood Estimators

Abstract

A long-standing challenge in computational chemistry and biophysics is efficiently sampling the Boltzmann distribution of molecules. Generative models have been proposed to address the limitations of conventional sampling techniques by eliminating the computational cost of simulation. A promising direction is to iteratively fine-tune diffusion models along a temperature ladder, with training data generated via importance sampling during inference-time annealing. Unfortunately, these methods require computing a divergence over the score field to estimate importance weights, rendering them intractable for larger systems. Here we present Scalable Inference-Time Annealing (SITA), which retrains flow-based models to generate samples at progressively lower temperatures and uses an energy-based model to provide fast surrogate likelihoods. We demonstrate state-of-the-art performance on both alanine dipeptide and alanine tripeptide while avoiding costly divergence terms. Our code is available at https://github.com/countrsignal/sita.git
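The core idea of reweighting samples toward a lower temperature without an exact model likelihood can be illustrated with self-normalized importance sampling. The sketch below is not the SITA implementation. It assumes that the target is p_T(x) ∝ exp(-U(x)/T) with k_B = 1, and that the sampler's density is approximated by an energy-based surrogate q(x) ∝ exp(-E(x)), standing in for the exact likelihood that would otherwise require integrating a divergence term. Both normalizing constants are shared across samples, so they cancel after normalization. The function names (`log_importance_weights`, `self_normalize`, `effective_sample_size`, `resample`) and the 1D harmonic toy problem are hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def log_importance_weights(
    samples: Sequence[float],
    potential: Callable[[float], float],
    surrogate_energy: Callable[[float], float],
    temperature: float,
) -> List[float]:
    """Log weights log p_T(x) - log q(x), up to one constant shared by all x.

    Hypothetical assumptions: p_T(x) ∝ exp(-U(x) / T) with k_B = 1, and the
    sampler's density is approximated by q(x) ∝ exp(-E(x)), where E is an
    energy-based surrogate. Unknown normalizers cancel after normalization.
    """
    return [-potential(x) / temperature + surrogate_energy(x) for x in samples]


def self_normalize(log_w: Sequence[float]) -> List[float]:
    """Exponentiate and normalize log weights stably (subtract the max first)."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(weights: Sequence[float]) -> float:
    """Kish effective sample size for normalized weights: 1 / sum(w_i^2)."""
    return 1.0 / sum(wi * wi for wi in weights)


def resample(samples, weights, k, rng):
    """Multinomial resampling into an unweighted set, e.g. for retraining."""
    return rng.choices(samples, weights=weights, k=k)


if __name__ == "__main__":
    rng = random.Random(0)
    t_high, t_low = 2.0, 1.0
    # Toy proposal: exact Boltzmann samples of U(x) = x^2 / 2 at t_high,
    # i.e. a zero-mean Gaussian with variance t_high.
    xs = [rng.gauss(0.0, math.sqrt(t_high)) for _ in range(20_000)]
    U = lambda x: 0.5 * x * x
    # Surrogate energy of the proposal; the +3.0 offset mimics an unknown
    # log-normalizer and does not change the normalized weights.
    E = lambda x: 0.5 * x * x / t_high + 3.0
    w = self_normalize(log_importance_weights(xs, U, E, t_low))
    second_moment = sum(wi * x * x for wi, x in zip(w, xs))
    print(f"reweighted <x^2> = {second_moment:.3f} (target {t_low})")
    print(f"ESS = {effective_sample_size(w):.0f} of {len(xs)}")
    training_set = resample(xs, w, 1_000, rng)
```

In this toy case the reweighted second moment should land close to the lower temperature (1.0), and the effective sample size indicates how much the proposal overlaps the target. A large drop in ESS between rungs of the temperature ladder would suggest the step is too aggressive.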