Scalable Inference-Time Annealing with Surrogate Likelihood Estimators

Abstract

A long-standing challenge in computational chemistry and biophysics is efficiently sampling the Boltzmann distribution of molecules. Advances in generative modeling have been proposed to address the limitations of conventional sampling techniques by eliminating the computational cost of simulation. A promising direction is to iteratively fine-tune diffusion models along a temperature ladder, with training data generated via importance sampling during inference-time annealing. Unfortunately, these methods require computing the divergence of the score field to estimate importance weights, which renders them intractable for larger systems. Here we present scalable inference-time annealing (SITA), which retrains flow-based models to generate samples at progressively lower temperatures, using an energy-based model to provide fast surrogate likelihoods. We demonstrate state-of-the-art performance on both alanine dipeptide and alanine tripeptide while avoiding costly divergence terms. Our code is available at https://github.com/countrsignal/sita.git
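As a rough illustration of the importance-sampling step described above, the following toy sketch reweights samples drawn at a higher temperature toward a lower-temperature Boltzmann target. It is not the authors' implementation: the 1D harmonic `potential`, the function names, and the inverse temperatures are all assumptions chosen for illustration, and `surrogate_log_likelihood` simply returns the exact unnormalized log-density as a stand-in for what a trained energy-based model might provide.

```python
import math
import random


def potential(x):
    # Hypothetical toy energy U(x) = x^2 / 2 (not a molecular force field).
    return 0.5 * x * x


def surrogate_log_likelihood(x, beta):
    # Stand-in for an energy-based surrogate: an unnormalized log-density
    # of the proposal at inverse temperature `beta`. Assumption: a trained
    # model would replace this closed form.
    return -beta * potential(x)


def annealing_log_weights(samples, beta_target, beta_proposal):
    # log w(x) = -beta_target * U(x) - log q(x), with q given by the surrogate.
    return [
        -beta_target * potential(x) - surrogate_log_likelihood(x, beta_proposal)
        for x in samples
    ]


def self_normalize(log_weights):
    # Subtract the maximum for numerical stability, then normalize to sum to 1.
    # Self-normalization cancels the unknown normalizing constants.
    shift = max(log_weights)
    unnormalized = [math.exp(lw - shift) for lw in log_weights]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


rng = random.Random(0)
beta_proposal, beta_target = 1.0, 2.0  # target is the colder distribution

# Samples from the proposal: exp(-beta_proposal * U) is a standard normal here.
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

weights = self_normalize(
    annealing_log_weights(samples, beta_target, beta_proposal)
)

# Weighted estimate of E[x^2] under the colder target, whose exact value
# for this toy potential is 1 / beta_target.
second_moment = sum(w * x * x for w, x in zip(weights, samples))
```

In this sketch the weights need only a potential evaluation and a surrogate log-likelihood per sample, so no divergence of a score field is computed. Resampling according to `weights` would give approximate lower-temperature samples that could serve as training data for the next rung of a temperature ladder.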