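Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches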

Abstract

Gaussian process (GP) regression provides a strategy for accelerating saddle point searches on high-dimensional energy surfaces by reducing the number of times the energy and its derivatives with respect to atomic coordinates need to be evaluated. The computational overhead of the hyperparameter optimization can, however, be large and make the approach inefficient. Failures can also occur if the search ventures too far into regions that are not represented well enough by the GP model. Here, these challenges are resolved by using geometry-aware optimal transport measures and an active pruning strategy. The pruning uses farthest-point sampling with a distance given by a sum of Wasserstein-1 distances, one for each atom type, and selects a fixed-size subset of geometrically diverse configurations so that the cost of GP updates does not increase rapidly as more observations are made. Stability is enhanced by a permutation-invariant metric that provides a reliable trust radius for early stopping, and by a logarithmic barrier penalty on the growth of the signal variance. These physically motivated algorithmic changes prove their efficacy by reducing the mean computational time to less than half on a set of 238 challenging configurations from a previously published data set of chemical reactions. With these improvements, the GP approach is established as a robust and scalable algorithm for accelerating saddle point searches when the evaluation of the energy and atomic forces requires significant computational effort.
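The pruning step described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that are not stated in the abstract: each configuration is a hypothetical dictionary mapping an element symbol to a list of Cartesian positions, every atom carries equal weight, and the Wasserstein-1 distance is taken directly between atomic positions of the same type. Under equal weights and equal atom counts the optimal transport plan is a permutation, so the distance reduces to a minimum-cost matching. The brute-force search over permutations used here is only practical for a handful of atoms per type; a Hungarian-type solver would be needed for realistic systems. The function names `w1_equal_weight`, `config_distance`, and `farthest_point_subset` are illustrative.

```python
import itertools
import math


def w1_equal_weight(points_a, points_b):
    """Wasserstein-1 distance between two equal-size, uniformly weighted point sets.

    With uniform weights the optimal plan is a one-to-one matching, so the
    minimum over all permutations is exact (brute force, small sets only).
    The division by n is an assumed normalization.
    """
    n = len(points_a)
    if n != len(points_b):
        raise ValueError("point sets must have the same size")
    if n == 0:
        return 0.0
    best = min(
        sum(math.dist(points_a[i], points_b[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n


def config_distance(conf_a, conf_b):
    """Sum of per-atom-type Wasserstein-1 distances between two configurations.

    Invariant to permutations of atoms of the same type.
    """
    if conf_a.keys() != conf_b.keys():
        raise ValueError("configurations must contain the same atom types")
    return sum(w1_equal_weight(conf_a[t], conf_b[t]) for t in conf_a)


def farthest_point_subset(configs, k, start=0):
    """Greedy farthest-point sampling: indices of k mutually distant configurations."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if k >= len(configs):
        return list(range(len(configs)))
    selected = [start]
    # Distance from every configuration to its nearest selected one.
    min_dist = [config_distance(c, configs[start]) for c in configs]
    while len(selected) < k:
        nxt = max(range(len(configs)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, c in enumerate(configs):
            min_dist[i] = min(min_dist[i], config_distance(c, configs[nxt]))
    return selected
```

In a GP-accelerated search, a routine of this kind would be called once the number of observations exceeds the chosen subset size, and only the selected configurations would be kept for the next hyperparameter optimization. How the starting configuration is chosen is left open here; seeding with the most recent observation is one plausible option.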
