Beyond Drug Discovery: The Nanotechnology Molecular Optimization (NMO) Benchmark

Abstract

Generative molecular design has been shaped by simple proxy benchmarks for drug-like properties and by models pretrained on large pharmaceutical datasets. This combination yields strong benchmark metrics but limits transferability to domains that are structurally distinct from drug discovery. To overcome this limitation and steer the search toward real, scientifically grounded targets, we introduce the Nanotechnology Molecular Optimization (NMO) Benchmark, which bridges machine learning (ML) and quantum materials science. NMO serves both as a rigorous testbed for the ML community and as a discovery engine for nanotechnology research. The suite replaces proxy oracles with quantum simulations and introduces strict protocols that prioritize scientific utility over leaderboard-oriented overfitting. The physics-based NMO tasks impose hard structural constraints and rugged fitness landscapes, placing fundamentally new demands on generative models. Notably, advanced molecular optimization methods underperform much simpler approaches on the NMO tasks. We develop a new baseline method that identifies the components critical to solving the NMO tasks, including a novel representation for modeling structural constraints and a domain-agnostic pretraining strategy that eliminates pharmaceutical dataset bias. Our results exceed state-of-the-art property values and reveal previously unknown structural motifs, offering new insights for the nanotechnology community and demonstrating that ML can drive genuine scientific discovery.