Journal: Proceedings of the National Academy of Sciences
Print ISSN: 0027-8424
Electronic ISSN: 1091-6490
Publication year: 2022
Volume: 119
Issue: 27
DOI: 10.1073/pnas.2120333119
Language: English
Publisher: The National Academy of Sciences of the United States of America
Abstract: Significance
Machine learning is revolutionizing computational chemistry by greatly reducing the computational cost of many simulations while maintaining accuracies of 1 kcal/mol or better. A major challenge in this field is the poor extensibility and transferability of conventional machine-learning (ML) models, whose accuracy degrades when they are applied to large or new chemical systems. To build a more general and interpretable model, we incorporate a quantum chemistry framework into the deep neural network, resulting in an interpretable Hamiltonian-based model with markedly high training efficiency. We validate this method on multiple large biochemical molecules by predicting various properties with consistently high accuracies, indicating the model is both extensible and transferable.
Conventional machine-learning (ML) models in computational chemistry learn to predict molecular properties directly, using quantum chemistry only for reference data. While these heuristic ML methods show quantum-level accuracy at speeds several orders of magnitude faster than traditional quantum chemistry methods, they suffer from poor extensibility and transferability; i.e., their accuracy degrades on large or new chemical systems. Incorporating quantum chemistry frameworks directly into the ML models solves this problem. Here we take the structure of semiempirical quantum mechanics (SEQM) methods to construct dynamically responsive Hamiltonians. SEQM methods use empirical parameters fitted to experimental properties to construct reduced-order Hamiltonians, facilitating much faster calculations than ab initio methods but with compromised accuracy. By replacing these static parameters with machine-learned dynamic values inferred from the local environment, we greatly improve the accuracy of the SEQM methods. Trained on molecular energies and atomic forces, these dynamically generated Hamiltonian parameters show a strong correlation with atomic hybridization and bonding. Trained with only about 60,000 small organic molecular conformers, the resulting model retains interpretability, extensibility, and transferability when tested on much larger chemical systems and when predicting various molecular properties. Overall, this work demonstrates the virtues of combining physics-based descriptions with ML to develop models that are simultaneously accurate, transferable, and interpretable.
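The idea of replacing static Hamiltonian parameters with values inferred from each atom's local environment can be illustrated with a minimal sketch. This is not the authors' model: it assumes a toy Hückel-style Hamiltonian with one orbital per atom in place of a real SEQM method, a smooth neighbor count in place of a learned descriptor, and a single linear function in place of the deep neural network. All function names, parameter values, and units here are hypothetical and chosen only for illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """Matrix of interatomic distances for an (N, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def local_environment(coords, r_cut=2.0):
    """Toy local-environment descriptor: smooth neighbor count within r_cut."""
    d = pairwise_distances(coords)
    w = np.where(d < r_cut, 0.5 * (np.cos(np.pi * d / r_cut) + 1.0), 0.0)
    np.fill_diagonal(w, 0.0)  # an atom is not its own neighbor
    return w.sum(axis=1)

def dynamic_onsite(env, alpha0=-11.0, weight=0.3, env_ref=1.0):
    """Stand-in for the ML model (assumption): a linear map from the
    descriptor to a per-atom on-site parameter. weight=0 recovers the
    static parameter alpha0 for every atom."""
    return alpha0 + weight * (env - env_ref)

def hamiltonian(coords, alpha, beta0=-2.5, decay=1.5, r0=1.4):
    """Reduced-order Hamiltonian: on-site terms alpha on the diagonal,
    distance-dependent coupling terms off the diagonal."""
    d = pairwise_distances(coords)
    h = beta0 * np.exp(-decay * (d - r0))
    np.fill_diagonal(h, alpha)
    return h

def electronic_energy(h, n_electrons):
    """Sum of doubly occupied orbital energies (closed shell)."""
    eps = np.linalg.eigvalsh(h)  # eigenvalues in ascending order
    return 2.0 * eps[: n_electrons // 2].sum()

# Hypothetical three-atom chain; arbitrary length and energy units.
coords = np.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [2.8, 0.0, 0.0]])
env = local_environment(coords)

h_static = hamiltonian(coords, np.full(len(coords), -11.0))
h_dynamic = hamiltonian(coords, dynamic_onsite(env))

e_static = electronic_energy(h_static, n_electrons=2)
e_dynamic = electronic_energy(h_dynamic, n_electrons=2)
print("static energy: ", e_static)
print("dynamic energy:", e_dynamic)
```

In this sketch the central atom has a larger neighbor count than the two end atoms, so it receives a different on-site parameter, whereas the static version assigns the same value to all three. In an actual training setup the mapping from environment to parameters would be fitted so that the energies and forces derived from the Hamiltonian match reference data, which this example does not attempt.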