Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction

He Zhang, Chang Liu, Zun Wang, Xinran Wei, Siyuan Liu, Nanning Zheng, Bin Shao, Tie-Yan Liu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:59329-59357, 2024.

Abstract

Predicting the mean-field Hamiltonian matrix in density functional theory (DFT) is a fundamental formulation for leveraging machine learning to solve molecular science problems. Yet its applicability is limited by insufficient labeled data for training. In this work, we highlight that Hamiltonian prediction possesses a self-consistency principle, based on which we propose self-consistency training, an exact training method that requires no labeled data. It distinguishes the task from predicting other molecular properties through the following benefits: (1) it enables the model to be trained on a large amount of unlabeled data, thereby addressing the data-scarcity challenge and enhancing generalization; (2) it is more efficient than running DFT to generate labels for supervised training, since it amortizes the DFT calculation over a set of queries. We empirically demonstrate better generalization in data-scarce and out-of-distribution scenarios, and better efficiency than DFT labeling. These benefits extend the applicability of Hamiltonian prediction to an ever-larger scale.
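To make the self-consistency idea concrete, below is a minimal sketch of how such a label-free loss could be computed. This is not the authors' implementation: `predict_hamiltonian` and `fock_build` are hypothetical placeholders for a neural Hamiltonian predictor and a DFT Fock-matrix construction, and a tiny closed-shell toy problem with an identity overlap matrix is assumed.

```python
# Hypothetical sketch of a self-consistency loss for Hamiltonian prediction.
# `predict_hamiltonian` and `fock_build` are stand-ins, NOT the paper's code.
import numpy as np
from scipy.linalg import eigh

def predict_hamiltonian(geometry: np.ndarray) -> np.ndarray:
    """Placeholder for a learned model H_theta(geometry)."""
    n = len(geometry)
    h = np.random.randn(n, n)
    return 0.5 * (h + h.T)  # a mean-field Hamiltonian is symmetric

def fock_build(density: np.ndarray) -> np.ndarray:
    """Placeholder for the DFT map rho -> H[rho] (the Fock build that
    conventional DFT iterates to self-consistency)."""
    return density  # stand-in; a real build uses integrals and XC terms

def self_consistency_loss(geometry, overlap, n_occ):
    # 1. Predict the Hamiltonian for this (possibly unlabeled) molecule.
    h_pred = predict_hamiltonian(geometry)
    # 2. Solve the generalized eigenvalue problem H C = S C eps.
    _, coeffs = eigh(h_pred, overlap)
    # 3. Build the density matrix from the occupied orbitals.
    c_occ = coeffs[:, :n_occ]
    density = 2.0 * c_occ @ c_occ.T  # factor 2: closed-shell occupation
    # 4. Reconstruct the Hamiltonian implied by that density.
    h_rebuilt = fock_build(density)
    # 5. Penalize violation of self-consistency: H_theta should equal
    #    H[rho(H_theta)]. No reference label appears anywhere.
    return np.sum((h_pred - h_rebuilt) ** 2)

# Toy usage: 2 basis functions, 1 doubly occupied orbital, identity overlap.
geometry = np.zeros((2, 3))  # two atoms at dummy positions
loss = self_consistency_loss(geometry, overlap=np.eye(2), n_occ=1)
print(f"self-consistency loss: {loss:.4f}")
```

Because no reference Hamiltonian appears in the loss, such a model can train on unlabeled geometries, and a single Fock build per query replaces the full SCF loop that label generation would require; this is the amortization the abstract refers to. In the paper's setting the gradients would additionally flow through a differentiable eigensolver, a detail elided here.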

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zhang24ak,
  title     = {Self-Consistency Training for Density-Functional-Theory {H}amiltonian Prediction},
  author    = {Zhang, He and Liu, Chang and Wang, Zun and Wei, Xinran and Liu, Siyuan and Zheng, Nanning and Shao, Bin and Liu, Tie-Yan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {59329--59357},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhang24ak/zhang24ak.pdf},
  url       = {https://proceedings.mlr.press/v235/zhang24ak.html}
}
Endnote
%0 Conference Paper
%T Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction
%A He Zhang
%A Chang Liu
%A Zun Wang
%A Xinran Wei
%A Siyuan Liu
%A Nanning Zheng
%A Bin Shao
%A Tie-Yan Liu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zhang24ak
%I PMLR
%P 59329--59357
%U https://proceedings.mlr.press/v235/zhang24ak.html
%V 235
APA
Zhang, H., Liu, C., Wang, Z., Wei, X., Liu, S., Zheng, N., Shao, B. & Liu, T. (2024). Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:59329-59357. Available from https://proceedings.mlr.press/v235/zhang24ak.html.