Dataflow-Guided Neuro-Symbolic Language Models for Type Inference

Gen Li, Yao Wan, Hongyu Zhang, Zhou Zhao, Wenbin Jiang, Xuanhua Shi, Hai Jin, Zheng Wang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:36000-36020, 2025.

Abstract

Language Models (LMs) are increasingly used for type inference, aiding in error detection and software development. Some real-world deployments of LMs require the model to run on local machines to safeguard the intellectual property of the source code. This setting often limits the size of the LMs that can be used. We present Nester, the first neuro-symbolic approach that enhances LMs for type inference by integrating symbolic learning without increasing model size. Nester breaks type inference into sub-tasks based on the data and control flow of the input code, encoding them as a modular high-level program. This program executes multi-step actions, such as evaluating expressions and analyzing conditional branches of the target code, combining static typing with LMs to infer potential types. Evaluated on the ManyTypes4Py dataset in Python, Nester outperforms two state-of-the-art type inference methods (HiTyper and TypeGen), achieving 70.7% Top-1 Exact Match, which is 18.3% and 3.6% higher than HiTyper and TypeGen, respectively. For complex type annotations like typing.Optional and typing.Union, Nester achieves 51.0% and 16.7%, surpassing TypeGen by 28.3% and 5.8%.
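To make the pipeline described above concrete, here is a minimal, hypothetical sketch of a dataflow-guided neuro-symbolic type inferencer. It is not Nester's implementation: the helper names (lm_predict_type, static_type, infer_types), the literal-only symbolic resolver, and the Optional/Union merging heuristic are all illustrative assumptions. The sketch only mirrors the high-level idea from the abstract: collect a variable's assignments across conditional branches, type each one symbolically where static analysis suffices, fall back to an LM otherwise, and merge the candidates into a typing.Optional or typing.Union annotation.

    import ast
    from typing import Optional

    # Hypothetical stand-in for a small local LM; Nester's actual model
    # and prompting scheme are not described on this page.
    def lm_predict_type(expr_src: str) -> str:
        """Ask an LM for the type of an expression (stubbed here)."""
        return "Any"

    # Literal expressions whose types we can resolve purely symbolically.
    _LITERAL_TYPES = {
        ast.Constant: lambda n: type(n.value).__name__,
        ast.List: lambda n: "list",
        ast.Dict: lambda n: "dict",
        ast.Set: lambda n: "set",
        ast.Tuple: lambda n: "tuple",
    }

    def static_type(node: ast.expr) -> Optional[str]:
        """Resolve a type symbolically when the expression is a literal."""
        fn = _LITERAL_TYPES.get(type(node))
        return fn(node) if fn else None

    def infer_types(source: str, var: str) -> str:
        """Collect candidate types for `var` from its assignments along all
        control-flow paths: static typing first, LM fallback, then merge
        distinct candidates into an Optional/Union annotation."""
        candidates = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Assign):
                names = [t.id for t in node.targets if isinstance(t, ast.Name)]
                if var in names:
                    t = static_type(node.value) or lm_predict_type(
                        ast.unparse(node.value))  # requires Python >= 3.9
                    if t not in candidates:
                        candidates.append(t)
        if "NoneType" in candidates and len(candidates) == 2:
            other = next(c for c in candidates if c != "NoneType")
            return f"typing.Optional[{other}]"
        if len(candidates) > 1:
            return f"typing.Union[{', '.join(candidates)}]"
        return candidates[0] if candidates else "Any"

    code = """
    def f(flag):
        if flag:
            x = 1
        else:
            x = None
        return x
    """
    print(infer_types(code, "x"))  # typing.Optional[int]

On the toy function, the sketch prints typing.Optional[int]: the two branches contribute int and NoneType, which merge into exactly the kind of complex annotation (typing.Optional, typing.Union) the abstract highlights as Nester's strength.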

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-li25cj,
  title     = {Dataflow-Guided Neuro-Symbolic Language Models for Type Inference},
  author    = {Li, Gen and Wan, Yao and Zhang, Hongyu and Zhao, Zhou and Jiang, Wenbin and Shi, Xuanhua and Jin, Hai and Wang, Zheng},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {36000--36020},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/li25cj/li25cj.pdf},
  url       = {https://proceedings.mlr.press/v267/li25cj.html},
  abstract  = {Language Models (LMs) are increasingly used for type inference, aiding in error detection and software development. Some real-world deployments of LMs require the model to run on local machines to safeguard the intellectual property of the source code. This setting often limits the size of the LMs that can be used. We present Nester, the first neuro-symbolic approach that enhances LMs for type inference by integrating symbolic learning without increasing model size. Nester breaks type inference into sub-tasks based on the data and control flow of the input code, encoding them as a modular high-level program. This program executes multi-step actions, such as evaluating expressions and analyzing conditional branches of the target code, combining static typing with LMs to infer potential types. Evaluated on the ManyTypes4Py dataset in Python, Nester outperforms two state-of-the-art type inference methods (HiTyper and TypeGen), achieving 70.7% Top-1 Exact Match, which is 18.3% and 3.6% higher than HiTyper and TypeGen, respectively. For complex type annotations like typing.Optional and typing.Union, Nester achieves 51.0% and 16.7%, surpassing TypeGen by 28.3% and 5.8%.}
}
Endnote
%0 Conference Paper
%T Dataflow-Guided Neuro-Symbolic Language Models for Type Inference
%A Gen Li
%A Yao Wan
%A Hongyu Zhang
%A Zhou Zhao
%A Wenbin Jiang
%A Xuanhua Shi
%A Hai Jin
%A Zheng Wang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-li25cj
%I PMLR
%P 36000--36020
%U https://proceedings.mlr.press/v267/li25cj.html
%V 267
%X Language Models (LMs) are increasingly used for type inference, aiding in error detection and software development. Some real-world deployments of LMs require the model to run on local machines to safeguard the intellectual property of the source code. This setting often limits the size of the LMs that can be used. We present Nester, the first neuro-symbolic approach that enhances LMs for type inference by integrating symbolic learning without increasing model size. Nester breaks type inference into sub-tasks based on the data and control flow of the input code, encoding them as a modular high-level program. This program executes multi-step actions, such as evaluating expressions and analyzing conditional branches of the target code, combining static typing with LMs to infer potential types. Evaluated on the ManyTypes4Py dataset in Python, Nester outperforms two state-of-the-art type inference methods (HiTyper and TypeGen), achieving 70.7% Top-1 Exact Match, which is 18.3% and 3.6% higher than HiTyper and TypeGen, respectively. For complex type annotations like typing.Optional and typing.Union, Nester achieves 51.0% and 16.7%, surpassing TypeGen by 28.3% and 5.8%.
APA
Li, G., Wan, Y., Zhang, H., Zhao, Z., Jiang, W., Shi, X., Jin, H. & Wang, Z. (2025). Dataflow-Guided Neuro-Symbolic Language Models for Type Inference. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:36000-36020. Available from https://proceedings.mlr.press/v267/li25cj.html.
