Multi-Level Symbolic Regression: Function Structure Learning for Multi-Level Data

Kei Sen Fong, Mehul Motani
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2890-2898, 2024.

Abstract

Symbolic Regression (SR) is an approach which learns a closed-form function relating the predictors to the outcome in a dataset. Datasets are often multi-level (MuL), meaning that certain features can be used to split data into groups for analysis (we refer to these features as levels). The advantage of viewing datasets as MuL is that we can exploit the high similarity of data within a group. SR is well-suited for MuL datasets, in which the learnt function structure serves as ‘shared information’ between the groups while the learnt parameter values capture the unique relationships within each group. In this context, this paper makes three contributions: (i) We design an algorithm, Multi-level Symbolic Regression (MSR), which runs multiple parallel SR processes for each group and merges them to produce a single function structure. (ii) To tackle datasets that are not explicitly MuL, we develop a metric termed MLICC to select the best feature to serve as a level. (iii) We also release MSRBench, a database of MuL datasets (synthetic and real-world) which we developed and collated, that can be used to evaluate MSR. Our results and ablation studies demonstrate that MSR achieves a higher recovery rate and lower error on MSRBench compared to SOTA methods for SR and MuL datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-sen-fong24a, title = { Multi-Level Symbolic Regression: Function Structure Learning for Multi-Level Data }, author = {Sen Fong, Kei and Motani, Mehul}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics}, pages = {2890--2898}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, month = {02--04 May}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v238/sen-fong24a/sen-fong24a.pdf}, url = {https://proceedings.mlr.press/v238/sen-fong24a.html}, abstract = { Symbolic Regression (SR) is an approach which learns a closed-form function relating the predictors to the outcome in a dataset. Datasets are often multi-level (MuL), meaning that certain features can be used to split data into groups for analysis (we refer to these features as levels). The advantage of viewing datasets as MuL is that we can exploit the high similarity of data within a group. SR is well-suited for MuL datasets, in which the learnt function structure serves as ‘shared information’ between the groups while the learnt parameter values capture the unique relationships within each group. In this context, this paper makes three contributions: (i) We design an algorithm, Multi-level Symbolic Regression (MSR), which runs multiple parallel SR processes for each group and merges them to produce a single function structure. (ii) To tackle datasets that are not explicitly MuL, we develop a metric termed MLICC to select the best feature to serve as a level. (iii) We also release MSRBench, a database of MuL datasets (synthetic and real-world) which we developed and collated, that can be used to evaluate MSR. Our results and ablation studies demonstrate that MSR achieves a higher recovery rate and lower error on MSRBench compared to SOTA methods for SR and MuL datasets. } }
Endnote
%0 Conference Paper %T Multi-Level Symbolic Regression: Function Structure Learning for Multi-Level Data %A Kei Sen Fong %A Mehul Motani %B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2024 %E Sanjoy Dasgupta %E Stephan Mandt %E Yingzhen Li %F pmlr-v238-sen-fong24a %I PMLR %P 2890--2898 %U https://proceedings.mlr.press/v238/sen-fong24a.html %V 238 %X Symbolic Regression (SR) is an approach which learns a closed-form function relating the predictors to the outcome in a dataset. Datasets are often multi-level (MuL), meaning that certain features can be used to split data into groups for analysis (we refer to these features as levels). The advantage of viewing datasets as MuL is that we can exploit the high similarity of data within a group. SR is well-suited for MuL datasets, in which the learnt function structure serves as ‘shared information’ between the groups while the learnt parameter values capture the unique relationships within each group. In this context, this paper makes three contributions: (i) We design an algorithm, Multi-level Symbolic Regression (MSR), which runs multiple parallel SR processes for each group and merges them to produce a single function structure. (ii) To tackle datasets that are not explicitly MuL, we develop a metric termed MLICC to select the best feature to serve as a level. (iii) We also release MSRBench, a database of MuL datasets (synthetic and real-world) which we developed and collated, that can be used to evaluate MSR. Our results and ablation studies demonstrate that MSR achieves a higher recovery rate and lower error on MSRBench compared to SOTA methods for SR and MuL datasets.
APA
Sen Fong, K. & Motani, M.. (2024). Multi-Level Symbolic Regression: Function Structure Learning for Multi-Level Data . Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2890-2898 Available from https://proceedings.mlr.press/v238/sen-fong24a.html.

Related Material