[edit]
Multi-Level Symbolic Regression: Function Structure Learning for Multi-Level Data
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2890-2898, 2024.
Abstract
Symbolic Regression (SR) is an approach which learns a closed-form function relating the predictors to the outcome in a dataset. Datasets are often multi-level (MuL), meaning that certain features can be used to split data into groups for analysis (we refer to these features as levels). The advantage of viewing datasets as MuL is that we can exploit the high similarity of data within a group. SR is well-suited for MuL datasets, in which the learnt function structure serves as ‘shared information’ between the groups while the learnt parameter values capture the unique relationships within each group. In this context, this paper makes three contributions: (i) We design an algorithm, Multi-level Symbolic Regression (MSR), which runs multiple parallel SR processes for each group and merges them to produce a single function structure. (ii) To tackle datasets that are not explicitly MuL, we develop a metric termed MLICC to select the best feature to serve as a level. (iii) We also release MSRBench, a database of MuL datasets (synthetic and real-world) which we developed and collated, that can be used to evaluate MSR. Our results and ablation studies demonstrate that MSR achieves a higher recovery rate and lower error on MSRBench compared to SOTA methods for SR and MuL datasets.