Beyond Size-Based Metrics: Measuring Task-Specific Complexity in Symbolic Regression

Krzysztof Kacprzyk, Mihaela van der Schaar
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:4609-4617, 2025.

Abstract

Symbolic regression (SR) is a machine learning approach aimed at discovering mathematical closed-form expressions that best fit a given dataset. Traditional complexity measures in SR, such as the number of terms or expression tree depth, often fail to capture the difficulty of specific analytical tasks a user might need to perform. In this paper, we introduce a new complexity measure designed to quantify the difficulty of conducting single-feature global perturbation analysis (SGPA)—a type of analysis commonly applied in fields like physics and risk scoring to understand the global impact of perturbing individual input features. We present a unified mathematical framework that formalizes and generalizes these established practices, providing a precise method to assess how challenging it is to apply SGPA to different closed-form equations. This approach enables the definition of novel complexity metrics and constraints directly tied to this practical analytical task. Additionally, we establish a reconstruction theorem, offering potential insights for developing future optimization techniques in SR.
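The size-based measures the abstract contrasts against can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the paper's SGPA complexity measure. The expression is written in Python syntax and parsed with the standard `ast` module. Each operator, function call, variable, and constant is assumed to count as one node, and a single leaf is assumed to have depth 1. The helper names (`size_metrics`, `evaluate`, `ALLOWED`) are hypothetical.

```python
import ast
import math


def _operands(node):
    """Return the operand sub-expressions of a node (operators are folded into the node)."""
    if isinstance(node, ast.BinOp):
        return [node.left, node.right]
    if isinstance(node, ast.UnaryOp):
        return [node.operand]
    if isinstance(node, ast.Call):
        return list(node.args)  # the called function (e.g. exp) is the node itself
    if isinstance(node, (ast.Name, ast.Constant)):
        return []  # leaves: variables and numeric constants
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def _size(node):
    # Node count: every operator, call, variable, and constant counts once.
    return 1 + sum(_size(child) for child in _operands(node))


def _depth(node):
    # Tree depth: a single leaf has depth 1.
    return 1 + max((_depth(child) for child in _operands(node)), default=0)


def size_metrics(expr):
    """Hypothetical helper: size-based complexity of an expression in Python syntax."""
    root = ast.parse(expr, mode="eval").body
    return {"size": _size(root), "depth": _depth(root)}


# Functions an expression may call (assumed set; extend as needed).
ALLOWED = {"exp": math.exp, "log": math.log, "sin": math.sin, "cos": math.cos}


def evaluate(expr, **values):
    """Evaluate expr with the given variable values.

    Builtins are hidden, but this is not a security sandbox: use trusted input only.
    """
    code = compile(ast.parse(expr, mode="eval"), "<expr>", "eval")
    return eval(code, {"__builtins__": {}, **ALLOWED}, values)


if __name__ == "__main__":
    for expr in ("x1 * x2 + 1", "x1 + x2 + 1"):
        # Change in the output when x1 moves from 0 to 1, with x2 held at several values.
        effects = [evaluate(expr, x1=1, x2=v) - evaluate(expr, x1=0, x2=v) for v in (-1, 0, 2)]
        print(expr, size_metrics(expr), effects)
```

Under this convention, both `x1 * x2 + 1` and `x1 + x2 + 1` have size 5 and depth 3. However, the effect of moving `x1` from 0 to 1 is `[-1, 0, 2]` for the product and `[1, 1, 1]` for the sum. In the sum, the effect of perturbing `x1` is the same everywhere. In the product, it depends on `x2`. Node count and depth do not register this difference.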

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-kacprzyk25a,
  title     = {Beyond Size-Based Metrics: Measuring Task-Specific Complexity in Symbolic Regression},
  author    = {Kacprzyk, Krzysztof and van der Schaar, Mihaela},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {4609--4617},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/kacprzyk25a/kacprzyk25a.pdf},
  url       = {https://proceedings.mlr.press/v258/kacprzyk25a.html},
  abstract  = {Symbolic regression (SR) is a machine learning approach aimed at discovering mathematical closed-form expressions that best fit a given dataset. Traditional complexity measures in SR, such as the number of terms or expression tree depth, often fail to capture the difficulty of specific analytical tasks a user might need to perform. In this paper, we introduce a new complexity measure designed to quantify the difficulty of conducting single-feature global perturbation analysis (SGPA)---a type of analysis commonly applied in fields like physics and risk scoring to understand the global impact of perturbing individual input features. We present a unified mathematical framework that formalizes and generalizes these established practices, providing a precise method to assess how challenging it is to apply SGPA to different closed-form equations. This approach enables the definition of novel complexity metrics and constraints directly tied to this practical analytical task. Additionally, we establish a reconstruction theorem, offering potential insights for developing future optimization techniques in SR.}
}
Endnote
%0 Conference Paper
%T Beyond Size-Based Metrics: Measuring Task-Specific Complexity in Symbolic Regression
%A Krzysztof Kacprzyk
%A Mihaela van der Schaar
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-kacprzyk25a
%I PMLR
%P 4609--4617
%U https://proceedings.mlr.press/v258/kacprzyk25a.html
%V 258
%X Symbolic regression (SR) is a machine learning approach aimed at discovering mathematical closed-form expressions that best fit a given dataset. Traditional complexity measures in SR, such as the number of terms or expression tree depth, often fail to capture the difficulty of specific analytical tasks a user might need to perform. In this paper, we introduce a new complexity measure designed to quantify the difficulty of conducting single-feature global perturbation analysis (SGPA)—a type of analysis commonly applied in fields like physics and risk scoring to understand the global impact of perturbing individual input features. We present a unified mathematical framework that formalizes and generalizes these established practices, providing a precise method to assess how challenging it is to apply SGPA to different closed-form equations. This approach enables the definition of novel complexity metrics and constraints directly tied to this practical analytical task. Additionally, we establish a reconstruction theorem, offering potential insights for developing future optimization techniques in SR.
APA
Kacprzyk, K., & van der Schaar, M. (2025). Beyond Size-Based Metrics: Measuring Task-Specific Complexity in Symbolic Regression. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:4609-4617. Available from https://proceedings.mlr.press/v258/kacprzyk25a.html.
