[edit]
Fast Min-$ε$ Segmented Regression using Constant-Time Segment Merging
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:40312-40327, 2025.
Abstract
Segmented regression is a statistical method that approximates a function $f$ by a piecewise function $\hat{f}$ using noisy data samples. Min-$\epsilon$ approaches aim to reduce the regression function’s mean squared error (MSE) for a given number of $k$ segments. An optimal solution for min-$\epsilon$ segmented regression is found in $\mathcal{O}(n^2)$ time (Bai & Perron, 1998; Yamamoto & Perron, 2013) for $n$ samples. For large datasets, current heuristics improve time complexity to $\mathcal{O}(n\log{n})$ (Acharya et al., 2016) but can result in large errors, especially when exactly $k$ segments are used. We present a method for min-$\epsilon$ segmented regression that combines the scalability of top existing heuristic solutions with a statistical efficiency similar to the optimal solution. This is achieved by using a new method to merge an initial set of segments using precomputed matrices from samples, allowing both merging and error calculation in constant time. Our approach, using the same samples and parameter $k$, produces segments with up to 1,000 times lower MSE compared to Acharya et al. (2016) in about 100 times less runtime on data sets over $10^4$ samples.