[edit]
How to Find Big-Oh in Your Data Set (and How Not To)
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:347-354, 1997.
Abstract
The \emph{empirical curve bounding problem} is defined as follows. Suppose data vectors $X$, $Y$ are presented such that $E(Y[i]) = \bar{f}(X[i])$ where $\bar{f}(x)$ is an unknown function. The problem is to analyze $X$, $Y$ and obtain complexity bounds $O(g_u(x))$ and $\Omega(g_l(x))$ on the function $\bar{f}(x)$. As no algorithm for empirical curve bounding can be guaranteed correct, we consider heuristics. Five heuristic algorithms are presented here, together with analytical results guaranteeing correctness for certain families of functions. Experimental evaluations of the correctness and tightness of bounds obtained by the rules for several constructed functions $f(x)$ and real datasets are described.