How to Find Big-Oh in Your Data Set (and How Not To)

C. C. McGeoch, P. R. Cohen
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:347-354, 1997.

Abstract

The \emph{empirical curve bounding problem} is defined as follows. Suppose data vectors $X$, $Y$ are presented such that $E(Y[i]) = \bar{f}(X[i])$ where $\bar{f}(x)$ is an unknown function. The problem is to analyze $X$, $Y$ and obtain complexity bounds $O(g_u(x))$ and $\Omega(g_l(x))$ on the function $\bar{f}(x)$. As no algorithm for empirical curve bounding can be guaranteed correct, we consider heuristics. Five heuristic algorithms are presented here, together with analytical results guaranteeing correctness for certain families of functions. Experimental evaluations of the correctness and tightness of bounds obtained by the rules for several constructed functions $f(x)$ and real datasets are described.

Cite this Paper


BibTeX
@InProceedings{pmlr-vR1-mcgeoch97a, title = {How to Find Big-Oh in Your Data Set (and How Not To)}, author = {McGeoch, C. C. and Cohen, P. R.}, booktitle = {Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics}, pages = {347--354}, year = {1997}, editor = {Madigan, David and Smyth, Padhraic}, volume = {R1}, series = {Proceedings of Machine Learning Research}, month = {04--07 Jan}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/r1/mcgeoch97a/mcgeoch97a.pdf}, url = {https://proceedings.mlr.press/r1/mcgeoch97a.html}, abstract = {The \emph{empirical curve bounding problem} is defined as follows. Suppose data vectors $X$, $Y$ are presented such that $E(Y[i]) = \bar{f}(X[i])$ where $\bar{f}(x)$ is an unknown function. The problem is to analyze $X$, $Y$ and obtain complexity bounds $O(g_u(x))$ and $\Omega(g_l(x))$ on the function $\bar{f}(x)$. As no algorithm for empirical curve bounding can be guaranteed correct, we consider heuristics. Five heuristic algorithms are presented here, together with analytical results guaranteeing correctness for certain families of functions. Experimental evaluations of the correctness and tightness of bounds obtained by the rules for several constructed functions $f(x)$ and real datasets are described.}, note = {Reissued by PMLR on 30 March 2021.} }
Endnote
%0 Conference Paper %T How to Find Big-Oh in Your Data Set (and How Not To) %A C. C. McGeoch %A P. R. Cohen %B Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 1997 %E David Madigan %E Padhraic Smyth %F pmlr-vR1-mcgeoch97a %I PMLR %P 347--354 %U https://proceedings.mlr.press/r1/mcgeoch97a.html %V R1 %X The \emph{empirical curve bounding problem} is defined as follows. Suppose data vectors $X$, $Y$ are presented such that $E(Y[i]) = \bar{f}(X[i])$ where $\bar{f}(x)$ is an unknown function. The problem is to analyze $X$, $Y$ and obtain complexity bounds $O(g_u(x))$ and $\Omega(g_l(x))$ on the function $\bar{f}(x)$. As no algorithm for empirical curve bounding can be guaranteed correct, we consider heuristics. Five heuristic algorithms are presented here, together with analytical results guaranteeing correctness for certain families of functions. Experimental evaluations of the correctness and tightness of bounds obtained by the rules for several constructed functions $f(x)$ and real datasets are described. %Z Reissued by PMLR on 30 March 2021.
APA
McGeoch, C.C. & Cohen, P.R.. (1997). How to Find Big-Oh in Your Data Set (and How Not To). Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R1:347-354 Available from https://proceedings.mlr.press/r1/mcgeoch97a.html. Reissued by PMLR on 30 March 2021.

Related Material