Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees

Marco Bressan, Mauro Sozio
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:4517-4541, 2024.

Abstract

We study the problem of maintaining a decision tree in the fully-dynamic setting, where the dataset is updated by an adversarial sequence of insertions and deletions. We present the first algorithm with strong guarantees on both the quality of the tree and the worst-case update time (the maximum time spent between two consecutive dataset updates). For instance, we can maintain a tree where each node has Gini gain within $\beta$ of the optimum, while guaranteeing an update time $O(d \beta^{-3} \log^4 n )$, where $d$ is the number of features and $n$ the maximum size of the dataset. This is optimal up to polylogarithmic factors, as any dynamic algorithm must have update time in $\Omega(d)$. Similar guarantees hold for the variance and information gain, for classification and regression, and even for boosted trees. This shows that many popular decision trees such as ID3 or C4.5 can be efficiently be made dynamic, answering an open question of Bressan, Damay and Sozio (AAAI 2023). We also show that, under the 3SUM conjecture or the Orthogonal Vectors Hypothesis, the update time must be polynomial in $1/\beta$.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-bressan24a, title = {Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees}, author = {Bressan, Marco and Sozio, Mauro}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {4517--4541}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/bressan24a/bressan24a.pdf}, url = {https://proceedings.mlr.press/v235/bressan24a.html}, abstract = {We study the problem of maintaining a decision tree in the fully-dynamic setting, where the dataset is updated by an adversarial sequence of insertions and deletions. We present the first algorithm with strong guarantees on both the quality of the tree and the worst-case update time (the maximum time spent between two consecutive dataset updates). For instance, we can maintain a tree where each node has Gini gain within $\beta$ of the optimum, while guaranteeing an update time $O(d \beta^{-3} \log^4 n )$, where $d$ is the number of features and $n$ the maximum size of the dataset. This is optimal up to polylogarithmic factors, as any dynamic algorithm must have update time in $\Omega(d)$. Similar guarantees hold for the variance and information gain, for classification and regression, and even for boosted trees. This shows that many popular decision trees such as ID3 or C4.5 can be efficiently be made dynamic, answering an open question of Bressan, Damay and Sozio (AAAI 2023). We also show that, under the 3SUM conjecture or the Orthogonal Vectors Hypothesis, the update time must be polynomial in $1/\beta$.} }
Endnote
%0 Conference Paper %T Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees %A Marco Bressan %A Mauro Sozio %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-bressan24a %I PMLR %P 4517--4541 %U https://proceedings.mlr.press/v235/bressan24a.html %V 235 %X We study the problem of maintaining a decision tree in the fully-dynamic setting, where the dataset is updated by an adversarial sequence of insertions and deletions. We present the first algorithm with strong guarantees on both the quality of the tree and the worst-case update time (the maximum time spent between two consecutive dataset updates). For instance, we can maintain a tree where each node has Gini gain within $\beta$ of the optimum, while guaranteeing an update time $O(d \beta^{-3} \log^4 n )$, where $d$ is the number of features and $n$ the maximum size of the dataset. This is optimal up to polylogarithmic factors, as any dynamic algorithm must have update time in $\Omega(d)$. Similar guarantees hold for the variance and information gain, for classification and regression, and even for boosted trees. This shows that many popular decision trees such as ID3 or C4.5 can be efficiently be made dynamic, answering an open question of Bressan, Damay and Sozio (AAAI 2023). We also show that, under the 3SUM conjecture or the Orthogonal Vectors Hypothesis, the update time must be polynomial in $1/\beta$.
APA
Bressan, M. & Sozio, M.. (2024). Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:4517-4541 Available from https://proceedings.mlr.press/v235/bressan24a.html.

Related Material