To correct or not to correct?: Assessing the multiple comparisons problem for association rule mining of environmental DNA (eDNA) detection survey datasets

Nikolett Toth, Luiza Antonie, Jarrett Phillips
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1012-1019, 2026.

Abstract

Unsupervised machine learning is a valuable exploratory tool for the small, noisy, and incomplete datasets characteristic of ecological and environmental research, where data limitations often render supervised approaches impractical. Here, we apply association rule mining (ARM) to a brook trout (Salvelinus fontinalis) environmental DNA (eDNA) dataset with two objectives: (1) demonstrating ARM as a screening tool to identify environmental correlates and guide targeted metadata collection, and (2) evaluating Bonferroni and Benjamini–Hochberg corrections for managing Type I error rates during significance-based pruning. Our results show that, while both methods retained high-quality rules, the Bonferroni procedure eliminated several ecologically interesting associations that survived Benjamini–Hochberg correction. For small environmental datasets in an exploratory context, we thus favour Benjamini–Hochberg as a more appropriate correction strategy, notwithstanding the need for further external validation.
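To make the difference between the two corrections concrete, here is a minimal sketch of significance-based rule pruning. The abstract does not say which test produces the per-rule p-values, so this sketch assumes a one-sided Fisher exact test on each rule's 2x2 contingency counts, a common choice in significant-rule mining but not necessarily the authors' method. The function names, site counts, and p-values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from math import comb


def fisher_one_sided_p(n_xy: int, n_x: int, n_y: int, n: int) -> float:
    """Assumed p-value for a rule X -> Y: probability of at least n_xy
    co-occurrences by chance under the hypergeometric null with fixed
    margins (n_x sites with X, n_y sites with Y, n sites in total)."""
    upper = min(n_x, n_y)
    tail = sum(comb(n_x, k) * comb(n - n_x, n_y - k) for k in range(n_xy, upper + 1))
    return tail / comb(n, n_y)


def bonferroni(pvals: list[float], alpha: float = 0.05) -> list[bool]:
    """Keep a rule only if p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals: list[float], alpha: float = 0.05) -> list[bool]:
    """Step-up Benjamini-Hochberg procedure (controls the false discovery rate):
    find the largest rank k with p_(k) <= (k / m) * alpha, then keep the
    k rules with the smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    keep = [False] * m
    for i in order[:k_max]:
        keep[i] = True
    return keep


# Hypothetical counts: 20 sites, condition X at 8, eDNA detection Y at 10, both at 7.
p_rule = fisher_one_sided_p(n_xy=7, n_x=8, n_y=10, n=20)  # ≈ 0.0099

# Hypothetical p-values for five candidate rules.
pvals = [0.001, 0.008, 0.012, 0.030, 0.200]
print(bonferroni(pvals))          # [True, True, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, True, False]
```

Bonferroni applies the same fixed threshold alpha/m to every rule, whereas the Benjamini–Hochberg threshold grows with a rule's rank. This is why, on the same p-values, Benjamini–Hochberg can retain rules that Bonferroni prunes.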

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-toth26a,
  title = {To correct or not to correct?: Assessing the multiple comparisons problem for association rule mining of environmental DNA (eDNA) detection survey datasets},
  author = {Toth, Nikolett and Antonie, Luiza and Phillips, Jarrett},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {1012--1019},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/toth26a/toth26a.pdf},
  url = {https://proceedings.mlr.press/v318/toth26a.html},
  abstract = {Unsupervised machine learning is a valuable exploratory tool for the small, noisy, and incomplete datasets characteristic of ecological and environmental research, where data limitations often render supervised approaches impractical. Here, we apply association rule mining (ARM) to a brook trout (\textit{Salvelinus fontinalis}) environmental DNA (eDNA) dataset with two objectives: (1) demonstrating ARM as a screening tool to identify environmental correlates and guide targeted metadata collection, and (2) evaluating Bonferroni and Benjamini–Hochberg corrections for managing Type I error rates during significance-based pruning. Our results show that, while both methods retained high-quality rules, the Bonferroni procedure eliminated several ecologically interesting associations that survived Benjamini–Hochberg correction. For small environmental datasets in an exploratory context, we thus favour Benjamini–Hochberg as a more appropriate correction strategy, notwithstanding the need for further external validation.}
}
Endnote
%0 Conference Paper
%T To correct or not to correct?: Assessing the multiple comparisons problem for association rule mining of environmental DNA (eDNA) detection survey datasets
%A Nikolett Toth
%A Luiza Antonie
%A Jarrett Phillips
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-toth26a
%I PMLR
%P 1012--1019
%U https://proceedings.mlr.press/v318/toth26a.html
%V 318
%X Unsupervised machine learning is a valuable exploratory tool for the small, noisy, and incomplete datasets characteristic of ecological and environmental research, where data limitations often render supervised approaches impractical. Here, we apply association rule mining (ARM) to a brook trout (\textit{Salvelinus fontinalis}) environmental DNA (eDNA) dataset with two objectives: (1) demonstrating ARM as a screening tool to identify environmental correlates and guide targeted metadata collection, and (2) evaluating Bonferroni and Benjamini–Hochberg corrections for managing Type I error rates during significance-based pruning. Our results show that, while both methods retained high-quality rules, the Bonferroni procedure eliminated several ecologically interesting associations that survived Benjamini–Hochberg correction. For small environmental datasets in an exploratory context, we thus favour Benjamini–Hochberg as a more appropriate correction strategy, notwithstanding the need for further external validation.
APA
Toth, N., Antonie, L., & Phillips, J. (2026). To correct or not to correct?: Assessing the multiple comparisons problem for association rule mining of environmental DNA (eDNA) detection survey datasets. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:1012-1019. Available from https://proceedings.mlr.press/v318/toth26a.html.
