To correct or not to correct?: Assessing the multiple comparisons problem for association rule mining of environmental DNA (eDNA) detection survey datasets
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1012-1019, 2026.
Abstract
Unsupervised machine learning is a valuable exploratory tool for the small, noisy, and incomplete datasets characteristic of ecological and environmental research, where data limitations often render supervised approaches impractical. Here, we apply association rule mining (ARM) to a brook trout (Salvelinus fontinalis) environmental DNA (eDNA) dataset with two objectives: (1) demonstrating ARM as a screening tool to identify environmental correlates and guide targeted metadata collection, and (2) evaluating Bonferroni and Benjamini–Hochberg corrections for managing Type I error rates during significance-based pruning. Our results show that, while both methods retained high-quality rules, the Bonferroni procedure eliminated several ecologically interesting associations that survived Benjamini–Hochberg correction. For small environmental datasets in an exploratory context, we thus favour Benjamini–Hochberg as a more appropriate correction strategy, notwithstanding the need for further external validation.
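To illustrate why the two corrections can prune different rules, the following is a minimal sketch of the standard Bonferroni and Benjamini–Hochberg procedures applied to a list of per-rule p-values. It is not the authors' pipeline: the function names, the significance level of 0.05, and the p-values are all hypothetical and chosen only for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Keep a rule only if its p-value is at most alpha / m (m = number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Keep the k smallest p-values, where k is the largest rank
    satisfying p_(k) <= (k / m) * alpha (step-up procedure)."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    keep = [False] * m
    for idx in order[:max_k]:
        keep[idx] = True
    return keep


# Hypothetical p-values for five candidate association rules (illustrative only).
p_values = [0.001, 0.008, 0.012, 0.030, 0.200]

bonf_keep = bonferroni(p_values)
bh_keep = benjamini_hochberg(p_values)

for p, b, h in zip(p_values, bonf_keep, bh_keep):
    print(f"p={p:.3f}  Bonferroni keeps: {b}  Benjamini-Hochberg keeps: {h}")
```

With these made-up values, Bonferroni keeps only the two smallest p-values, whereas Benjamini–Hochberg also keeps the rules with p = 0.012 and p = 0.030. This mirrors the general pattern described in the abstract: Bonferroni controls the family-wise error rate and is therefore stricter, while Benjamini–Hochberg controls the false discovery rate and tends to retain more rules.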