On the Interplay between Information Loss and Operation Loss in Representations for Classification
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:4853-4871, 2022.
Information-theoretic measures have been widely adopted in the design of features for learning and decision problems. Inspired by this, we look at the relationship between i) a weak form of information loss in the Shannon sense and ii) operation loss in the minimum probability of error (MPE) sense when considering a family of lossy continuous representations of an observation. Our first result offers a lower bound on a weak form of information loss as a function of its respective operation loss when adopting a discrete lossy representation (quantization) instead of the original raw observation. From this, our main result shows that a specific form of vanishing information loss (a weak notion of asymptotic informational sufficiency) implies a vanishing MPE loss (or asymptotic operational sufficiency) when considering a family of lossy continuous representations. Our theoretical findings support the observation that selecting feature representations that attempt to capture informational sufficiency is appropriate for learning, but this design principle is rather conservative if the intended goal is achieving MPE in classification. On this last point, we discuss the study of weak forms of informational sufficiency to achieve operational sufficiency in learning settings.
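To make the two notions of loss concrete, the following is a minimal sketch on a toy discrete problem. It is an illustration only, not the paper's construction. It assumes the standard mutual-information loss I(X;Y) - I(U;Y) as a stand-in for the weak form of information loss (which is not defined in this abstract), and the joint distribution, the quantizer, and all function names are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in nats for a joint pmf stored as joint[x, y]."""
    px = joint.sum(axis=1, keepdims=True)   # marginal of X, shape (n_x, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, n_y)
    mask = joint > 0                        # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

def mpe(joint):
    """Minimum probability of error of the Bayes rule that decides Y from X."""
    return float(1.0 - joint.max(axis=1).sum())

def quantize(joint, cells):
    """Joint pmf of (U, Y), where U = cells[X] is a deterministic quantizer."""
    cells = np.asarray(cells)
    out = np.zeros((cells.max() + 1, joint.shape[1]))
    np.add.at(out, cells, joint)            # merge the rows that share a cell
    return out

# Hypothetical toy problem: X takes 4 values, Y is a binary label.
joint = np.array([[0.20, 0.05],
                  [0.15, 0.10],
                  [0.10, 0.15],
                  [0.05, 0.20]])

# Quantizer that merges {0, 1} and {2, 3}.
coarse = quantize(joint, [0, 0, 1, 1])

info_loss = mutual_information(joint) - mutual_information(coarse)
mpe_loss = mpe(coarse) - mpe(joint)

print(f"information loss: {info_loss:.4f} nats")
print(f"MPE loss:         {mpe_loss:.4f}")
```

In this toy case the quantizer leaves the Bayes decision unchanged in every cell, so the MPE loss is zero while the mutual-information loss is strictly positive. This is one simple instance of a representation that is operationally sufficient without being informationally sufficient, in line with the remark that informational sufficiency is a conservative design target.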