On Stronger Computational Separations Between Multimodal and Unimodal Machine Learning
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:23145-23158, 2024.
Abstract
Recently, multimodal machine learning has enjoyed huge empirical success (e.g., GPT-4). Motivated to develop a theoretical justification for this empirical success, Lu (NeurIPS ’23, ALT ’24) introduces a theory of multimodal learning, and considers possible separations between theoretical models of multimodal and unimodal learning. In particular, Lu (ALT ’24) shows a computational separation, which is relevant to worst-case instances of the learning task. In this paper, we give a stronger average-case computational separation, where for "typical" instances of the learning task, unimodal learning is computationally hard, but multimodal learning is easy. We then question how "natural" the average-case separation is. Would it be encountered in practice? To this end, we prove that under basic conditions, any given computational separation between average-case unimodal and multimodal learning tasks implies a corresponding cryptographic key agreement protocol. We suggest interpreting this as evidence that very strong computational advantages of multimodal learning may arise infrequently in practice, since they exist only for the "pathological" case of inherently cryptographic distributions. However, this does not apply to possible (super-polynomial) statistical advantages.