Multi-Modal Natural Intelligence through Active Predictive Coding
Proceedings of the First Workshop on NeuroAI Multimodal Intelligence @ AAAI 2026, PMLR 308:18-26, 2026.
Abstract
Active predictive coding (APC) is a recently proposed theory of the neocortex that postulates that a canonical sensory-motor processing circuit is replicated across cortical areas. These areas are organized in a rough hierarchy, with higher-level neural states modulating lower-level circuits implementing state-transition dynamics and policy functions. Such a structure enables the network to learn the compositional structure of the world, allowing it to rapidly compose solutions to new problems and generalize quickly to new environments. In APC, complex state transition dynamics are modeled as a sequence of simpler dynamics, which in turn are modeled using even simpler dynamics, and so on. Complex policies are similarly modeled as sequences of simpler policies, with the lowest level comprising sequences of primitive actions. Here we show that the APC model offers a unifying framework for multi-modal intelligence by demonstrating that the same architecture can (a) perform visual object recognition via active sensing (eye movements) and parts-based understanding, (b) navigate to desired goal locations in a complex environment through hierarchical planning, (c) learn to parse language hierarchically, infer the goal (i.e., intent) of an uttered sentence, and achieve the inferred goal through actions, and (d) scale up to realistic environments. Our results suggest that neurally-inspired approaches such as APC can help pave the way for more interpretable, generalizable, efficient, and human-like multi-modal AI.
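The compositional idea in the abstract, that a complex policy is a sequence of simpler policies bottoming out in primitive actions, can be illustrated with a toy sketch. This is not the APC model itself: the grid world, the action names, and the `POLICIES` table are all hypothetical, and the hierarchy here is a fixed lookup table, whereas the abstract describes learned higher-level neural states that modulate lower-level circuits.

```python
from typing import Dict, List, Tuple

# Hypothetical primitive actions on a 2D grid (illustrative only, not from the paper).
PRIMITIVES: Dict[str, Tuple[int, int]] = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}

# Hypothetical policy library: each named policy is a sequence of simpler
# policies or primitive actions. In APC these would be learned, not hand-written.
POLICIES: Dict[str, List[str]] = {
    "step_right_2": ["right", "right"],
    "step_down_2": ["down", "down"],
    "go_to_corner": ["step_right_2", "step_down_2"],
}


def flatten(policy: str) -> List[str]:
    """Recursively expand a policy name into its sequence of primitive actions."""
    if policy in PRIMITIVES:
        return [policy]
    actions: List[str] = []
    for sub_policy in POLICIES[policy]:
        actions.extend(flatten(sub_policy))
    return actions


def rollout(start: Tuple[int, int], policy: str) -> Tuple[int, int]:
    """Apply the flattened policy from a start position; return the final position."""
    x, y = start
    for action in flatten(policy):
        dx, dy = PRIMITIVES[action]
        x, y = x + dx, y + dy
    return (x, y)
```

Calling `rollout((0, 0), "go_to_corner")` expands the top-level policy into two sub-policies and then into four primitive moves. Reusing `step_right_2` inside other policies is the kind of composition the abstract credits with fast generalization to new problems.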