The Sound of an Album Cover: A Probabilistic Approach to Multimedia
Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, PMLR R4:49-56, 2003.
We present a novel, flexible statistical approach to jointly modeling music, images, and text. The technique is based on multi-modal mixture models and uses online EM for efficient computation. The learned models can be used to browse multimedia databases; to query a multimedia database using any combination of music, images, and text (lyrics and other contextual information); to annotate documents with music and images; and to find documents in a database that are similar to input text, music, and/or graphics files.
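The abstract names two ingredients, a multi-modal mixture model and online EM, but gives no further detail. The sketch below is one minimal way to combine them. It rests on our own assumptions and is not the paper's model. Each component pairs a multinomial over word ids (standing in for lyrics and other text) with a diagonal Gaussian over a real-valued feature vector (standing in for audio or image features). The two modalities are assumed conditionally independent given the component. The online EM variant is stepwise EM, which keeps running averages of expected sufficient statistics with step size eta_t = (t + t0)^(-alpha). The class `OnlineMultimodalMixture` and all of its parameters are hypothetical.

```python
import math
import random


class OnlineMultimodalMixture:
    """Hypothetical sketch, not the paper's model: a K-component mixture in which,
    given the component, a bag of words (e.g. lyrics) and a real-valued feature
    vector (e.g. audio or image features) are conditionally independent."""

    def __init__(self, n_components, vocab_size, feat_dim, seed=0,
                 alpha=0.6, t0=2.0, smoothing=1e-2, min_var=1e-3):
        rng = random.Random(seed)
        self.K, self.V, self.D = n_components, vocab_size, feat_dim
        self.alpha, self.t0 = alpha, t0  # step size eta_t = (t + t0) ** -alpha
        self.smoothing, self.min_var = smoothing, min_var
        self.t = 0
        # Running averages of expected sufficient statistics, one set per component.
        self.r = [1.0 / self.K] * self.K  # expected component weights (sum to 1)
        self.words = [[rng.uniform(0.5, 1.5) for _ in range(self.V)]
                      for _ in range(self.K)]  # expected word counts
        means = [[rng.gauss(0.0, 1.0) for _ in range(self.D)] for _ in range(self.K)]
        self.s1 = [[self.r[k] * m for m in means[k]] for k in range(self.K)]
        self.s2 = [[self.r[k] * (1.0 + m * m) for m in means[k]] for k in range(self.K)]
        self._m_step()

    def _m_step(self):
        """Map the averaged sufficient statistics to mixture parameters."""
        total = sum(self.r)
        self.pi = [rk / total for rk in self.r]
        self.theta, self.mu, self.var = [], [], []
        for k in range(self.K):
            row = [w + self.smoothing for w in self.words[k]]  # additive smoothing
            z = sum(row)
            self.theta.append([w / z for w in row])
            rk = max(self.r[k], 1e-300)  # guard against division by zero
            mu = [s / rk for s in self.s1[k]]
            self.mu.append(mu)
            self.var.append([max(s / rk - m * m, self.min_var)
                             for s, m in zip(self.s2[k], mu)])

    def responsibilities(self, counts, feat):
        """Posterior over components. `counts` maps word id -> count and `feat`
        is a feature list; pass {} or [] to leave a modality out of a query."""
        logp = []
        for k in range(self.K):
            lp = math.log(self.pi[k])
            lp += sum(c * math.log(self.theta[k][v]) for v, c in counts.items())
            for x, m, var_kd in zip(feat, self.mu[k], self.var[k]):
                lp -= 0.5 * (math.log(2.0 * math.pi * var_kd) + (x - m) ** 2 / var_kd)
            logp.append(lp)
        top = max(logp)  # log-sum-exp for numerical stability
        w = [math.exp(lp - top) for lp in logp]
        z = sum(w)
        return [wk / z for wk in w]

    def update(self, counts, feat):
        """One stepwise online-EM step on a single training document.
        Assumes the document carries both modalities."""
        self.t += 1
        eta = (self.t + self.t0) ** (-self.alpha)
        resp = self.responsibilities(counts, feat)  # E-step under current params
        for k, g in enumerate(resp):
            self.r[k] = (1.0 - eta) * self.r[k] + eta * g
            self.words[k] = [(1.0 - eta) * w for w in self.words[k]]
            for v, c in counts.items():
                self.words[k][v] += eta * g * c
            for d, x in enumerate(feat):
                self.s1[k][d] = (1.0 - eta) * self.s1[k][d] + eta * g * x
                self.s2[k][d] = (1.0 - eta) * self.s2[k][d] + eta * g * x * x
        self._m_step()  # M-step from the updated running averages
        return resp


if __name__ == "__main__":
    model = OnlineMultimodalMixture(n_components=2, vocab_size=4, feat_dim=2, seed=1)
    docs = [({0: 2, 1: 1}, [-2.0, -2.0]), ({2: 1, 3: 2}, [2.0, 2.0])]
    for step in range(200):
        model.update(*docs[step % 2])
    print(model.responsibilities({0: 1}, []))        # query by text only
    print(model.responsibilities({}, [-2.0, -2.0]))  # query by features only
```

Because the modalities are assumed conditionally independent given the component, a query that omits a modality simply drops that modality's terms from the log-likelihood. This gives one simple way to query "using any combination" of inputs. Training in this sketch still expects every document to carry both modalities. Handling documents with a missing modality during training would require imputing that modality's expected statistics, which the sketch does not attempt.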