I Mean I Am a Mouse: Mmeets for Bilingual Multimodal Meme Sarcasm Classification from Large Language Models

Yunzhe Liu, Xinyi Xu
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:1096-1111, 2025.

Abstract

Multimodal image-text memes are widely used on social networks and present significant challenges for high-precision sentiment analysis, social network analysis, and understanding diverse user communities, especially due to their deep cultural and regional influences. However, most existing studies on multimodal memes focus primarily on English-speaking communities and on preliminary tasks, such as harmful meme detection. In this paper, we focus on a more specific challenge: high-precision sarcasm classification in various contexts. We introduce a novel dataset for classifying sarcasm in multimodal memes, covering both Chinese and English languages. This dataset serves as a critical resource for developing and evaluating models that detect sarcasm across different cultural contexts. Furthermore, we propose a framework named Mmeets, which leverages Large Language Models (LLMs) and abductive reasoning to interpret the relationships between images and text, enhancing text understanding. Mmeets employs a pre-trained AltCLIP vision-language model alongside a cross-attention mechanism to effectively fuse image and text data, capturing subtle semantic connections. Our experimental results show that the Mmeets method outperforms state-of-the-art techniques in sarcasm classification tasks.
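The fusion step the abstract describes can be illustrated with a minimal single-head cross-attention sketch, where text-token features act as queries over image-patch features. This is a generic illustration of the mechanism, not the paper's exact architecture: the dimensions, single-head form, and lack of learned projection matrices are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_feats, image_feats):
    """Text tokens attend over image patches.

    Queries come from the text features, keys/values from the image
    features, so each fused text token is a weighted mix of patches.
    (Illustrative only; real systems add learned Q/K/V projections.)
    """
    d_k = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d_k)  # (T, P)
    weights = softmax(scores, axis=-1)                  # rows sum to 1
    return weights @ image_feats                        # (T, d)

# Toy features: 5 text tokens and 10 image patches, dimension 8.
rng = np.random.default_rng(0)
text = rng.standard_normal((5, 8))
image = rng.standard_normal((10, 8))
fused = cross_attention(text, image)  # shape (5, 8)
```

In a full pipeline, the fused text-side representation would be concatenated with (or pooled alongside) the vision-language encoder's outputs and passed to a classification head.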

Cite this Paper


BibTeX
@InProceedings{pmlr-v260-liu25b,
  title = {{I Mean I Am a Mouse}: {M}meets for Bilingual Multimodal Meme Sarcasm Classification from Large Language Models},
  author = {Liu, Yunzhe and Xu, Xinyi},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  pages = {1096--1111},
  year = {2025},
  editor = {Nguyen, Vu and Lin, Hsuan-Tien},
  volume = {260},
  series = {Proceedings of Machine Learning Research},
  month = {05--08 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v260/main/assets/liu25b/liu25b.pdf},
  url = {https://proceedings.mlr.press/v260/liu25b.html},
  abstract = {Multimodal image-text memes are widely used on social networks and present significant challenges for high-precision sentiment analysis, social network analysis, and understanding diverse user communities, especially due to their deep cultural and regional influences. However, most existing studies on multimodal memes focus primarily on English-speaking communities and on preliminary tasks, such as harmful meme detection. In this paper, we focus on a more specific challenge: high-precision sarcasm classification in various contexts. We introduce a novel dataset for classifying sarcasm in multimodal memes, covering both Chinese and English languages. This dataset serves as a critical resource for developing and evaluating models that detect sarcasm across different cultural contexts. Furthermore, we propose a framework named Mmeets, which leverages Large Language Models (LLMs) and abductive reasoning to interpret the relationships between images and text, enhancing text understanding. Mmeets employs a pre-trained AltCLIP vision-language model alongside a cross-attention mechanism to effectively fuse image and text data, capturing subtle semantic connections. Our experimental results show that the Mmeets method outperforms state-of-the-art techniques in sarcasm classification tasks.}
}
Endnote
%0 Conference Paper
%T I Mean I Am a Mouse: Mmeets for Bilingual Multimodal Meme Sarcasm Classification from Large Language Models
%A Yunzhe Liu
%A Xinyi Xu
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-liu25b
%I PMLR
%P 1096--1111
%U https://proceedings.mlr.press/v260/liu25b.html
%V 260
%X Multimodal image-text memes are widely used on social networks and present significant challenges for high-precision sentiment analysis, social network analysis, and understanding diverse user communities, especially due to their deep cultural and regional influences. However, most existing studies on multimodal memes focus primarily on English-speaking communities and on preliminary tasks, such as harmful meme detection. In this paper, we focus on a more specific challenge: high-precision sarcasm classification in various contexts. We introduce a novel dataset for classifying sarcasm in multimodal memes, covering both Chinese and English languages. This dataset serves as a critical resource for developing and evaluating models that detect sarcasm across different cultural contexts. Furthermore, we propose a framework named Mmeets, which leverages Large Language Models (LLMs) and abductive reasoning to interpret the relationships between images and text, enhancing text understanding. Mmeets employs a pre-trained AltCLIP vision-language model alongside a cross-attention mechanism to effectively fuse image and text data, capturing subtle semantic connections. Our experimental results show that the Mmeets method outperforms state-of-the-art techniques in sarcasm classification tasks.
APA
Liu, Y. & Xu, X. (2025). I Mean I Am a Mouse: Mmeets for Bilingual Multimodal Meme Sarcasm Classification from Large Language Models. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:1096-1111. Available from https://proceedings.mlr.press/v260/liu25b.html.