Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures

Ádám Fodor; Rachid R. Saboundji; Julio C. S. Jacques Junior; Sergio Escalera; David Gallardo-Pujol; András Lőrincz

Understanding Social Behavior in Dyadic and Small Group Interactions, PMLR 173:218-241, 2022.

Abstract

Human-machine, human-robot interaction, and collaboration appear in diverse fields, from homecare to Cyber-Physical Systems. Technological development is fast, whereas real-time methods for social communication analysis that can measure small changes in sentiment and personality states, including visual, acoustic and language modalities are lagging, particularly when the goal is to build robust, appearance invariant, and fair methods. We study and compare methods capable of fusing modalities while satisfying real-time and invariant appearance conditions. We compare state-of-the-art transformer architectures in sentiment estimation and introduce them in the much less explored field of personality perception. We show that the architectures perform differently on automatic sentiment and personality perception, suggesting that each task may be better captured/modeled by a particular method. Our work calls attention to the attractive properties of the linear versions of the transformer architectures. In particular, we show that the best results are achieved by fusing the different architectures' preprocessing methods. However, they pose extreme conditions in computation power and energy consumption for real-time computations for quadratic transformers due to their memory requirements. In turn, linear transformers pave the way for quantifying small changes in sentiment estimation and personality perception for real-time social communications for machines and robots.

Cite this Paper

BibTeX


@InProceedings{pmlr-v173-fodor22a,
  title = 	 {Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures},
  author =       {Fodor, {\'A}d{\'a}m and Saboundji, Rachid R. and Jacques Junior, Julio C. S. and Escalera, Sergio and Gallardo-Pujol, David and L{\H{o}}rincz, Andr{\'a}s},
  booktitle = 	 {Understanding Social Behavior in Dyadic and Small Group Interactions},
  pages = 	 {218--241},
  year = 	 {2022},
  editor = 	 {Palmero, Cristina and Jacques Junior, Julio C. S. and Clap{\'e}s, Albert and Guyon, Isabelle and Tu, Wei-Wei and Moeslund, Thomas B. and Escalera, Sergio},
  volume = 	 {173},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16 Oct},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v173/fodor22a/fodor22a.pdf},
  url = 	 {https://proceedings.mlr.press/v173/fodor22a.html},
  abstract = 	 {Human-machine, human-robot interaction, and collaboration appear in diverse fields, from homecare to Cyber-Physical Systems. Technological development is fast, whereas real-time methods for social communication analysis that can measure small changes in sentiment and personality states, including visual, acoustic and language modalities are lagging, particularly when the goal is to build robust, appearance invariant, and fair methods. We study and compare methods capable of fusing modalities while satisfying real-time and invariant appearance conditions. We compare state-of-the-art transformer architectures in sentiment estimation and introduce them in the much less explored field of personality perception. We show that the architectures perform differently on automatic sentiment and personality perception, suggesting that each task may be better captured/modeled by a particular method. Our work calls attention to the attractive properties of the linear versions of the transformer architectures. In particular, we show that the best results are achieved by fusing the different architectures{’} preprocessing methods. However, they pose extreme conditions in computation power and energy consumption for real-time computations for quadratic transformers due to their memory requirements. In turn, linear transformers pave the way for quantifying small changes in sentiment estimation and personality perception for real-time social communications for machines and robots.}
}

Endnote

%0 Conference Paper
%T Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures
%A Ádám Fodor
%A Rachid R. Saboundji
%A Julio C. S. Jacques Junior
%A Sergio Escalera
%A David Gallardo-Pujol
%A András Lőrincz
%B Understanding Social Behavior in Dyadic and Small Group Interactions
%C Proceedings of Machine Learning Research
%D 2022
%E Cristina Palmero
%E Julio C. S. Jacques Junior
%E Albert Clapés
%E Isabelle Guyon
%E Wei-Wei Tu
%E Thomas B. Moeslund
%E Sergio Escalera
%F pmlr-v173-fodor22a
%I PMLR
%P 218--241
%U https://proceedings.mlr.press/v173/fodor22a.html
%V 173
%X Human-machine, human-robot interaction, and collaboration appear in diverse fields, from homecare to Cyber-Physical Systems. Technological development is fast, whereas real-time methods for social communication analysis that can measure small changes in sentiment and personality states, including visual, acoustic and language modalities are lagging, particularly when the goal is to build robust, appearance invariant, and fair methods. We study and compare methods capable of fusing modalities while satisfying real-time and invariant appearance conditions. We compare state-of-the-art transformer architectures in sentiment estimation and introduce them in the much less explored field of personality perception. We show that the architectures perform differently on automatic sentiment and personality perception, suggesting that each task may be better captured/modeled by a particular method. Our work calls attention to the attractive properties of the linear versions of the transformer architectures. In particular, we show that the best results are achieved by fusing the different architectures' preprocessing methods. However, they pose extreme conditions in computation power and energy consumption for real-time computations for quadratic transformers due to their memory requirements. In turn, linear transformers pave the way for quantifying small changes in sentiment estimation and personality perception for real-time social communications for machines and robots.

APA


Fodor, Á., Saboundji, R.R., Jacques Junior, J.C.S., Escalera, S., Gallardo-Pujol, D. & Lőrincz, A. (2022). Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures. Understanding Social Behavior in Dyadic and Small Group Interactions, in Proceedings of Machine Learning Research 173:218-241. Available from https://proceedings.mlr.press/v173/fodor22a.html.
