Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation

Carolina Higuera, Akash Sharma, Taosha Fan, Chaithanya Krishna Bodduluri, Byron Boots, Michael Kaess, Mike Lambeta, Tingfan Wu, Zixi Liu, Francois Robert Hogan, Mustafa Mukadam
Proceedings of The 9th Conference on Robot Learning, PMLR 305:105-123, 2025.

Abstract

We present TacX, the first multisensory touch representations across four tactile modalities: image, audio, motion, and pressure. Trained on 1M contact-rich interactions collected with the Digit 360 sensor, TacX captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, TacX fuses these modalities into a unified representation that captures physical properties useful for downstream robot manipulation tasks. We study how to effectively integrate real-world touch representations for both imitation learning and tactile adaptation of sim-trained policies, showing that TacX boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness by 90% in recovering object states from touch. Finally, we benchmark TacX’s ability to make inferences about physical properties, such as object-action identification, material-quantity estimation, and force estimation. TacX improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential for dexterous manipulation.
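
To make the fusion idea in the abstract concrete, the sketch below shows one plausible way to project per-modality tactile features (image, audio, motion, pressure) into a shared embedding space and combine them with a small transformer encoder. It is a minimal, hypothetical PyTorch example: the class name MultisensoryTouchFusion, the feature dimensions, and the fusion architecture are assumptions for illustration only and do not describe the actual TacX model or its self-supervised training.

# Illustrative sketch only: fuse four tactile modalities into one embedding.
# All names and dimensions are hypothetical, not the TacX implementation.
import torch
import torch.nn as nn


class MultisensoryTouchFusion(nn.Module):
    def __init__(self, feat_dims, embed_dim=256):
        super().__init__()
        # One linear projection per modality into a shared embedding space.
        self.projections = nn.ModuleDict(
            {name: nn.Linear(dim, embed_dim) for name, dim in feat_dims.items()}
        )
        # A small transformer encoder attends across the modality tokens.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, batch_first=True
        )
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, features):
        # features: dict mapping modality name -> (batch, feat_dim) tensor.
        tokens = torch.stack(
            [self.projections[name](x) for name, x in features.items()], dim=1
        )  # (batch, num_modalities, embed_dim)
        fused = self.fusion(tokens)  # cross-modal attention over the tokens
        return fused.mean(dim=1)     # unified touch representation

# Example usage with random per-modality features (hypothetical dimensions).
feat_dims = {"image": 512, "audio": 128, "motion": 64, "pressure": 32}
model = MultisensoryTouchFusion(feat_dims)
batch = {name: torch.randn(8, dim) for name, dim in feat_dims.items()}
touch_embedding = model(batch)  # shape: (8, 256)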

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-higuera25a,
  title     = {Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation},
  author    = {Higuera, Carolina and Sharma, Akash and Fan, Taosha and Bodduluri, Chaithanya Krishna and Boots, Byron and Kaess, Michael and Lambeta, Mike and Wu, Tingfan and Liu, Zixi and Hogan, Francois Robert and Mukadam, Mustafa},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {105--123},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/higuera25a/higuera25a.pdf},
  url       = {https://proceedings.mlr.press/v305/higuera25a.html},
  abstract  = {We present TacX, the first multisensory touch representations across four tactile modalities: image, audio, motion, and pressure. Trained on 1M contact-rich interactions collected with the Digit 360 sensor, TacX captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, TacX fuses these modalities into a unified representation that captures physical properties useful for downstream robot manipulation tasks. We study how to effectively integrate real-world touch representations for both imitation learning and tactile adaptation of sim-trained policies, showing that TacX boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness by 90% in recovering object states from touch. Finally, we benchmark TacX’s ability to make inference about physical properties, such as object-action identification, material-quantity estimation and force estimation. TacX improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential for dexterous manipulation.}
}
Endnote
%0 Conference Paper
%T Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation
%A Carolina Higuera
%A Akash Sharma
%A Taosha Fan
%A Chaithanya Krishna Bodduluri
%A Byron Boots
%A Michael Kaess
%A Mike Lambeta
%A Tingfan Wu
%A Zixi Liu
%A Francois Robert Hogan
%A Mustafa Mukadam
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-higuera25a
%I PMLR
%P 105--123
%U https://proceedings.mlr.press/v305/higuera25a.html
%V 305
%X We present TacX, the first multisensory touch representations across four tactile modalities: image, audio, motion, and pressure. Trained on 1M contact-rich interactions collected with the Digit 360 sensor, TacX captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, TacX fuses these modalities into a unified representation that captures physical properties useful for downstream robot manipulation tasks. We study how to effectively integrate real-world touch representations for both imitation learning and tactile adaptation of sim-trained policies, showing that TacX boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness by 90% in recovering object states from touch. Finally, we benchmark TacX’s ability to make inference about physical properties, such as object-action identification, material-quantity estimation and force estimation. TacX improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential for dexterous manipulation.
APA
Higuera, C., Sharma, A., Fan, T., Bodduluri, C.K., Boots, B., Kaess, M., Lambeta, M., Wu, T., Liu, Z., Hogan, F.R. & Mukadam, M. (2025). Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:105-123. Available from https://proceedings.mlr.press/v305/higuera25a.html.
