The Boombox: Visual Reconstruction from Acoustic Vibrations

Boyuan Chen; Mia Chiquier; Hod Lipson; Carl Vondrick

Proceedings of the 5th Conference on Robot Learning, PMLR 164:1067-1077, 2022.

Abstract

Interacting with bins and containers is a fundamental task in robotics, making state estimation of the objects inside the bin critical. While robots often use cameras for state estimation, the visual modality is not always ideal due to occlusions and poor illumination. We introduce The Boombox, a container that uses sound to estimate the state of the contents inside a box. Based on the observation that the collision between objects and their containers will cause an acoustic vibration, we present a convolutional network for learning to reconstruct visual scenes. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multimodal data enables state estimation from affordable audio sensors. Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics.

Cite this Paper

BibTeX


@InProceedings{pmlr-v164-chen22c,
  title     = {The Boombox: Visual Reconstruction from Acoustic Vibrations},
  author    = {Chen, Boyuan and Chiquier, Mia and Lipson, Hod and Vondrick, Carl},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {1067--1077},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/chen22c/chen22c.pdf},
  url       = {https://proceedings.mlr.press/v164/chen22c.html},
  abstract = 	 {Interacting with bins and containers is a fundamental task in robotics, making state estimation of the objects inside the bin critical.  While robots often use cameras for state estimation, the visual modality is not always ideal due to occlusions and poor illumination. We introduce The Boombox, a container that uses sound to estimate the state of the contents inside a box. Based on the observation that the collision between objects and its containers will cause an acoustic vibration, we present a convolutional network for learning to reconstruct visual scenes. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multimodal data enables state estimation from affordable audio sensors. Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics.}
}

Endnote

%0 Conference Paper
%T The Boombox: Visual Reconstruction from Acoustic Vibrations
%A Boyuan Chen
%A Mia Chiquier
%A Hod Lipson
%A Carl Vondrick
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-chen22c
%I PMLR
%P 1067--1077
%U https://proceedings.mlr.press/v164/chen22c.html
%V 164
%X Interacting with bins and containers is a fundamental task in robotics, making state estimation of the objects inside the bin critical.  While robots often use cameras for state estimation, the visual modality is not always ideal due to occlusions and poor illumination. We introduce The Boombox, a container that uses sound to estimate the state of the contents inside a box. Based on the observation that the collision between objects and its containers will cause an acoustic vibration, we present a convolutional network for learning to reconstruct visual scenes. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multimodal data enables state estimation from affordable audio sensors. Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics.

APA


Chen, B., Chiquier, M., Lipson, H. & Vondrick, C. (2022). The Boombox: Visual Reconstruction from Acoustic Vibrations. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1067-1077. Available from https://proceedings.mlr.press/v164/chen22c.html.
