Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning

Kyle Hsu, Jubayer Ibn Hamid, Kaylee Burns, Chelsea Finn, Jiajun Wu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:19101-19122, 2024.

Abstract

Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set. In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature: data compression into a grid-like latent space via quantization, collective independence amongst latents, and minimal functional influence of any latent on how other latents determine data generation. In principle, these inductive biases are deeply complementary: they most directly specify properties of the latent space, encoder, and decoder, respectively. In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits. To address this, we propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives. The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks. We also verify that Tripod significantly improves upon its naive incarnation and that all three of its "legs" are necessary for best performance.
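To make the first of these inductive biases concrete, here is a minimal sketch (not the authors' implementation; the module name, codebook sizes, and the straight-through gradient trick are illustrative assumptions) of compressing latents onto a grid: each continuous latent coordinate is snapped to the nearest value in a small learned per-dimension codebook, so the latent space as a whole becomes a discrete grid.

    # Minimal sketch of per-dimension latent quantization (illustrative only).
    import torch
    import torch.nn as nn

    class LatentQuantizer(nn.Module):
        def __init__(self, n_latents: int = 10, n_values: int = 12):
            super().__init__()
            # One small codebook of scalar values per latent dimension.
            self.codebooks = nn.Parameter(torch.randn(n_latents, n_values))

        def forward(self, z: torch.Tensor) -> torch.Tensor:
            # z: (batch, n_latents) continuous encoder outputs.
            # Squared distance from each coordinate to each codebook value.
            dists = (z.unsqueeze(-1) - self.codebooks.unsqueeze(0)) ** 2
            idx = dists.argmin(dim=-1)  # (batch, n_latents)
            # Look up the nearest codebook value for each coordinate.
            z_q = torch.gather(
                self.codebooks.expand(z.shape[0], -1, -1), 2, idx.unsqueeze(-1)
            ).squeeze(-1)
            # Straight-through estimator: gradients pass to z as if
            # quantization were the identity.
            return z + (z_q - z).detach()

    quantizer = LatentQuantizer()
    z_q = quantizer(torch.randn(4, 10))  # each entry lies on that dimension's grid

In a full model, this bottleneck would sit between encoder and decoder alongside the quantization/commitment losses used in practice, plus the regularizers realizing the other two "legs": collective independence among latents and minimal cross-latent influence in the decoder.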

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-hsu24a,
  title     = {Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning},
  author    = {Hsu, Kyle and Hamid, Jubayer Ibn and Burns, Kaylee and Finn, Chelsea and Wu, Jiajun},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {19101--19122},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hsu24a/hsu24a.pdf},
  url       = {https://proceedings.mlr.press/v235/hsu24a.html},
  abstract  = {Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set. In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature: data compression into a grid-like latent space via quantization, collective independence amongst latents, and minimal functional influence of any latent on how other latents determine data generation. In principle, these inductive biases are deeply complementary: they most directly specify properties of the latent space, encoder, and decoder, respectively. In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits. To address this, we propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives. The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks. We also verify that Tripod significantly improves upon its naive incarnation and that all three of its "legs" are necessary for best performance.}
}
Endnote
%0 Conference Paper
%T Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning
%A Kyle Hsu
%A Jubayer Ibn Hamid
%A Kaylee Burns
%A Chelsea Finn
%A Jiajun Wu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-hsu24a
%I PMLR
%P 19101--19122
%U https://proceedings.mlr.press/v235/hsu24a.html
%V 235
%X Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set. In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature: data compression into a grid-like latent space via quantization, collective independence amongst latents, and minimal functional influence of any latent on how other latents determine data generation. In principle, these inductive biases are deeply complementary: they most directly specify properties of the latent space, encoder, and decoder, respectively. In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits. To address this, we propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives. The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks. We also verify that Tripod significantly improves upon its naive incarnation and that all three of its "legs" are necessary for best performance.
APA
Hsu, K., Hamid, J.I., Burns, K., Finn, C. & Wu, J. (2024). Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:19101-19122. Available from https://proceedings.mlr.press/v235/hsu24a.html.
