Challenges of Decomposing Tools in Surgical Scenes Through Disentangling The Latent Representations

Sai Lokesh Gorantla, Raviteja Sista, Apoorva Srivastava, Utpal De, Partha Pratim Chakrabarti, Debdoot Sheet
Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops, PMLR 296:130-140, 2025.

Abstract

Image generation through disentangling object representations is a critical area of research with significant potential. Disentanglement involves separating the representation of objects and their attributes, enabling greater control over the generated output. However, existing approaches are limited to disentangling only the objects’ attributes and generating images with selected combinations of attributes. This study explores learning object-level disentanglement of semantically rich latent representation using von-Mises-Fisher (vMF) distributions. The proposed approach aims to disentangle compressed representations into object and background classes. The approach is tested on surgical scenes for disentanglement of tools and background information using the Cholec80 dataset. Achieving tool-background disentanglement provides an opportunity to generate rare and custom surgical scenes. However, the proposed method learns to disentangle representations based on pixel intensities. This study uncovers the challenges and shortfalls in achieving object-level disentanglement of the compressed representations using vMF distributions. The code for this study is available at https://github.com/it-is-lokesh/vMF-disentanglement-challenges.
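The abstract does not spell out how latent vectors are assigned to vMF components, so the following is only a minimal sketch of one common construction, not the authors' implementation. It assumes unit-normalised latent vectors, one vMF mean direction per class (here a hypothetical "tool" and "background" pair), and a single shared concentration `kappa`. Under that assumption the vMF normalising constant cancels, and the soft assignment reduces to a softmax over scaled cosine similarities. All function names and the value of `kappa` are illustrative.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Project vectors onto the unit sphere, guarding against zero norm."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def vmf_responsibilities(latents, means, kappa=20.0):
    """Soft-assign each latent vector to one of K vMF components.

    latents: (N, D) array of latent vectors.
    means:   (K, D) array of component mean directions.
    kappa:   shared concentration (assumed equal for all components, so the
             vMF normaliser cancels in the posterior).
    Returns an (N, K) array whose rows sum to 1.
    """
    z = l2_normalize(latents)
    mu = l2_normalize(means)
    logits = kappa * (z @ mu.T)                   # kappa * cosine similarity
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


def split_latents(latents, means, kappa=20.0):
    """Weight the latents by each component's responsibility.

    Returns a list of K arrays of shape (N, D) that sum back to `latents`.
    """
    resp = vmf_responsibilities(latents, means, kappa)
    return [latents * resp[:, k:k + 1] for k in range(means.shape[0])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(6, 8))   # stand-in for encoder outputs
    means = rng.normal(size=(2, 8))     # hypothetical tool / background directions
    tool_part, background_part = split_latents(latents, means)
    print(np.allclose(tool_part + background_part, latents))
```

Because the assignment depends only on the direction of each latent vector, nothing in this construction forces a component to align with an object rather than with some simpler cue, which is in keeping with the abstract's observation that the learned split followed pixel intensities.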

Cite this Paper


BibTeX
@InProceedings{pmlr-v296-gorantla25a,
  title = {Challenges of Decomposing Tools in Surgical Scenes Through Disentangling The Latent Representations},
  author = {Gorantla, Sai Lokesh and Sista, Raviteja and Srivastava, Apoorva and De, Utpal and Chakrabarti, Partha Pratim and Sheet, Debdoot},
  booktitle = {Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops},
  pages = {130--140},
  year = {2025},
  editor = {Blaas, Arno and D’Costa, Priya and Feng, Fan and Kriegler, Andreas and Mason, Ian and Pan, Zhaoying and Uelwer, Tobias and Williams, Jennifer and Xie, Yubin and Yang, Rui},
  volume = {296},
  series = {Proceedings of Machine Learning Research},
  month = {28 Apr},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v296/main/assets/gorantla25a/gorantla25a.pdf},
  url = {https://proceedings.mlr.press/v296/gorantla25a.html},
  abstract = {Image generation through disentangling object representations is a critical area of research with significant potential. Disentanglement involves separating the representation of objects and their attributes, enabling greater control over the generated output. However, existing approaches are limited to disentangling only the objects’ attributes and generating images with selected combinations of attributes. This study explores learning object-level disentanglement of semantically rich latent representation using von-Mises-Fisher (vMF) distributions. The proposed approach aims to disentangle compressed representations into object and background classes. The approach is tested on surgical scenes for disentanglement of tools and background information using the Cholec80 dataset. Achieving tool-background disentanglement provides an opportunity to generate rare and custom surgical scenes. However, the proposed method learns to disentangle representations based on pixel intensities. This study uncovers the challenges and shortfalls in achieving object-level disentanglement of the compressed representations using vMF distributions. The code for this study is available at https://github.com/it-is-lokesh/vMF-disentanglement-challenges.}
}
Endnote
%0 Conference Paper
%T Challenges of Decomposing Tools in Surgical Scenes Through Disentangling The Latent Representations
%A Sai Lokesh Gorantla
%A Raviteja Sista
%A Apoorva Srivastava
%A Utpal De
%A Partha Pratim Chakrabarti
%A Debdoot Sheet
%B Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops
%C Proceedings of Machine Learning Research
%D 2025
%E Arno Blaas
%E Priya D’Costa
%E Fan Feng
%E Andreas Kriegler
%E Ian Mason
%E Zhaoying Pan
%E Tobias Uelwer
%E Jennifer Williams
%E Yubin Xie
%E Rui Yang
%F pmlr-v296-gorantla25a
%I PMLR
%P 130--140
%U https://proceedings.mlr.press/v296/gorantla25a.html
%V 296
%X Image generation through disentangling object representations is a critical area of research with significant potential. Disentanglement involves separating the representation of objects and their attributes, enabling greater control over the generated output. However, existing approaches are limited to disentangling only the objects’ attributes and generating images with selected combinations of attributes. This study explores learning object-level disentanglement of semantically rich latent representation using von-Mises-Fisher (vMF) distributions. The proposed approach aims to disentangle compressed representations into object and background classes. The approach is tested on surgical scenes for disentanglement of tools and background information using the Cholec80 dataset. Achieving tool-background disentanglement provides an opportunity to generate rare and custom surgical scenes. However, the proposed method learns to disentangle representations based on pixel intensities. This study uncovers the challenges and shortfalls in achieving object-level disentanglement of the compressed representations using vMF distributions. The code for this study is available at https://github.com/it-is-lokesh/vMF-disentanglement-challenges.
APA
Gorantla, S.L., Sista, R., Srivastava, A., De, U., Chakrabarti, P.P. & Sheet, D. (2025). Challenges of Decomposing Tools in Surgical Scenes Through Disentangling The Latent Representations. Proceedings on "I Can't Believe It's Not Better: Challenges in Applied Deep Learning" at ICLR 2025 Workshops, in Proceedings of Machine Learning Research 296:130-140. Available from https://proceedings.mlr.press/v296/gorantla25a.html.
