Discrete Latent Perspective Learning for Segmentation and Detection

Deyi Ji, Feng Zhao, Lanyun Zhu, Wenwei Jin, Hongtao Lu, Jieping Ye
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:21719-21730, 2024.

Abstract

In this paper, we address the challenge of Perspective-Invariant Learning in machine learning and computer vision, which involves enabling a network to understand images from varying perspectives to achieve consistent semantic interpretation. While standard approaches rely on the labor-intensive collection of multi-view images or limited data augmentation techniques, we propose a novel framework, Discrete Latent Perspective Learning (DLPL), for latent multi-perspective fusion learning using conventional single-view images. DLPL comprises three main modules: Perspective Discrete Decomposition (PDD), Perspective Homography Transformation (PHT), and Perspective Invariant Attention (PIA), which work together to discretize visual features, transform perspectives, and fuse multi-perspective semantic information, respectively. DLPL is a universal perspective learning framework applicable to a variety of scenarios and vision tasks. Extensive experiments demonstrate that DLPL significantly enhances the network’s capacity to depict images across diverse scenarios (daily photos, UAV, auto-driving) and tasks (detection, segmentation).

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ji24e,
  title     = {Discrete Latent Perspective Learning for Segmentation and Detection},
  author    = {Ji, Deyi and Zhao, Feng and Zhu, Lanyun and Jin, Wenwei and Lu, Hongtao and Ye, Jieping},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {21719--21730},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ji24e/ji24e.pdf},
  url       = {https://proceedings.mlr.press/v235/ji24e.html},
  abstract  = {In this paper, we address the challenge of Perspective-Invariant Learning in machine learning and computer vision, which involves enabling a network to understand images from varying perspectives to achieve consistent semantic interpretation. While standard approaches rely on the labor-intensive collection of multi-view images or limited data augmentation techniques, we propose a novel framework, Discrete Latent Perspective Learning (DLPL), for latent multi-perspective fusion learning using conventional single-view images. DLPL comprises three main modules: Perspective Discrete Decomposition (PDD), Perspective Homography Transformation (PHT), and Perspective Invariant Attention (PIA), which work together to discretize visual features, transform perspectives, and fuse multi-perspective semantic information, respectively. DLPL is a universal perspective learning framework applicable to a variety of scenarios and vision tasks. Extensive experiments demonstrate that DLPL significantly enhances the network's capacity to depict images across diverse scenarios (daily photos, UAV, auto-driving) and tasks (detection, segmentation).}
}
Endnote
%0 Conference Paper
%T Discrete Latent Perspective Learning for Segmentation and Detection
%A Deyi Ji
%A Feng Zhao
%A Lanyun Zhu
%A Wenwei Jin
%A Hongtao Lu
%A Jieping Ye
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ji24e
%I PMLR
%P 21719--21730
%U https://proceedings.mlr.press/v235/ji24e.html
%V 235
%X In this paper, we address the challenge of Perspective-Invariant Learning in machine learning and computer vision, which involves enabling a network to understand images from varying perspectives to achieve consistent semantic interpretation. While standard approaches rely on the labor-intensive collection of multi-view images or limited data augmentation techniques, we propose a novel framework, Discrete Latent Perspective Learning (DLPL), for latent multi-perspective fusion learning using conventional single-view images. DLPL comprises three main modules: Perspective Discrete Decomposition (PDD), Perspective Homography Transformation (PHT), and Perspective Invariant Attention (PIA), which work together to discretize visual features, transform perspectives, and fuse multi-perspective semantic information, respectively. DLPL is a universal perspective learning framework applicable to a variety of scenarios and vision tasks. Extensive experiments demonstrate that DLPL significantly enhances the network's capacity to depict images across diverse scenarios (daily photos, UAV, auto-driving) and tasks (detection, segmentation).
APA
Ji, D., Zhao, F., Zhu, L., Jin, W., Lu, H. & Ye, J. (2024). Discrete Latent Perspective Learning for Segmentation and Detection. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:21719-21730. Available from https://proceedings.mlr.press/v235/ji24e.html.

Related Material