Scalable Optimization of Neighbor Embedding for Visualization

Zhirong Yang; Jaakko Peltonen; Samuel Kaski

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(2):127-135, 2013.

Abstract

Neighbor embedding (NE) methods have found their use in data visualization but are limited in big data analysis tasks due to their O(n^2) complexity for n data samples. We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n). The technique is based on realizing that in visualization the embedding space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations developed for n-body force calculations can be applied. In gradient-based NE algorithms the gradient for an individual point decomposes into “forces” exerted by the other points. The contributions of close-by points need to be computed individually but far-away points can be approximated by their “center of mass”, rapidly computable by applying a recursive decomposition of the visualization space into quadrants. The new algorithm brings a significant speed-up for medium-size data, and brings “big data” within reach of visualization.
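
The center-of-mass idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Student-t style repulsive term, sum_j q_j^2 (y - y_j) with q_j = 1 / (1 + ||y - y_j||^2), as used in t-SNE-like methods; the abstract itself does not fix a particular kernel. The names `Node`, `build_tree`, `repulsion`, `exact_repulsion` and the accuracy parameter `theta` are hypothetical and chosen for illustration only.

```python
import numpy as np


class Node:
    """Square quadtree cell storing a point count and a center of mass."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # cell center and half-width
        self.count = 0
        self.com = np.zeros(2)   # running center of mass of the points inside
        self.point = None        # the first point stored while the cell is a leaf
        self.children = None     # four sub-quadrants once the cell is split

    def insert(self, p, depth=0):
        # Update the summary statistics before descending.
        self.com = (self.com * self.count + p) / (self.count + 1)
        self.count += 1
        if self.count == 1:
            self.point = p
            return
        if depth >= 50:          # stop splitting for (near-)duplicate points
            return
        if self.children is None:
            h = self.half / 2
            self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                             for dx in (-1, 1) for dy in (-1, 1)]
            old, self.point = self.point, None
            self._child(old).insert(old, depth + 1)
        self._child(p).insert(p, depth + 1)

    def _child(self, p):
        # Children are ordered (dx, dy) = (-1,-1), (-1,1), (1,-1), (1,1).
        return self.children[2 * int(p[0] >= self.cx) + int(p[1] >= self.cy)]


def build_tree(Y):
    """Build a quadtree over the rows of an (n, 2) embedding array."""
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    center = (lo + hi) / 2
    half = (hi - lo).max() / 2 + 1e-9   # small margin so every point fits
    root = Node(center[0], center[1], half)
    for p in Y:
        root.insert(p)
    return root


def repulsion(node, y, theta):
    """Approximate sum_j q_j^2 (y - y_j) by walking the quadtree."""
    if node.count == 0:
        return np.zeros(2)
    diff = y - node.com
    dist2 = diff @ diff
    # A leaf, or a cell that is small relative to its distance from y,
    # is replaced by its center of mass weighted by its point count.
    if node.children is None or 2 * node.half < theta * np.sqrt(dist2):
        return node.count * diff / (1.0 + dist2) ** 2
    return sum((repulsion(c, y, theta) for c in node.children), np.zeros(2))


def exact_repulsion(Y, i):
    """Reference O(n) sum for point i (O(n^2) over all points)."""
    diff = Y[i] - Y
    d2 = (diff ** 2).sum(axis=1)
    return (diff / ((1.0 + d2) ** 2)[:, None]).sum(axis=0)
```

Calling `repulsion(build_tree(Y), Y[i], theta)` with `theta = 0` visits every leaf and reproduces the exact sum, while larger values of `theta` summarize more distant cells and trade accuracy for speed. The self term needs no special handling in this sketch because its displacement is zero.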

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-yang13b,
  title = 	 {Scalable Optimization of Neighbor Embedding for Visualization},
  author = 	 {Yang, Zhirong and Peltonen, Jaakko and Kaski, Samuel},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {127--135},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/yang13b.pdf},
  url = 	 {https://proceedings.mlr.press/v28/yang13b.html},
  abstract = 	 {Neighbor embedding (NE) methods have found their use in data visualization but are limited in big data analysis tasks due to their O(n^2) complexity for n data samples. We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n). The technique is based on realizing that in visualization the embedding space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations developed for n-body force calculations can be applied. In gradient-based NE algorithms the gradient for an individual point decomposes into “forces” exerted by the other points. The contributions of close-by points need to be computed individually but far-away points can be approximated by their “center of mass”, rapidly computable by applying a recursive decomposition of the visualization space into quadrants. The new algorithm brings a significant speed-up for medium-size data, and brings “big data” within reach of visualization.}
}

Endnote

%0 Conference Paper
%T Scalable Optimization of Neighbor Embedding for Visualization
%A Zhirong Yang
%A Jaakko Peltonen
%A Samuel Kaski
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-yang13b
%I PMLR
%P 127--135
%U https://proceedings.mlr.press/v28/yang13b.html
%V 28
%N 2
%X Neighbor embedding (NE) methods have found their use in data visualization but are limited in big data analysis tasks due to their O(n^2) complexity for n data samples. We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n). The technique is based on realizing that in visualization the embedding space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations developed for n-body force calculations can be applied. In gradient-based NE algorithms the gradient for an individual point decomposes into “forces” exerted by the other points. The contributions of close-by points need to be computed individually but far-away points can be approximated by their “center of mass”, rapidly computable by applying a recursive decomposition of the visualization space into quadrants. The new algorithm brings a significant speed-up for medium-size data, and brings “big data” within reach of visualization.

RIS


TY  - CPAPER
TI  - Scalable Optimization of Neighbor Embedding for Visualization
AU  - Zhirong Yang
AU  - Jaakko Peltonen
AU  - Samuel Kaski
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/13
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-yang13b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 2
SP  - 127
EP  - 135
L1  - http://proceedings.mlr.press/v28/yang13b.pdf
UR  - https://proceedings.mlr.press/v28/yang13b.html
AB  - Neighbor embedding (NE) methods have found their use in data visualization but are limited in big data analysis tasks due to their O(n^2) complexity for n data samples. We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n). The technique is based on realizing that in visualization the embedding space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations developed for n-body force calculations can be applied. In gradient-based NE algorithms the gradient for an individual point decomposes into “forces” exerted by the other points. The contributions of close-by points need to be computed individually but far-away points can be approximated by their “center of mass”, rapidly computable by applying a recursive decomposition of the visualization space into quadrants. The new algorithm brings a significant speed-up for medium-size data, and brings “big data” within reach of visualization.
ER  -

APA


Yang, Z., Peltonen, J., & Kaski, S. (2013). Scalable Optimization of Neighbor Embedding for Visualization. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(2):127-135. Available from https://proceedings.mlr.press/v28/yang13b.html.
