Hashing for Distributed Data

Cong Leng; Jiaxiang Wu; Jian Cheng; Xi Zhang; Hanqing Lu

Hashing for Distributed Data

Cong Leng, Jiaxiang Wu, Jian Cheng, Xi Zhang, Hanqing Lu

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1642-1650, 2015.

Abstract

Recently, hashing based approximate nearest neighbors search has attracted much attention. Extensive centralized hashing algorithms have been proposed and achieved promising performance. However, due to the large scale of many applications, the data is often stored or even collected in a distributed manner. Learning hash functions by aggregating all the data into a fusion center is infeasible because of the prohibitively expensive communication and computation overhead. In this paper, we develop a novel hashing model to learn hash functions in a distributed setting. We cast a centralized hashing model as a set of subproblems with consensus constraints. We find these subproblems can be analytically solved in parallel on the distributed compute nodes. Since no training data is transmitted across the nodes in the learning process, the communication cost of our model is independent to the data size. Extensive experiments on several large scale datasets containing up to 100 million samples demonstrate the efficacy of our method.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-leng15,
  title = 	 {Hashing for Distributed Data},
  author = 	 {Leng, Cong and Wu, Jiaxiang and Cheng, Jian and Zhang, Xi and Lu, Hanqing},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1642--1650},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/leng15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/leng15.html},
  abstract = 	 {Recently, hashing based approximate nearest neighbors search has attracted much attention. Extensive centralized hashing algorithms have been proposed and achieved promising performance. However, due to the large scale of many applications, the data is often stored or even collected in a distributed manner. Learning hash functions by aggregating all the data into a fusion center is infeasible because of the prohibitively expensive communication and computation overhead. In this paper, we develop a novel hashing model to learn hash functions in a distributed setting. We cast a centralized hashing model as a set of subproblems with consensus constraints. We find these subproblems can be analytically solved in parallel on the distributed compute nodes. Since no training data is transmitted across the nodes in the learning process, the communication cost of our model is independent to the data size. Extensive experiments on several large scale datasets containing up to 100 million samples demonstrate the efficacy of our method.}
}

Endnote

%0 Conference Paper
%T Hashing for Distributed Data
%A Cong Leng
%A Jiaxiang Wu
%A Jian Cheng
%A Xi Zhang
%A Hanqing Lu
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-leng15
%I PMLR
%P 1642--1650
%U https://proceedings.mlr.press/v37/leng15.html
%V 37
%X Recently, hashing based approximate nearest neighbors search has attracted much attention. Extensive centralized hashing algorithms have been proposed and achieved promising performance. However, due to the large scale of many applications, the data is often stored or even collected in a distributed manner. Learning hash functions by aggregating all the data into a fusion center is infeasible because of the prohibitively expensive communication and computation overhead. In this paper, we develop a novel hashing model to learn hash functions in a distributed setting. We cast a centralized hashing model as a set of subproblems with consensus constraints. We find these subproblems can be analytically solved in parallel on the distributed compute nodes. Since no training data is transmitted across the nodes in the learning process, the communication cost of our model is independent to the data size. Extensive experiments on several large scale datasets containing up to 100 million samples demonstrate the efficacy of our method.

RIS


TY  - CPAPER
TI  - Hashing for Distributed Data
AU  - Cong Leng
AU  - Jiaxiang Wu
AU  - Jian Cheng
AU  - Xi Zhang
AU  - Hanqing Lu
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-leng15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1642
EP  - 1650
L1  - http://proceedings.mlr.press/v37/leng15.pdf
UR  - https://proceedings.mlr.press/v37/leng15.html
AB  - Recently, hashing based approximate nearest neighbors search has attracted much attention. Extensive centralized hashing algorithms have been proposed and achieved promising performance. However, due to the large scale of many applications, the data is often stored or even collected in a distributed manner. Learning hash functions by aggregating all the data into a fusion center is infeasible because of the prohibitively expensive communication and computation overhead. In this paper, we develop a novel hashing model to learn hash functions in a distributed setting. We cast a centralized hashing model as a set of subproblems with consensus constraints. We find these subproblems can be analytically solved in parallel on the distributed compute nodes. Since no training data is transmitted across the nodes in the learning process, the communication cost of our model is independent to the data size. Extensive experiments on several large scale datasets containing up to 100 million samples demonstrate the efficacy of our method.
ER  -

APA


Leng, C., Wu, J., Cheng, J., Zhang, X. & Lu, H.. (2015). Hashing for Distributed Data. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1642-1650 Available from https://proceedings.mlr.press/v37/leng15.html.

Related Material

Download PDF