Learning with Feature and Distribution Evolvable Streams

Zhen-Yu Zhang, Peng Zhao, Yuan Jiang, Zhi-Hua Zhou
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:11317-11327, 2020.

Abstract

In many real-world applications, data are collected as a stream whose feature space can evolve over time. For instance, in environmental monitoring, features can vanish or appear dynamically as old sensors expire and new sensors are deployed. Besides the evolving feature space, the data distribution also usually changes in the streaming scenario. When both the feature space and the data distribution evolve, it is quite challenging to design algorithms with guarantees, particularly a theoretical understanding of generalization ability. To address this difficulty, we propose a novel discrepancy measure for data with evolving feature space and data distribution, named the evolving discrepancy. Based on it, we present a generalization error analysis, and the theory motivates the design of a learning algorithm, which is further implemented with deep neural networks. Empirical studies on synthetic data verify the rationale of the proposed discrepancy measure, and extensive experiments on real-world tasks validate the effectiveness of our algorithm.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-zhang20ad,
  title = {Learning with Feature and Distribution Evolvable Streams},
  author = {Zhang, Zhen-Yu and Zhao, Peng and Jiang, Yuan and Zhou, Zhi-Hua},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {11317--11327},
  year = {2020},
  editor = {Daumé III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/zhang20ad/zhang20ad.pdf},
  url = {https://proceedings.mlr.press/v119/zhang20ad.html},
  abstract = {In many real-world applications, data are collected in the form of a stream, whose feature space can evolve over time. For instance, in the environmental monitoring task, features can be dynamically vanished or augmented due to the existence of expired old sensors and deployed new sensors. Furthermore, besides the evolvable feature space, the data distribution is usually changing in the streaming scenario. When both feature space and data distribution are evolvable, it is quite challenging to design algorithms with guarantees, particularly theoretical understandings of generalization ability. To address this difficulty, we propose a novel discrepancy measure for data with evolving feature space and data distribution, named the \emph{evolving discrepancy}. Based on that, we present the generalization error analysis, and the theory motivates the design of a learning algorithm which is further implemented by deep neural networks. Empirical studies on synthetic data verify the rationale of our proposed discrepancy measure, and extensive experiments on real-world tasks validate the effectiveness of our algorithm.}
}
Endnote
%0 Conference Paper
%T Learning with Feature and Distribution Evolvable Streams
%A Zhen-Yu Zhang
%A Peng Zhao
%A Yuan Jiang
%A Zhi-Hua Zhou
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-zhang20ad
%I PMLR
%P 11317--11327
%U https://proceedings.mlr.press/v119/zhang20ad.html
%V 119
%X In many real-world applications, data are collected in the form of a stream, whose feature space can evolve over time. For instance, in the environmental monitoring task, features can be dynamically vanished or augmented due to the existence of expired old sensors and deployed new sensors. Furthermore, besides the evolvable feature space, the data distribution is usually changing in the streaming scenario. When both feature space and data distribution are evolvable, it is quite challenging to design algorithms with guarantees, particularly theoretical understandings of generalization ability. To address this difficulty, we propose a novel discrepancy measure for data with evolving feature space and data distribution, named the \emph{evolving discrepancy}. Based on that, we present the generalization error analysis, and the theory motivates the design of a learning algorithm which is further implemented by deep neural networks. Empirical studies on synthetic data verify the rationale of our proposed discrepancy measure, and extensive experiments on real-world tasks validate the effectiveness of our algorithm.
APA
Zhang, Z., Zhao, P., Jiang, Y., & Zhou, Z. (2020). Learning with Feature and Distribution Evolvable Streams. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:11317-11327. Available from https://proceedings.mlr.press/v119/zhang20ad.html.
