Optimizing Data Usage via Differentiable Rewards

Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9983-9995, 2020.

Abstract

To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems. Similarly, a machine learning model could potentially be trained better with a scorer that “adapts” to its current learning state and estimates the importance of each training data instance. Training such an adaptive scorer efficiently is a challenging problem; in order to precisely quantify the effect of a data instance at a given point during training, it is typically necessary to first complete the entire training process. To efficiently optimize data usage, we propose a reinforcement learning approach called Differentiable Data Selection (DDS). In DDS, we formulate a scorer network as a learnable function of the training data, which can be efficiently updated along with the main model being trained. Specifically, DDS updates the scorer with an intuitive reward signal: it should up-weight training data whose gradient is similar to that of a dev set on which we ultimately want to perform well. Without significant computational overhead, DDS delivers strong and consistent improvements over several strong baselines on two very different tasks: machine translation and image classification.
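
For intuition only, the PyTorch-style sketch below illustrates the reward described in the abstract: the scorer is rewarded for up-weighting training examples whose gradients align (here, measured by cosine similarity) with the gradient of a dev batch. The names (`dds_step`, `scorer`, `model`, the optimizers) are placeholders, and the scorer update is a simplified REINFORCE-style approximation of the idea; this is not the authors' released implementation.

```python
# A minimal, hypothetical sketch of the DDS reward signal (PyTorch-style).
# `scorer`, `model`, the optimizers, and the REINFORCE-style scorer update
# are illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn.functional as F


def dds_step(model, scorer, opt_model, opt_scorer, train_batch, dev_batch):
    x, y = train_batch    # training examples and labels
    xd, yd = dev_batch    # a small dev batch we ultimately care about

    # 1) The scorer assigns an importance weight to each training example,
    #    and the main model is updated with the weighted training loss.
    weights = torch.softmax(scorer(x).squeeze(-1), dim=0)
    per_example_loss = F.cross_entropy(model(x), y, reduction="none")
    (weights.detach() * per_example_loss).sum().backward()
    opt_model.step()
    opt_model.zero_grad()

    # 2) Reward: how well each training example's gradient aligns with the
    #    gradient of the dev loss (cosine similarity of flattened gradients).
    params = [p for p in model.parameters() if p.requires_grad]
    dev_grad = torch.autograd.grad(F.cross_entropy(model(xd), yd), params)
    dev_norm = torch.sqrt(sum((g ** 2).sum() for g in dev_grad))

    rewards = []
    for i in range(x.size(0)):
        g_i = torch.autograd.grad(
            F.cross_entropy(model(x[i:i + 1]), y[i:i + 1]), params
        )
        dot = sum((a * b).sum() for a, b in zip(g_i, dev_grad))
        norm_i = torch.sqrt(sum((g ** 2).sum() for g in g_i))
        rewards.append(dot / (norm_i * dev_norm + 1e-8))

    # 3) Update the scorer so it up-weights examples with high reward.
    rewards = torch.stack(rewards).detach()
    scorer_loss = -(rewards * torch.log(weights + 1e-8)).sum()
    scorer_loss.backward()
    opt_scorer.step()
    opt_scorer.zero_grad()
```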

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-wang20p,
  title     = {Optimizing Data Usage via Differentiable Rewards},
  author    = {Wang, Xinyi and Pham, Hieu and Michel, Paul and Anastasopoulos, Antonios and Carbonell, Jaime and Neubig, Graham},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9983--9995},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/wang20p/wang20p.pdf},
  url       = {https://proceedings.mlr.press/v119/wang20p.html}
}
Endnote
%0 Conference Paper
%T Optimizing Data Usage via Differentiable Rewards
%A Xinyi Wang
%A Hieu Pham
%A Paul Michel
%A Antonios Anastasopoulos
%A Jaime Carbonell
%A Graham Neubig
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-wang20p
%I PMLR
%P 9983--9995
%U https://proceedings.mlr.press/v119/wang20p.html
%V 119
APA
Wang, X., Pham, H., Michel, P., Anastasopoulos, A., Carbonell, J. & Neubig, G. (2020). Optimizing Data Usage via Differentiable Rewards. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9983-9995. Available from https://proceedings.mlr.press/v119/wang20p.html.
