Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning

Paulo R d O Costa, Jason Rhuggenaath, Yingqian Zhang, Alp Akcay
Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:465-480, 2020.

Abstract

Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which, unlike previous works, can be easily extended to more general $k$-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
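
For readers unfamiliar with the operator the abstract refers to, the following is a minimal sketch of a single classical 2-opt move: reversing a contiguous segment of the tour, which replaces two edges with two others. It is not the paper's learned policy. In the paper a neural policy selects which move to apply, whereas here the indices are picked by hand. The function names `tour_length` and `two_opt_move` and the four-city instance are illustrative assumptions.

```python
import math


def tour_length(points, tour):
    """Total Euclidean length of the closed tour visiting `points` in `tour` order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[k]], points[tour[(k + 1) % n]])
        for k in range(n)
    )


def two_opt_move(tour, i, j):
    """Return a new tour with positions i..j (inclusive) reversed, for 0 <= i < j < len(tour)."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


# Hypothetical instance: the corners of a unit square, visited in a self-crossing order.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
tour = [0, 1, 2, 3]

# Hand-picked move (an assumption for illustration): reverse positions 1..2.
improved = two_opt_move(tour, 1, 2)

print(tour_length(points, tour), tour_length(points, improved))
```

On this instance the move removes the two crossing diagonals, so the tour becomes the perimeter of the square. An improvement heuristic applies such moves repeatedly; what varies between methods is how the pair of positions is chosen.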

Cite this Paper


BibTeX
@InProceedings{pmlr-v129-costa20a,
  title = {Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning},
  author = {da Costa, Paulo R d O and Rhuggenaath, Jason and Zhang, Yingqian and Akcay, Alp},
  booktitle = {Proceedings of The 12th Asian Conference on Machine Learning},
  pages = {465--480},
  year = {2020},
  editor = {Sinno Jialin Pan and Masashi Sugiyama},
  volume = {129},
  series = {Proceedings of Machine Learning Research},
  address = {Bangkok, Thailand},
  month = {18--20 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v129/costa20a/costa20a.pdf},
  url = {http://proceedings.mlr.press/v129/costa20a.html},
  abstract = {Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which unlike previous works, can be easily extended to more general $k$-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.}
}
Endnote
%0 Conference Paper
%T Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning
%A Paulo R d O Costa
%A Jason Rhuggenaath
%A Yingqian Zhang
%A Alp Akcay
%B Proceedings of The 12th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Sinno Jialin Pan
%E Masashi Sugiyama
%F pmlr-v129-costa20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 465--480
%U http://proceedings.mlr.press
%V 129
%W PMLR
%X Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which unlike previous works, can be easily extended to more general $k$-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
APA
Costa, P.R.d.O., Rhuggenaath, J., Zhang, Y. & Akcay, A. (2020). Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning. Proceedings of The 12th Asian Conference on Machine Learning, in PMLR 129:465-480.

Related Material