Comparison of Machine Learners on an ABA Experiment Format of the Cart-Pole Task

Leonard M. Eberding
Proceedings of the Second International Workshop on Self-Supervised Learning, PMLR 159:49-63, 2022.

Abstract

Current approaches to online learning focus primarily on reinforcement learning (RL): algorithms that learn through feedback from experience. While most current RL algorithms have shown good results in learning to perform tasks for which they were specifically designed, most lack the level of generalization needed to use existing knowledge to handle novel situations, a property referred to as autonomous transfer learning. Situations that such systems did not encounter during the training phase can lead to critical failure. In the present research we analyzed the autonomous transfer learning capabilities of five machine learning approaches: an Actor-Critic, a Q-Learner, a Policy Gradient Learner, a Double-Deep Q-Learner, and OpenNARS for Applications (NARS). Following a classic ABA experimental format, the learners were all trained on the well-known cart-pole task in phase A1, before strategic changes to the task were introduced in phase B: the direction of control of the cart was inverted (the move-left command moved the cart to the right and vice versa) and noise was added. All analyzed learners showed an extreme performance drop when the action command was inverted in phase B, resulting in long (re-)training periods trying to reach A1 performance. Most learners did not reach their initial A1 performance levels in phase B, some falling far short of them. Furthermore, previously learned knowledge was not retained during the re-training, resulting in an even larger performance drop when the task was changed back to the original settings in phase A2. Only one learner (NARS) reached comparable performance in A1 and A2, demonstrating retention of, and return to, previously acquired knowledge.
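
To make the phase-B manipulation concrete, the sketch below shows one way the inverted-control and noise perturbations could be implemented as a wrapper around the standard OpenAI Gym CartPole-v1 environment. The wrapper class, the Gaussian noise model, and the noise magnitude are illustrative assumptions, not the paper's actual implementation, and the classic 4-tuple Gym step API is assumed.

import gym
import numpy as np

class InvertedNoisyCartPole(gym.Wrapper):
    """Illustrative phase-B environment: inverted control and observation noise."""

    def __init__(self, env, invert_actions=True, obs_noise_std=0.05):
        super().__init__(env)
        self.invert_actions = invert_actions  # phase B: swap left/right commands
        self.obs_noise_std = obs_noise_std    # phase B: assumed Gaussian sensor noise

    def step(self, action):
        if self.invert_actions:
            action = 1 - action                # CartPole actions are 0/1, so this flips left and right
        obs, reward, done, info = self.env.step(action)
        noisy_obs = obs + np.random.normal(0.0, self.obs_noise_std, size=obs.shape)
        return noisy_obs, reward, done, info

# ABA schedule: phase A1 uses the unmodified task, phase B the perturbed task,
# and phase A2 reverts to the original settings.
phase_a1 = gym.make("CartPole-v1")
phase_b = InvertedNoisyCartPole(gym.make("CartPole-v1"))
phase_a2 = gym.make("CartPole-v1")

In an ABA run of this kind, each learner keeps its learned parameters across the phase boundaries, so the same agent is evaluated on phase_a1, then phase_b, then phase_a2, which is what exposes both the re-training cost in B and the loss of previously learned knowledge in A2.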

Cite this Paper


BibTeX
@InProceedings{pmlr-v159-eberding22a,
  title     = {Comparison of Machine Learners on an ABA Experiment Format of the Cart-Pole Task},
  author    = {Eberding, Leonard M.},
  booktitle = {Proceedings of the Second International Workshop on Self-Supervised Learning},
  pages     = {49--63},
  year      = {2022},
  editor    = {Thórisson, Kristinn R. and Robertson, Paul},
  volume    = {159},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--14 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v159/eberding22a/eberding22a.pdf},
  url       = {https://proceedings.mlr.press/v159/eberding22a.html}
}
APA
Eberding, L. M. (2022). Comparison of Machine Learners on an ABA Experiment Format of the Cart-Pole Task. Proceedings of the Second International Workshop on Self-Supervised Learning, in Proceedings of Machine Learning Research 159:49-63. Available from https://proceedings.mlr.press/v159/eberding22a.html.