Learning with contextual information in non-stationary environments
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:856-868, 2025.
Abstract
We consider a repeated decision-making setting in which the decision maker has access to contextual information and lacks a model or a priori knowledge of the relationship between actions, context, and the costs they aim to minimize. Moreover, we assume that the environment may be non-stationary due to the presence of other agents that may be reacting to our decisions. We propose an algorithm inspired by log-linear learning that uses Boltzmann distributions to generate stochastic policies. We consider two general notions of context and provide regret bounds for each: 1) a finite number of possible measurements, and 2) a continuum of measurements that weight a finite set of classes. In the non-stationary setting, we incur some regret, but it can be made arbitrarily small. We illustrate the operation of the algorithm through two examples: one that uses synthetic data (based on the rock-paper-scissors game) and another that uses real data for malware classification. Both examples exhibit (by construction or naturally) a significant lack of stationarity.
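To make the abstract's description concrete, here is a minimal Python sketch of a Boltzmann (softmax) policy over per-context cost estimates, i.e., the finite-context case described above. This is an illustration only, not the authors' algorithm: the class name, the temperature and step-size parameters, and the constant-step-size running-average update are our assumptions.

```python
import numpy as np

class LogLinearContextualLearner:
    """Illustrative sketch: for each observed context, maintain cost
    estimates for every action and sample actions from a Boltzmann
    (softmax) distribution over the negated estimates."""

    def __init__(self, n_contexts, n_actions, temperature=0.1, step_size=0.1):
        self.q = np.zeros((n_contexts, n_actions))  # running cost estimates
        self.temperature = temperature              # exploration parameter (assumed)
        self.step_size = step_size                  # estimate update rate (assumed)

    def policy(self, context):
        # Boltzmann distribution: lower estimated cost -> higher probability.
        logits = -self.q[context] / self.temperature
        logits -= logits.max()                      # for numerical stability
        p = np.exp(logits)
        return p / p.sum()

    def act(self, context, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        return int(rng.choice(self.q.shape[1], p=self.policy(context)))

    def update(self, context, action, cost):
        # Constant-step-size running average of the (context, action) cost;
        # keeping the step size constant lets estimates track drifting
        # (non-stationary) costs, in the spirit of the setting above.
        self.q[context, action] += self.step_size * (cost - self.q[context, action])
```

For example, in a rock-paper-scissors-style loop one would observe a context, call `act` to sample a move, and feed the realized cost back through `update` each round; the temperature controls the trade-off between exploiting low-cost actions and retaining enough randomness to cope with a reacting opponent.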