Learning with contextual information in non-stationary environments

Sean Anderson, Joao P. Hespanha
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:856-868, 2025.

Abstract

We consider a repeated decision-making setting in which the decision maker has access to contextual information and lacks a model or a priori knowledge of the relationship between the actions, context, and costs that they aim to minimize. Moreover, we assume that the environment may be non-stationary due to the presence of other agents that may be reacting to our decisions. We propose an algorithm inspired by log-linear learning that uses Boltzmann distributions to generate stochastic policies. We consider two general notions of context and provide regret bounds for each: 1) a finite number of possible measurements and 2) a continuum of measurements that weight a set of finite classes. In the non-stationary setting, we incur some regret but can make it arbitrarily small. We illustrate the operation of the algorithm through two examples: one that uses synthetic data (based on the rock-paper-scissors game) and another that uses real data for malware classification. Both examples exhibit (by construction or naturally) significant lack of stationarity.
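The abstract mentions policies generated from Boltzmann distributions. As a minimal illustration (not the authors' algorithm; the cost estimates and temperature value here are hypothetical), a Boltzmann policy turns estimated costs into a stochastic action distribution, favoring low-cost actions:

```python
import numpy as np

def boltzmann_policy(cost_estimates, temperature=1.0):
    """Action distribution from estimated costs.

    Lower estimated cost -> higher probability (hence the minus sign).
    The temperature controls how greedy the policy is: as it goes to
    zero the policy concentrates on the lowest-cost action.
    """
    logits = -np.asarray(cost_estimates, dtype=float) / temperature
    logits -= logits.max()          # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical example: three actions with estimated costs 1, 2, 3.
probs = boltzmann_policy([1.0, 2.0, 3.0], temperature=0.5)
```

An action would then be sampled from `probs` at each round, e.g. with `np.random.choice(3, p=probs)`.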

Cite this Paper


BibTeX
@InProceedings{pmlr-v283-anderson25a,
  title     = {Learning with contextual information in non-stationary environments},
  author    = {Anderson, Sean and Hespanha, Joao P.},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  pages     = {856--868},
  year      = {2025},
  editor    = {Ozay, Necmiye and Balzano, Laura and Panagou, Dimitra and Abate, Alessandro},
  volume    = {283},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--06 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v283/main/assets/anderson25a/anderson25a.pdf},
  url       = {https://proceedings.mlr.press/v283/anderson25a.html},
  abstract  = {We consider a repeated decision-making setting in which the decision maker has access to contextual information and lacks a model or a priori knowledge of the relationship between the actions, context, and costs that they aim to minimize. Moreover, we assume that the environment may be non-stationary due to the presence of other agents that may be reacting to our decisions. We propose an algorithm inspired by log-linear learning that uses Boltzmann distributions to generate stochastic policies. We consider two general notions of context and provide regret bounds for each: 1) a finite number of possible measurements and 2) a continuum of measurements that weight a set of finite classes. In the non-stationary setting, we incur some regret but can make it arbitrarily small. We illustrate the operation of the algorithm through two examples: one that uses synthetic data (based on the rock-paper-scissors game) and another that uses real data for malware classification. Both examples exhibit (by construction or naturally) significant lack of stationarity.}
}
Endnote
%0 Conference Paper
%T Learning with contextual information in non-stationary environments
%A Sean Anderson
%A Joao P. Hespanha
%B Proceedings of the 7th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Necmiye Ozay
%E Laura Balzano
%E Dimitra Panagou
%E Alessandro Abate
%F pmlr-v283-anderson25a
%I PMLR
%P 856--868
%U https://proceedings.mlr.press/v283/anderson25a.html
%V 283
%X We consider a repeated decision-making setting in which the decision maker has access to contextual information and lacks a model or a priori knowledge of the relationship between the actions, context, and costs that they aim to minimize. Moreover, we assume that the environment may be non-stationary due to the presence of other agents that may be reacting to our decisions. We propose an algorithm inspired by log-linear learning that uses Boltzmann distributions to generate stochastic policies. We consider two general notions of context and provide regret bounds for each: 1) a finite number of possible measurements and 2) a continuum of measurements that weight a set of finite classes. In the non-stationary setting, we incur some regret but can make it arbitrarily small. We illustrate the operation of the algorithm through two examples: one that uses synthetic data (based on the rock-paper-scissors game) and another that uses real data for malware classification. Both examples exhibit (by construction or naturally) significant lack of stationarity.
APA
Anderson, S. & Hespanha, J. P. (2025). Learning with contextual information in non-stationary environments. Proceedings of the 7th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 283:856-868. Available from https://proceedings.mlr.press/v283/anderson25a.html.