Learning with contextual information in non-stationary environments
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:856-868, 2025.
Abstract
We consider a repeated decision-making setting in which the decision maker has access to contextual information and lacks a model or a priori knowledge of the relationship between actions, context, and the costs they aim to minimize. Moreover, we assume that the environment may be non-stationary due to the presence of other agents that may be reacting to our decisions. We propose an algorithm inspired by log-linear learning that uses Boltzmann distributions to generate stochastic policies. We consider two general notions of context and provide regret bounds for each: 1) a finite number of possible measurements, and 2) a continuum of measurements that weight a finite set of classes. In the non-stationary setting, we incur some regret, but it can be made arbitrarily small. We illustrate the operation of the algorithm through two examples: one that uses synthetic data (based on the rock-paper-scissors game) and another that uses real data for malware classification. Both examples exhibit (by construction or naturally) a significant lack of stationarity.
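To make the abstract's description concrete, here is a minimal Python sketch of a Boltzmann (softmax) policy over per-context cost estimates, i.e., the finite-context case described above. This is an illustration only, not the authors' algorithm: the class name, the temperature and step-size parameters, and the constant-step-size running-average update are our assumptions.

```python
import numpy as np

class LogLinearContextualLearner:
    """Illustrative sketch: for each observed context, maintain cost
    estimates for every action and sample actions from a Boltzmann
    (softmax) distribution over the negated estimates."""

    def __init__(self, n_contexts, n_actions, temperature=0.1, step_size=0.1):
        self.q = np.zeros((n_contexts, n_actions))  # running cost estimates
        self.temperature = temperature              # exploration parameter (assumed)
        self.step_size = step_size                  # estimate update rate (assumed)

    def policy(self, context):
        # Boltzmann distribution: lower estimated cost -> higher probability.
        logits = -self.q[context] / self.temperature
        logits -= logits.max()                      # for numerical stability
        p = np.exp(logits)
        return p / p.sum()

    def act(self, context, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        return int(rng.choice(self.q.shape[1], p=self.policy(context)))

    def update(self, context, action, cost):
        # Constant-step-size running average of the (context, action) cost;
        # keeping the step size constant lets estimates track drifting
        # (non-stationary) costs, in the spirit of the setting above.
        self.q[context, action] += self.step_size * (cost - self.q[context, action])
```

For example, in a rock-paper-scissors-style loop one would observe a context, call `act` to sample a move, and feed the realized cost back through `update` each round; the temperature controls the trade-off between exploiting low-cost actions and retaining enough randomness to cope with a reacting opponent.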