Decision making with limited feedback
Proceedings of Algorithmic Learning Theory, PMLR 83:359-367, 2018.
Abstract
Models deployed for decision-making in real-world settings are typically
trained in batch mode: historical data is used to train and validate the
models prior to deployment. However, in many settings,
feedback changes the nature of the training process. Either the learner
does not get full feedback on its actions, or the decisions made by the
trained model influence what future training data it will see. In this paper, we
focus on the problems of recidivism prediction and predictive policing. We
present the first algorithms with provable regret bounds for these problems by
showing that both problems (and others like them) can be abstracted into a general
reinforcement learning framework called partial monitoring. We also
discuss the policy implications of these solutions.