Reinforcement learning for linear-convex models with jumps
We study finite-time horizon, continuous-time linear-convex reinforcement learning problems in an episodic setting. In these problems, an unknown linear jump-diffusion process is controlled subject to nonsmooth convex costs. We start with the pure diffusion case with quadratic costs and propose a least-squares algorithm that achieves a logarithmic regret bound of order O((ln M)(ln ln M)), where M is the number of learning episodes; the proof relies on the robustness of the associated Riccati differential equation and sub-exponential properties of the least-squares estimators. We then extend the least-squares algorithm to linear-convex learning problems with jumps and establish a regret of order O((M ln M)^{1/2}); the analysis leverages the Lipschitz stability of the associated forward-backward stochastic differential equation and concentration properties of sub-Weibull random variables.
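To illustrate the kind of estimation step involved, the following is a minimal sketch (not the authors' algorithm) of least-squares identification of unknown drift parameters of a controlled linear diffusion from discretized episode data; the scalar dynamics, step size, and episode counts are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical scalar linear SDE: dX_t = (A X_t + B u_t) dt + dW_t,
# with unknown drift parameters (A, B) to be estimated from episodes.
rng = np.random.default_rng(0)
A_true, B_true = -1.0, 0.5          # illustrative "unknown" parameters
dt, n_steps, n_episodes = 0.01, 100, 200

Z, dX = [], []                       # regressors [X_t, u_t] and increments
for _ in range(n_episodes):
    x = 0.0                          # each episode restarts the state
    for _ in range(n_steps):
        u = rng.normal()             # exploratory control input
        dx = (A_true * x + B_true * u) * dt + np.sqrt(dt) * rng.normal()
        Z.append([x, u])
        dX.append(dx)
        x += dx

Z, dX = np.asarray(Z), np.asarray(dX)
# Least-squares estimate: theta minimizes ||dX - Z theta * dt||^2
theta_hat, *_ = np.linalg.lstsq(Z * dt, dX, rcond=None)
A_hat, B_hat = theta_hat
print(A_hat, B_hat)   # estimates approach (A_true, B_true) as data grows
```

The estimation error of such least-squares estimators shrinks as the number of episodes M grows, which is the mechanism behind the regret analysis sketched in the abstract.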
This is joint work with Matteo Basei, Xin Guo and Anran Hu.