Convergence of policy gradient methods for stochastic control problems
Policy gradient (PG) methods have demonstrated remarkable success in a wide range of sequential decision-making tasks. However, the majority of research efforts have focused on discrete problems, leaving the convergence of PG methods for controlled diffusions largely unresolved. This work proves the convergence of PG methods for finite-horizon linear-quadratic control problems. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. We propose geometry-aware gradient descents for the mean and covariance of the policy, using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to converge globally to the optimal policy at a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis and achieves robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm. If time allows, extensions of the algorithm to nonlinear control problems will be discussed.
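As a toy illustration of the setting (not the speakers' method, which uses Gaussian policies and the Fisher and Bures-Wasserstein geometries), the sketch below runs plain gradient descent on the exact finite-horizon cost of a one-dimensional deterministic linear-quadratic problem, over linear feedback policies u = k*x; all parameter values are hypothetical.

```python
def lq_cost(k, a=0.9, b=0.5, q=1.0, r=0.1, T=20, x0=1.0):
    """Finite-horizon cost of the linear feedback u = k*x for the
    deterministic dynamics x_{t+1} = a*x + b*u (noise omitted)."""
    x, cost = x0, 0.0
    for _ in range(T):
        u = k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost

def gradient_descent(k=0.0, lr=0.05, steps=200, h=1e-6):
    """Plain (Euclidean) gradient descent on the gain k,
    with the gradient approximated by central differences."""
    for _ in range(steps):
        g = (lq_cost(k + h) - lq_cost(k - h)) / (2 * h)
        k -= lr * g
    return k
```

With these toy parameters the learned gain is negative (the control counteracts the state) and strictly reduces the cost relative to the uncontrolled policy k = 0.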
Mean field games of optimal stopping
We are interested in the study of stochastic games for which each player faces an optimal stopping problem. In our setting, the players may interact through the criterion to optimize as well as through their dynamics. After briefly discussing the N-players game, we formulate the corresponding mean field problem. In particular, we introduce a weak formulation of the game for which we are able to prove existence of Nash equilibria for a large class of criteria. We also prove that equilibria for the mean field problem provide approximated Nash equilibria for the N-players game, and we formally derive the master equation associated with our mean field game.
This talk is based on joint work with D. Possamai.
Exploration-exploitation trade-off for continuous-time reinforcement learning
Recently, reinforcement learning (RL) has attracted substantial research interest. Much of the attention and success, however, has been confined to the discrete-time setting. Continuous-time RL, despite its natural analytical connection to stochastic control, remains largely unexplored, with only limited progress to date. In particular, characterising the sample efficiency of continuous-time RL algorithms remains a challenging open problem.
In this talk, we develop a framework to analyse model-based reinforcement learning in the episodic setting. We then apply it to optimise the exploration-exploitation trade-off for linear-convex RL problems, and report sublinear (or even logarithmic) regret bounds for a class of learning algorithms inspired by filtering theory. The approach is probabilistic: we analyse learning efficiency using concentration inequalities for correlated continuous-time observations, and apply stochastic control theory to quantify the performance gap between greedy policies derived from estimated and true models.
Mean field portfolio games
First, I will discuss a mean field portfolio game in a general framework. Using a dynamic programming principle and a martingale optimality principle, I establish a one-to-one correspondence between the Nash equilibrium and some BSDE. Such a correspondence is key to the uniqueness result of Nash equilibria. Generally, this BSDE can be solved under a weak interaction assumption. Motivated by this assumption, I will introduce an asymptotic expansion result of the game value in terms of the interaction parameter. Second, I will incorporate consumption into the portfolio game and show that the equilibrium investment and consumption can be fully characterized by one BSDE.
Adapted Wasserstein distance between the laws of SDEs
We consider an adapted optimal transport problem between the laws of Markovian stochastic differential equations (SDEs) and establish optimality of the so-called synchronous coupling between the given laws. The proof of this result is based on time-discretisation methods and reveals an interesting connection between the synchronous coupling and the celebrated discrete-time Knothe–Rosenblatt rearrangement. We also provide a related result on equality of various topologies when restricted to certain laws of continuous-time stochastic processes. The result is of relevance for the study of stability with respect to model specification in mathematical finance.
The talk is based on joint work with Julio Backhoff-Veraguas and Ben Robinson.
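For intuition, the sketch below simulates the synchronous coupling of two one-dimensional SDEs: an Euler scheme drives both equations with the same Brownian increments. The coefficients and parameters are purely illustrative assumptions, and the resulting Monte Carlo average of |X_T - Y_T|^2 is, in general, only an upper bound for the squared Wasserstein-type distance between the terminal laws.

```python
import math
import random

random.seed(1)

def synchronous_terminal(mu1, mu2, sigma, x0, y0, T=1.0, n=200):
    """Euler scheme for two 1D SDEs driven by the SAME Brownian increments
    (the synchronous coupling); returns the terminal pair (X_T, Y_T)."""
    dt = T / n
    x, y = x0, y0
    for _ in range(n):
        dw = random.gauss(0.0, math.sqrt(dt))
        x += mu1(x) * dt + sigma(x) * dw
        y += mu2(y) * dt + sigma(y) * dw
    return x, y

# Toy coefficients: two Ornstein-Uhlenbeck drifts with different
# mean-reversion speeds and a common constant diffusion.
mu_slow = lambda z: -1.0 * z
mu_fast = lambda z: -2.0 * z
sigma = lambda z: 0.5

# Monte Carlo estimate of E|X_T - Y_T|^2 under the synchronous coupling.
m = 2000
est = 0.0
for _ in range(m):
    xt, yt = synchronous_terminal(mu_slow, mu_fast, sigma, 1.0, 1.0)
    est += (xt - yt) ** 2
est /= m
```

Because the diffusion coefficients coincide, the noise cancels in the difference X - Y, which is why the synchronous coupling keeps the two paths close.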
A McKean-Vlasov game of commodity production, consumption and trading
We propose a model in which a producer and a consumer can affect the price dynamics of a commodity by controlling the drift and volatility of the production rate and the consumption rate, respectively. We assume that the producer holds a short position in a forward contract on λ units of the underlying at a fixed price F, while the consumer holds the corresponding long position. Moreover, both players are risk-averse with respect to their financial positions, and their risk aversions are modelled through an integrated-variance penalization. We study the impact of risk aversion on the interaction between the producer and the consumer, as well as on the derivative price. In mathematical terms, we are dealing with a two-player linear-quadratic McKean–Vlasov stochastic differential game. Using methods based on the martingale optimality principle and BSDEs, we find a Nash equilibrium and characterize the corresponding strategies and payoffs in semi-explicit form. Furthermore, we compute the two indifference prices (one for the producer and one for the consumer) induced by that equilibrium, and we determine the quantity λ for which the players agree on the price. Finally, we illustrate our results with some numerics. In particular, we focus on how the risk aversions and the volatility control costs of the players affect the derivative price.
This is joint work with R. Aid, O. Bonesini and L. Campi.
Poisson hulls and nonparametric boundary models
We consider a Poisson point process on a general state space. Using an axiomatic approach, we introduce a hull as a random subset of the state space determined by this process. A key example is the convex hull of a finite Poisson process in Euclidean space. In the first part of the talk we shall present some first properties along with further examples. By forming conditional expectations, Poisson hulls can be used as natural estimators of linear functionals of the underlying intensity measure. Using a spatial Markov property, we will derive some fundamental properties of these estimators. In particular we shall discuss moment formulas and the connection to the (anticipating) stochastic Kabanov–Skorohod integral. In the second part of the talk we shall discuss central limit theorems for growing intensities. Our method is based on the Stein–Malliavin approach and yields presumably optimal rates of convergence. Finally we present an application to nonparametric boundary models.
The talk is based on joint work with Ilya Molchanov (Bern).
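As a concrete instance of the key example above, the sketch below samples a Poisson process on the unit square and computes its convex hull with Andrew's monotone chain algorithm (standard library only; the intensity value is an arbitrary choice). As the intensity grows, the hull area approaches the area of the square.

```python
import random

random.seed(2)

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def poisson_hull(lam):
    """Convex hull of a Poisson process with intensity lam on the unit square:
    the point count is Poisson(lam), locations are i.i.d. uniform."""
    count, s = 0, random.expovariate(1.0)  # Poisson count via unit-rate arrivals
    while s < lam:
        count += 1
        s += random.expovariate(1.0)
    pts = [(random.random(), random.random()) for _ in range(count)]
    return convex_hull(pts)

def area(poly):
    """Polygon area by the shoelace formula."""
    return 0.5 * abs(sum(poly[i][0] * poly[(i + 1) % len(poly)][1]
                         - poly[(i + 1) % len(poly)][0] * poly[i][1]
                         for i in range(len(poly))))
```

For example, `area(poisson_hull(500.0))` is close to, but strictly below, 1: the expected area deficiency of the hull vanishes as the intensity grows.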
Risk Quantification, Optimal Stopping and Reflected BSDEs for a Class of Informational Markets
Mean-field liquidation games with market drop-out
Equilibrium in Infinite-Dimensional Stochastic Games with Mean-Field Interaction
We consider a general class of finite-player stochastic games with mean-field interaction, in which the linear-quadratic objective functional includes linear operators acting on square-integrable controls. We propose a novel approach for deriving explicitly the Nash equilibrium of the game by reducing the associated first order conditions to a system of stochastic Fredholm equations of the second kind and deriving their closed-form solution. Furthermore, by proving stability results for the system of Fredholm equations, we derive the convergence of the equilibrium of the N-player game to the corresponding mean-field equilibrium. As a by-product of our results we also derive an epsilon-Nash equilibrium for the mean-field game, and we show that the conditions for existence of an equilibrium in the mean-field limit are significantly less restrictive than in the finite-player game. Finally, we apply our general framework to solve various examples, such as stochastic Volterra linear-quadratic games, models of systemic risk and advertising with delay, and optimal liquidation games with transient price impact.
The talk is based on a joint work with Eduardo Abi-Jaber and Moritz Voss.
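To make the Fredholm reduction concrete in a deterministic toy case, the sketch below solves a scalar Fredholm equation of the second kind, x(t) + ∫₀¹ K(t,s) x(s) ds = f(t), by collocation on a uniform grid with trapezoidal quadrature; the kernel and right-hand side are toy assumptions, unrelated to the stochastic system of the talk.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (pure Python)."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, n):
            m = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= m * A[i][c]
            b[r] -= m * b[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][c] * x[c] for c in range(i + 1, n))) / A[i][i]
    return x

# Collocation grid and trapezoidal quadrature weights on [0, 1].
n = 50
h = 1.0 / (n - 1)
grid = [i * h for i in range(n)]
w = [h] * n
w[0] = w[-1] = h / 2

K = lambda t, s: 0.5 * math.exp(-abs(t - s))  # toy kernel (assumption)
f = lambda t: 1.0                             # toy right-hand side (assumption)

# Discretised second-kind operator (I + K_h) and solve for x on the grid.
A = [[(1.0 if i == j else 0.0) + w[j] * K(grid[i], grid[j]) for j in range(n)]
     for i in range(n)]
x = solve_linear(A, [f(t) for t in grid])
```

The solve is well posed here because the quadrature matrix perturbs the identity by an operator of norm below one, mirroring the standard existence argument for second-kind equations.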