Multi-agent reinforcement learning

Supervisor: David Leslie

Theme: Optimisation under Uncertainty

Reinforcement learning is a process in which individuals involved in a task select actions, receive rewards, and attempt to improve their strategies based only on this information. In a multi-agent setting, the reward an individual receives for selecting an action depends on the strategies employed by the other players, even though the actions of the other players might never be directly observed. Furthermore, all the agents adapt their strategies simultaneously, and agents may be required to learn to play non-deterministic strategies, so this is a significantly more difficult problem than either single-agent learning or traditional learning in games. David Leslie and Sean Collins have developed techniques for analysing the asymptotic properties of simple reinforcement learning algorithms in normal-form games. This has enabled the discovery of various techniques that learners can apply to improve the chance of converging to a Nash equilibrium. One particularly interesting discovery was that when different players learn at different rates, the chance of convergence to equilibrium is better than when all players learn at the same rate. Potential areas of study include the rate of convergence of multi-agent algorithms and the development of generalisations of reinforcement learning that can make use of additional information when it becomes available.
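The setting described above can be illustrated with a minimal sketch. It is not the algorithm analysed by the supervisors, only a simple stand-in: two players repeatedly play a hypothetical 2x2 coordination game, each sees only its own action and reward, each keeps a running value estimate per action, and each plays a non-deterministic (Boltzmann) strategy. The payoff table, the temperature, and the step-size exponents 0.6 and 0.9 are assumptions chosen purely for illustration; the different exponents are there to show what "learning at different rates" might look like in code.

```python
import math
import random

# Hypothetical 2x2 coordination game (an assumption for illustration):
# both players receive 1 when their actions match and 0 otherwise.
PAYOFFS = {
    (0, 0): (1.0, 1.0),
    (0, 1): (0.0, 0.0),
    (1, 0): (0.0, 0.0),
    (1, 1): (1.0, 1.0),
}


def boltzmann(q_values, temperature):
    """Turn action-value estimates into a mixed strategy via a softmax."""
    highest = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def run(steps=5000, temperature=0.1, seed=0):
    """Simulate two independent learners; return value estimates and strategies."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[player][action]
    for n in range(1, steps + 1):
        # Assumed step sizes: player 1's learning rate decays faster than
        # player 0's, so the two players adapt on different timescales.
        rates = (n ** -0.6, n ** -0.9)
        strategies = [boltzmann(q[i], temperature) for i in range(2)]
        actions = tuple(
            rng.choices([0, 1], weights=strategies[i])[0] for i in range(2)
        )
        rewards = PAYOFFS[actions]
        for i in range(2):
            # Each player updates only from its own action and own reward;
            # the opponent's action is never used directly.
            a = actions[i]
            q[i][a] += rates[i] * (rewards[i] - q[i][a])
    return q, [boltzmann(q[i], temperature) for i in range(2)]


if __name__ == "__main__":
    values, strategies = run()
    print("value estimates:", values)
    print("mixed strategies:", strategies)
```

Changing the two exponents to the same value gives the homogeneous case, which makes it easy to compare the two regimes empirically over many seeds. Any such comparison is only a simulation in one small game and does not substitute for the asymptotic analysis mentioned above.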