(POSTPONED) Posterior weighted reinforcement learning with state uncertainty

Fri 18 January 2013, 14:15

David Leslie
Bristol

Probability and Statistics

Organisers: Nick Whiteley, Feng Yu

ABSTRACT
This seminar has been postponed to 8 Feb.

Reinforcement learning models are, in essence, online algorithms that estimate the expected reward in each of a set of states by allocating observed rewards to states and calculating averages. Generally it is assumed that the learner can unambiguously identify the state of nature. However, in any natural environment the state information is noisy, so the learner cannot be certain which state it is currently in. Under such state uncertainty it is no longer obvious how to perform reinforcement learning, since an observed reward cannot be unambiguously allocated to a particular state of the environment. A new technique, posterior weighted reinforcement learning, is introduced: the reinforcement learning updates are weighted according to the posterior state probabilities, calculated after observation of the reward. We show that this modified algorithm can converge to the correct reward estimates, and that the procedure is a variant of an online expectation-maximisation algorithm, which allows further analysis to be carried out.
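The following is a minimal sketch of the idea described in the abstract, not the speaker's actual algorithm: a learner keeps running reward averages for a set of states, receives only a noisy signal of the current state, forms a posterior over states after seeing the reward, and applies the averaging update to every state weighted by that posterior. The two-state environment, the Gaussian reward likelihood, the confusion matrix, and all names (`values`, `counts`, `confusion`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative environment (assumed, not from the talk) ------------------
true_means = np.array([1.0, 3.0])          # true expected reward per hidden state
reward_sd = 0.5                            # reward noise
confusion = np.array([[0.8, 0.2],          # P(observation | true state)
                      [0.2, 0.8]])

# --- Learner state -----------------------------------------------------------
values = np.zeros(2)                       # running reward estimates per state
counts = np.zeros(2)                       # effective (posterior-weighted) sample counts

for t in range(5000):
    state = rng.integers(2)                            # hidden true state
    obs = rng.choice(2, p=confusion[state])            # noisy state signal
    reward = rng.normal(true_means[state], reward_sd)  # observed reward

    # Posterior over states *after* seeing the reward: prior from the
    # observation model times a Gaussian reward likelihood evaluated at the
    # current estimates (an assumption of this sketch).
    prior = confusion[:, obs] / confusion[:, obs].sum()
    likelihood = np.exp(-0.5 * ((reward - values) / reward_sd) ** 2)
    posterior = prior * likelihood
    posterior /= posterior.sum()

    # Posterior-weighted update: every state's estimate moves toward the
    # observed reward, scaled by the posterior probability of that state.
    counts += posterior
    values += posterior * (reward - values) / counts

print(values)   # should lie close to the true means [1.0, 3.0]
```

With a hard state assignment the loop would update only one entry of `values`; weighting by the posterior instead spreads each reward over all states in proportion to how plausible they are, which is the E-step-like averaging the abstract connects to online expectation-maximisation.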