Connor Watts
AI Researcher

Scalable Bayesian Reinforcement Learning

The aim of this post is to introduce the topic of my PhD research. My research goal is to investigate methods for scaling Bayesian reinforcement learning to large-scale problems. Before we look at what this means and why I think it is a noble endeavour, let’s first set the scene.

An Intelligent Agent

I am interested in the development of intelligent agents, which has the potential to significantly improve a variety of industries. In many real-world scenarios, where the environment is complex and uncertain, an agent must learn from experience in order to adapt its behaviour and perform optimally. Reinforcement learning has become a natural choice for this learning process in many sequential decision-making domains. In a reinforcement learning problem, an agent interacts with an environment according to an action-selection strategy, or policy. At each interaction, the agent observes the current state of the environment and receives a reward that depends on that state. A reinforcement learning algorithm seeks an optimal policy that maximises a long-term performance measure, typically the expected cumulative reward. Reinforcement learning has seen recent success in a range of fields including finance, healthcare and robotics. Notably, it has produced agents with superhuman ability in games such as Chess and Go, Dota 2, and StarCraft II.
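To make the interaction loop concrete, here is a minimal sketch in Python, assuming a Gymnasium-style environment; the CartPole-v1 environment and the random action choice are illustrative stand-ins for a real task and a learned policy.

```python
import gymnasium as gym

# Illustrative agent-environment loop: the agent observes a state,
# selects an action according to its policy, and receives a reward.
env = gym.make("CartPole-v1")  # stand-in for any sequential decision-making task

state, _ = env.reset(seed=0)
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # placeholder policy: act at random
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
```

A reinforcement learning algorithm replaces the random `sample()` call with a policy that is improved over many such episodes to maximise the episode return.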

However, despite these recent successes, traditional reinforcement learning algorithms suffer from issues that prevent their widespread adoption in industry. Chief among these is sample inefficiency: the agents require a vast number of trial-and-error samples to learn a good policy, and this number often grows exponentially with the size of the state space, making them impractical in large, complex problems such as high-resolution continuous-space video games. This issue is tied to how the agent balances exploring the environment to discover potentially better sources of reward against exploiting the sources of reward it already knows well, commonly referred to as the exploration-exploitation trade-off. Finding the right balance is crucial for achieving good performance while minimising the number of samples and training steps required, as the sketch below illustrates.
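As a concrete illustration of the trade-off, the following sketch implements ε-greedy action selection on a toy multi-armed bandit; the arm reward probabilities and the value of ε are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])  # hypothetical arm reward probabilities
q_values = np.zeros(3)                  # running estimates of each arm's value
counts = np.zeros(3)
epsilon = 0.1                           # fraction of steps spent exploring

for step in range(10_000):
    if rng.random() < epsilon:
        arm = rng.integers(3)           # explore: try a random arm
    else:
        arm = int(np.argmax(q_values))  # exploit: pick the best-looking arm
    reward = rng.random() < true_means[arm]
    counts[arm] += 1
    # incremental mean update of the value estimate
    q_values[arm] += (reward - q_values[arm]) / counts[arm]

print(q_values)  # estimates concentrate on the true means over time
```

Set ε too low and the agent may lock onto a mediocre arm; set it too high and it wastes samples on arms it already knows to be poor. Tuning this balance by hand is exactly what a more principled approach aims to avoid.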

Bayesian reinforcement learning is a reinforcement learning approach that leverages methods from Bayesian inference to incorporate information into the learning process. Here, prior knowledge about the problem is represented through a prior distribution, and new information is then incorporated using the standard rules of Bayesian inference. The main advantage of this approach is that, by selecting actions that maximise the expected gain with respect to the information state, it naturally balances the exploration-exploitation trade-off. Additionally, the inclusion of the prior distribution implicitly provides regularisation, allowing for more stable training of the agent. In tabular domains, Bayesian reinforcement learning algorithms have achieved performance competitive with traditional reinforcement learning algorithms while being more sample efficient. However, Bayesian reinforcement learning algorithms are often computationally expensive, especially in high-dimensional spaces.
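To see the Bayesian idea in miniature, here is a Thompson-sampling sketch on the same toy Bernoulli bandit as above; the Beta(1, 1) priors are an illustrative choice, and this bandit setting is a simplification of full Bayesian reinforcement learning, which also maintains beliefs over states and dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])  # same hypothetical arms as above
alpha = np.ones(3)                      # Beta(1, 1) prior over each arm's success rate
beta = np.ones(3)

for step in range(10_000):
    # Sample one plausible value per arm from the current posterior;
    # uncertain arms sometimes produce high draws, so they still get explored.
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = rng.random() < true_means[arm]
    # Conjugate Bayesian update: successes raise alpha, failures raise beta.
    alpha[arm] += reward
    beta[arm] += 1 - reward

posterior_means = alpha / (alpha + beta)
print(posterior_means)  # concentrates on the best arm as evidence accumulates
```

Notice there is no ε to tune: exploration falls out of the posterior itself, shrinking automatically as uncertainty shrinks. The catch is that in large problems the posterior is no longer a handful of Beta distributions, and maintaining it becomes the computational bottleneck described above.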

This leads us nicely to my research topic: for my PhD, I will investigate methods to scale Bayesian reinforcement learning to large-scale problems while maintaining computational efficiency and accuracy. Given the acute difficulties of traditional reinforcement learning algorithms, coupled with recent developments in Bayesian machine learning, model-based reinforcement learning and hardware architecture, I believe this research is timely and critical for the field’s development and adoption.