Reward functions and the nature of explanation for intelligent neural systems
The field of reinforcement learning has made astonishing progress by incorporating deep function approximators and adapting existing algorithms to work with them. Human-level performance has been achieved in a wide range of domains, such as Atari, Go, chess, and StarCraft. In each of these cases we have access to a programmatic reward function: a formal way to determine who won or lost, or how many points have been scored. These reward functions are central to our understanding of the systems that solve these problems. However, for many of the problems that humans and animals solve, we can't formally write down a reward function. To understand a real neural system, we should similarly try to understand what it is optimizing. But can we understand our in-silico systems, and how they solve problems, when we don't have direct knowledge of the reward functions they optimize? What might this look like in practice? We'll argue that to understand neural systems, we need to develop new methods for inferring their underlying reward functions.
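To make "programmatic reward function" concrete, here is a minimal sketch for a game like those mentioned above. The board encoding and the function name are illustrative assumptions of ours, not taken from any particular RL system; the point is only that win/loss can be decided by a short, formal program:

```python
def reward(board, player):
    """Toy programmatic reward for tic-tac-toe (an illustrative example).

    Returns +1 if `player` has three in a row, -1 if the opponent does,
    and 0 otherwise (game ongoing or drawn).
    `board` is a 3x3 list of lists containing 'X', 'O', or None.
    """
    lines = []
    lines.extend(board)                                # rows
    lines.extend(list(col) for col in zip(*board))     # columns
    lines.append([board[i][i] for i in range(3)])      # main diagonal
    lines.append([board[i][2 - i] for i in range(3)])  # anti-diagonal

    opponent = 'O' if player == 'X' else 'X'
    for line in lines:
        if all(cell == player for cell in line):
            return +1
        if all(cell == opponent for cell in line):
            return -1
    return 0
```

For the problems humans and animals solve, no such short program exists, which is exactly the gap the talk addresses.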
A pizza lunch will be served at 11:45am. The seminar will run from 12:00pm – 1:30pm.