Title: Sequential decision making in complex environments

Speaker: Aditya Gopalan, IISc Bangalore

Abstract: Sequential decision making, or online learning, studies how an agent can learn to perform a task through repeated actions and feedback, i.e., trial and error. An increasing number of modern automated systems are tasked with learning to make decisions from available, dynamic data. Examples include learning to personalize content on the Internet - displaying news stories that might interest browsing users, or recommending merchandise that potential shoppers might like - by interacting with users over time, and automated stock trading, in which trading decisions are made based on observations of previous trades.
The talk introduces a widely used model of decision making under uncertainty called the Multi-Armed Bandit, in which a decision maker repeatedly chooses to play one of several "arms" or actions, each with an unknown payoff. We will explore variants of the model, briefly review its history, and survey well-known approaches to bandit optimization. We will also present recent progress in understanding the behaviour of a natural, Bayesian-inspired algorithm (Thompson sampling, or posterior sampling) that enjoys excellent empirical performance, often even with complex information structures and temporal dynamics, as in reinforcement learning problems.
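To make the model concrete, the following is a minimal sketch of Thompson sampling for a Bernoulli bandit: each arm's unknown payoff probability gets a Beta posterior, and at each round the algorithm samples from every posterior and plays the arm with the largest sample. The function name, the choice of Beta(1,1) priors, and the parameters are illustrative, not taken from the talk.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Thompson sampling on a Bernoulli bandit with independent Beta(1,1) priors.

    true_means: unknown-to-the-learner success probability of each arm.
    Returns the total reward collected and the per-arm success/failure counts.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # number of 1-rewards observed on each arm
    failures = [0] * k   # number of 0-rewards observed on each arm
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its Beta posterior,
        # then play the arm whose sampled mean is largest.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

Because posterior sampling naturally balances exploration and exploitation, the arm with the higher true mean is pulled increasingly often as evidence accumulates, which is the empirical behaviour the talk's analysis addresses.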

Bio: Aditya Gopalan is an Assistant Professor in the Department of Electrical Communication Engineering at the Indian Institute of Science. He received the Ph.D. degree in electrical engineering from The University of Texas at Austin in 2011, and the B.Tech. and M.Tech. degrees in electrical engineering from the Indian Institute of Technology Madras in 2006. His research interests include communication networks, performance analysis, learning, and control.