Title: Sequential decision making in complex environments
Speaker: Aditya Gopalan, IISc Bangalore
Abstract:
Sequential decision making, or online learning, is concerned with how an
agent can learn to perform a task through repeated actions and
feedback, i.e., trial and error. An increasing number of modern-day
automated systems are tasked with learning to make decisions by
utilizing available, dynamic data. Take, for instance, learning to
personalize content on the Internet by interacting with users over time
(displaying news stories that might interest browsing users, or
recommending merchandise that potential shoppers might like), or
automated stock trading, in which trading decisions are made based on
observations of previous trades.
The talk introduces a widely employed model of decision making under
uncertainty called the Multi-Armed Bandit, where a decision maker
repeatedly faces a choice of playing one of several "arms" or actions,
each with an unknown payoff. We will explore variants of the model,
study its history in brief, and review well-known approaches to bandit
optimization. We will also present recent progress in understanding the
behaviour of a natural, Bayesian-inspired algorithm (Thompson sampling
or posterior sampling) that enjoys excellent empirical performance,
often with complex information structures and complex time dynamics as
in reinforcement learning problems.
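The Thompson sampling idea described above can be sketched in a few lines of code. The following is a minimal illustration for a Bernoulli bandit, assuming Beta(1,1) priors on each arm's payoff probability; the arm probabilities and horizon are made-up values for demonstration, not from the talk.

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Thompson sampling on a Bernoulli multi-armed bandit.

    Each arm i has an unknown payoff probability true_probs[i].
    We keep a Beta(alpha[i], beta[i]) posterior per arm, starting
    from a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # 1 + observed successes of each arm
    beta = [1] * k   # 1 + observed failures of each arm
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample of the mean payoff of each arm from its
        # posterior, then play the arm whose sample is largest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    pulls = [alpha[i] + beta[i] - 2 for i in range(k)]
    return total_reward, pulls
```

The randomness in the posterior draws is what balances exploration and exploitation: arms with uncertain posteriors occasionally produce large samples and get tried, while arms with well-estimated low payoffs are played less and less often.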
Bio:
Aditya Gopalan is an Assistant Professor in the Department of Electrical
Communication Engineering at the Indian Institute of Science. He received the Ph.D.
degree in electrical engineering from The University of Texas at Austin
in 2011, and the B.Tech. and M.Tech. degrees in electrical engineering
from the Indian Institute of Technology Madras in 2006. His research
interests include communication networks, performance analysis, learning,
and control.