Title: PAC Subset Selection in Stochastic Multi-armed Bandits
Speaker: Shivaram Kalyanakrishnan, Yahoo Labs
Abstract:
We consider the problem of selecting, from among n real-valued random
variables, a subset of size m of those with the highest means, based on
efficiently sampling the random variables. This problem, which we denote
Explore-m, finds application in a variety of areas, such as stochastic
optimization, simulation and industrial engineering, and on-line
advertising. The theoretical basis of our work is an extension of a
previous formulation using multi-armed bandits that is devoted to
identifying just the single best of n random variables (Explore-1).
Under a PAC setting, we provide algorithms for Explore-m and bound their
sample complexity.
Our main contribution is the LUCB algorithm, which, interestingly, bears
a close resemblance to the well-known UCB algorithm for regret
minimization. We derive an expected sample complexity bound for LUCB
that is novel even for single-arm selection. We then improve the
problem-dependent constant in this bound through a novel algorithmic
variant called KL-LUCB. Experiments affirm the relative efficiency of
KL-LUCB over other algorithms for Explore-m. Our contributions also
include a lower bound on the worst case sample complexity of such
algorithms.
Bio:
Shivaram Kalyanakrishnan is a scientist at Yahoo Labs Bangalore. His
primary research interests lie in the fields of artificial intelligence
and machine learning, spanning areas such as reinforcement learning,
agents and multiagent systems, humanoid robotics, multi-armed bandits,
and on-line advertising. He obtained his Ph.D. in Computer Science from
the University of Texas at Austin (2011), and his B.Tech. in Computer
Science and Engineering from the Indian Institute of Technology Madras
(2004). He has extensively used robot soccer as a test domain for his
research, and has actively contributed to initiatives such as RoboCup
and the Reinforcement Learning competitions.