Title: PAC Subset Selection in Stochastic Multi-armed Bandits

Speaker: Shivaram Kalyanakrishnan, Yahoo Labs

Abstract: We consider the problem of selecting, from among n real-valued random variables, a subset of size m of those with the highest means, based on efficiently sampling the random variables. This problem, which we denote Explore-m, finds application in a variety of areas, such as stochastic optimization, simulation and industrial engineering, and on-line advertising. Our work extends a previous multi-armed bandit formulation devoted to identifying just the single best of the n random variables (Explore-1). Under a PAC setting, we provide algorithms for Explore-m and bound their sample complexity.
Our main contribution is the LUCB algorithm, which, interestingly, bears a close resemblance to the well-known UCB algorithm for regret minimization. We derive an expected sample complexity bound for LUCB that is novel even for single-arm selection. We then improve the problem-dependent constant in this bound through a novel algorithmic variant called KL-LUCB. Experiments affirm the relative efficiency of KL-LUCB over other algorithms for Explore-m. Our contributions also include a lower bound on the worst-case sample complexity of such algorithms.
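
The core LUCB idea can be sketched as follows (a minimal illustration only, not the talk's exact algorithm; the confidence-radius formula and constants below are assumptions): at each round, identify the empirical top-m arms, then sample the two "contentious" arms — the weakest member of that set by lower confidence bound and the strongest non-member by upper confidence bound — and stop once those two bounds are separated to within epsilon.

```python
import math
import random

def lucb(arms, m, epsilon, delta, max_samples=200000):
    """Sketch of LUCB-style PAC subset selection.

    arms: list of callables, each returning a stochastic reward in [0, 1].
    Returns indices of an (epsilon, delta)-approximate top-m subset.
    """
    n = len(arms)
    counts = [1] * n
    sums = [arm() for arm in arms]          # sample every arm once
    t = n

    def beta(u, t):
        # Assumed confidence radius (Hoeffding-style); constants illustrative.
        return math.sqrt(math.log(1.25 * n * t ** 4 / delta) / (2 * u))

    while t < max_samples:
        means = [s / c for s, c in zip(sums, counts)]
        order = sorted(range(n), key=lambda i: means[i], reverse=True)
        top, bottom = order[:m], order[m:]
        # h: weakest arm in the empirical top-m (lowest lower bound);
        # l: strongest arm outside it (highest upper bound).
        h = min(top, key=lambda i: means[i] - beta(counts[i], t))
        l = max(bottom, key=lambda i: means[i] + beta(counts[i], t))
        gap = (means[l] + beta(counts[l], t)) - (means[h] - beta(counts[h], t))
        if gap < epsilon:                    # intervals separated: stop
            return sorted(top)
        for i in (h, l):                     # sample only the contentious pair
            sums[i] += arms[i]()
            counts[i] += 1
            t += 1
    return sorted(top)
```

Sampling only the two contentious arms is what distinguishes this scheme from uniform sampling: effort concentrates on the arms straddling the top-m boundary, which drives the sample complexity bounds discussed in the abstract.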

Bio: Shivaram Kalyanakrishnan is a scientist at Yahoo Labs Bangalore. His primary research interests lie in the fields of artificial intelligence and machine learning, spanning areas such as reinforcement learning, agents and multiagent systems, humanoid robotics, multi-armed bandits, and on-line advertising. He obtained his Ph.D. in Computer Science from the University of Texas at Austin (2011), and his B.Tech. in Computer Science and Engineering from the Indian Institute of Technology Madras (2004). He has extensively used robot soccer as a test domain for his research, and has actively contributed to initiatives such as RoboCup and the Reinforcement Learning competitions.