Title: Ensemble Learning in the Crowd-sourcing Era

Speaker: Gaurav Pandey, Mount Sinai School of Medicine

Abstract: Crowd-sourcing-based platforms, such as Kaggle, InnoCentive and the DREAM challenges, are transforming our ability to address hard predictive modeling tasks by leveraging the "wisdom of the crowds". In this talk, I will present our work on enhancing the predictive power of these platforms, especially the DREAM challenges, through two avenues: (1) a collaborative-competitive setup for prediction challenges/competitions and (2) learning heterogeneous ensemble predictors. In the collaborative-competitive setup, challenge participants are encouraged both to compete and to collaborate (share ideas), and both mechanisms lead to improvements in predictive power. Heterogeneous ensembles are a data-driven method for achieving this improvement by "smartly" assimilating the knowledge embedded in the predictions submitted by individual participants. I will also share results demonstrating the potential of these approaches for difficult biomedical problems, such as the prediction of protein function and cancer phenotypes.

Bio: Gaurav Pandey is an Assistant Professor in the Department of Genetics and Genomic Sciences at the Mount Sinai School of Medicine (New York) and is part of the newly formed Institute for Genomics and Multiscale Biology. He received his Ph.D. in computer science and engineering from the University of Minnesota, Twin Cities, in 2010, and subsequently completed a post-doctoral fellowship at the University of California, Berkeley. His primary fields of interest are computational biology, genomics and large-scale data analysis and mining, and he has published extensively in these areas.