Title: Flexible Bayesian Models for Complex and Heterogeneous Data

Speaker: Piyush Rai, Duke University

Abstract: Emergence of the Big Data phenomenon has led to a tremendous growth not only in the scale of the data but also in its complexity. Modern data analysis problems now routinely involve data that can be (usually a mix of) high-dimensional, noisy, incomplete, heterogeneous, multi-way (tensor), multi-relational, time-evolving, etc. Moreover, in many settings, learning from data involves not just learning a single task but multiple (possibly related) tasks that may benefit from each-other via a proper "sharing" of information. In this talk, I will first start with a brief overview of my research talking about how probabilistic latent variable models, in particular Bayesian and nonparametric Bayesian approaches, can lead to considerably flexible ways of modeling such complex data. I will then talk specifically about some of my recent work on (1) learning from sparsely observed tensor data which encodes multi-way relations over objects from multiple (typically 3 or more) sets, and (2) learning from heterogeneous multi-modality data, comprising of multiple feature- and/or similarity-based representations, with significant amount of missing data. I will also discuss some specific applications of these frameworks to problems in recommender systems, information retrieval, cognitive neuroscience, and in modeling graph-structured and multi-relational data such as complex networks and knowledge bases. I will conclude with some directions for future work.

Bio: Piyush Rai is a postdoctoral researcher at Duke University and is associated with the interdisciplinary Information Initiative at Duke (iiD). He did his PhD (2007-2012) in Computer Science from the School of Computing, University of Utah. Prior to that, he did his B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, BHU, Varanasi. His research interests are in statistical machine learning, primarily (but not limited to) probabilistic modeling and Bayesian statistics, and also in applying statistical learning methods to solve problems in computer vision, natural language processing, information retrieval, robotics, computational biology, and computer systems.