Title: An Excursion in Probabilistic Hashing Techniques for Big Data
Speaker: Anshumali Shrivastava, Rice University
Large-scale machine learning and data mining applications routinely
deal with datasets at TB scale, and these are expected to soon reach
PB levels. At this scale conventional algorithms fail and
simple data mining operations such as search, learning, clustering, etc.
become challenging. In this talk, I will introduce probabilistic hashing
techniques for large scale search and learning. I will show how the old
hashing framework, originally meant for sub-linear search, can be
converted into fast learning algorithms. I will talk about our recent
success in constructing hash functions for dot product by making use of
asymmetry. Such a construction is not possible in the conventional
setting and was a known hard problem. I will further show the direct
consequence of hashing inner products in speeding up popular learning
algorithms. Later, I will discuss the recent improvements that I found
in some decade-old textbook hashing algorithms, including the
fastest way of performing minwise hashing in practice. I will
demonstrate the utility of the above techniques on various real
applications including search, learning, collaborative filtering, record
Anshumali Shrivastava is an Assistant Professor in the Department of
Computer Science at Rice University with joint appointments in
the Statistics and ECE departments. His broad research interests include
large scale machine learning, randomized algorithms for big data systems
and graph mining. His research on hashing inner products won the Best Paper
Award at NIPS 2014, while his work on representing graphs won the Best
Paper Award at IEEE/ACM ASONAM 2014.
He obtained his Ph.D. in computer science from Cornell University.
Before joining Cornell, he worked as a scientist at FICO (Fair Isaac
Corp.) Research in Bangalore, India. Anshumali received his bachelor's
and master's degrees in mathematics and computing from the Indian
Institute of Technology (IIT) Kharagpur, India.