Title: Social Network Extraction from Text

Speaker: Apoorv Agarwal, Columbia University

Abstract: Social networks are usually built from metadata such as sender-recipient email links or self-declared friendships. However, a large part of a social network is expressed and maintained through language. For example, in an email people might mention whom they "talk to" or have "dinner with." Extracting such social interactions, which are expressed through a wide variety of linguistic constructs, requires natural language processing (NLP) and machine learning (ML) techniques.
In this talk, I will present a formalization of social networks based on interactions. So far, we have extracted networks from three broad genres of text: 19th-century British literature, movie screenplays, and the Enron email corpus. I will present the technical challenges posed by each genre, followed by our NLP and ML solutions and their applications.

Bio: Apoorv Agarwal is a sixth-year Ph.D. candidate in the Computer Science department at Columbia University, NY. His areas of interest and specialization are Natural Language Processing and Machine Learning. He is one of the recipients of the 2013-14 IBM PhD Fellowship for his work with the DeepQA team that built Watson (a machine capable of answering Jeopardy! questions). His work on social network extraction from text was demonstrated at the DARPA demo day held at the Pentagon in May 2014 and at the NYC Media Lab Annual Summit 2014. He is a recipient of the NSF Innovation Corps (I-Corps) award for Aug-Dec 2014, which helped him identify commercial applications for his research. He is now one of the two founders of Text IQ, a start-up that aims to make document review faster for attorneys.
Relevant publications can be found on his Google Scholar page and his homepage (last updated in 2012).