Speaker: Dhruv Batra (Georgia Tech and Facebook AI Research)

Date and Venue: Friday, 12th Jan 4:00 PM, LH 111.

Title: Visual Dialog: Towards AI agents that can see, talk, and act


Abstract: We are witnessing unprecedented advances in computer vision and artificial intelligence (AI). What lies next for AI? We believe that the next generation of intelligent systems (say, the next generation of Google's Assistant, Facebook's M, Apple's Siri, Amazon's Alexa) will need to possess the ability to ‘perceive’ their environment (through vision, audition, or other sensors), ‘communicate’ (i.e., hold a natural language dialog with humans and other agents), and ‘act’ (e.g., aid humans by executing API calls or commands in a virtual or embodied environment), for tasks such as:

- Aiding visually impaired users in understanding their surroundings or social media content (AI: ‘John just uploaded a picture from his vacation in Hawaii’, Human: ‘Great, is he at the beach?’, AI: ‘No, on a mountain’).
- Aiding analysts in making decisions based on large quantities of surveillance data (Human: ‘Did anyone enter this room last week?’, AI: ‘Yes, 27 instances logged on camera’, Human: ‘Were any of them carrying a black bag?’).
- Interacting with an AI assistant (Human: ‘Alexa, can you see the baby in the baby monitor?’, AI: ‘Yes, I can’, Human: ‘Is he sleeping or playing?’).
- Robotics applications (e.g., search-and-rescue missions) where the operator may be ‘situationally blind’ and operating via language (Human: ‘Is there smoke in any room around you?’, AI: ‘Yes, in one room’, Human: ‘Go there and look for people’).

In this talk, I will present a range of projects from my lab (some in collaboration with Prof. Devi Parikh's lab) towards building such visually grounded conversational agents.

Speaker Bio: https://www.cc.gatech.edu/~dbatra/files/bio.txt