Abstractions in Machine Learning: We hope to exploit
symmetries and other implicit domain abstractions to scale up a variety
of machine learning and inference algorithms. We work on reducing
computation in Markov Decision Process (MDP) algorithms such as UCT by
aggregating symmetric states and state-action pairs, and we have developed
symmetry-aware UCT algorithms for MDPs (Paper 1, Paper 2). We also
exploit similar properties in probabilistic inference: we have devised
novel notions of symmetry, such as contextual symmetries, variable-value
symmetries, and block-value symmetries, in probabilistic graphical models
for downstream inference via Monte-Carlo sampling. A recent paper
on this work.
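The idea of aggregating symmetric state-action pairs in UCT can be illustrated with a minimal, hypothetical sketch: visit counts and value estimates are stored under a canonical representative of each symmetry class, so experience gathered at one pair is reused at all of its symmetric counterparts. The class name, the `canonicalize` function, and the toy mirror-symmetric line world are assumptions made for illustration, not the algorithms from the papers above; only the UCB1-style selection rule is standard.

```python
import math
from collections import defaultdict

class SymmetricUCTStats:
    """Hypothetical sketch: UCT statistics shared across symmetric
    (state, action) pairs via a canonical representative."""

    def __init__(self, canonicalize, c=1.4):
        # canonicalize(state, action) -> hashable key naming the pair's
        # symmetry class (assumed to be supplied for each domain).
        self.canonicalize = canonicalize
        self.c = c                           # UCB1 exploration constant
        self.visits = defaultdict(int)       # visit count per symmetry class
        self.value_sum = defaultdict(float)  # sum of sampled returns per class

    def update(self, state, action, ret):
        """Back up one sampled return into the pair's symmetry class."""
        key = self.canonicalize(state, action)
        self.visits[key] += 1
        self.value_sum[key] += ret

    def select(self, state, actions):
        """UCB1 selection in which symmetric pairs share their statistics."""
        keys = [self.canonicalize(state, a) for a in actions]
        for a, k in zip(actions, keys):
            if self.visits[k] == 0:
                return a  # try any unexplored symmetry class first
        # Count each class once, even if several actions here map to it.
        total = sum(self.visits[k] for k in set(keys))

        def ucb(k):
            n = self.visits[k]
            return self.value_sum[k] / n + self.c * math.sqrt(math.log(total) / n)

        return max(zip(actions, keys), key=lambda ak: ucb(ak[1]))[0]


# Toy domain (assumed): positions 0..4 on a line with a mirror symmetry
# p <-> 4 - p that also swaps the moves "L" and "R".
FLIP = {"L": "R", "R": "L"}

def mirror_canonical(state, action):
    return min((state, action), (4 - state, FLIP[action]))

stats = SymmetricUCTStats(mirror_canonical)
stats.update(1, "R", 1.0)  # experience gathered at position 1 ...
stats.update(1, "L", 0.0)
print(stats.select(3, ["L", "R"]))  # ... is reused at mirrored position 3; prints L
```

Position 3 was never visited, yet selection there already prefers "L", because moving left from 3 is the mirror image of the rewarded move right from 1. The same sharing shrinks the number of distinct statistics the search must estimate.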
Machine Learning over Crowdsourced Training Data: Does
machine learning change when training data is generated via
crowdsourcing? Yes. We first study the quality-size
tradeoff in building training datasets. We then extend active learning to
"Re-Active Learning", which allows the same
data point to be relabeled by a different worker on the
crowdsourcing platform. Finally, we devise novel algorithms for scenarios
in which the data has severe class imbalance.
Decision-Theoretic Optimization of Crowdsourcing:
Crowdsourcing has taken the business world by storm in the last
few years. Although it is touted as "Artificial Artificial
Intelligence", there are huge opportunities for AI to contribute to its
success. A vision paper describes our
approach to this synergy. We have investigated decision-theoretic
techniques to automatically control workflows on a crowdsourcing
platform such as Amazon's Mechanical Turk, and have obtained significant
quality improvements for the same price. Recent papers on this work: Paper 1, Paper
2, and Paper 3.
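To make decision-theoretic control concrete, here is a deliberately simplified, hypothetical sketch for a single binary question: the controller keeps a Bayesian belief about the true answer and pays for another worker's vote only when a one-step lookahead says the expected gain in accuracy outweighs the price. The worker accuracy, the per-vote cost, the unit reward for a correct answer, and all function names are assumptions; this myopic rule is an illustration, not the controllers described in the papers above.

```python
def update_belief(b, vote, accuracy):
    """Bayes update of b = P(true answer is 1) after one binary vote from a
    worker who is correct with probability `accuracy` (assumed known)."""
    like1 = accuracy if vote == 1 else 1 - accuracy  # P(vote | answer = 1)
    like0 = 1 - accuracy if vote == 1 else accuracy  # P(vote | answer = 0)
    return b * like1 / (b * like1 + (1 - b) * like0)

def submit_value(b):
    """Expected reward of submitting the likelier answer now (1 if correct)."""
    return max(b, 1 - b)

def one_more_vote_value(b, accuracy, cost):
    """Expected reward of buying exactly one more vote, then submitting."""
    p_vote1 = b * accuracy + (1 - b) * (1 - accuracy)
    after = (p_vote1 * submit_value(update_belief(b, 1, accuracy))
             + (1 - p_vote1) * submit_value(update_belief(b, 0, accuracy)))
    return after - cost

def should_ask_another(b, accuracy, cost):
    return one_more_vote_value(b, accuracy, cost) > submit_value(b)

def run_controller(votes, accuracy=0.8, cost=0.05, prior=0.5):
    """Consume simulated votes while another vote is worth its price;
    return (submitted answer, number of votes paid for)."""
    b, paid = prior, 0
    for vote in votes:
        if not should_ask_another(b, accuracy, cost):
            break
        b = update_belief(b, vote, accuracy)
        paid += 1
    return (1 if b >= 0.5 else 0), paid

print(run_controller([1, 1, 0]))  # prints (1, 1)
```

With the assumed defaults (80%-accurate workers, 0.05 per vote), the controller pays for one vote from the uninformed prior and then stops at a belief of 0.8: a single dissenting vote could only return the belief to 0.5, so one more vote does not raise the expected reward. This blind spot of one-step lookahead is one reason a fuller controller would reason over longer horizons.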
Coherent Large-Scale Multi-Document Summarization:
How can we produce coherent, human-readable summaries from a set of
10 related documents? How about 100? How about 1000? What is the best summary
format when the amount of information to be summarized is huge? We
answer these difficult questions through two systems, GFLOW and
SUMMA. The former is a
coherent summarizer for small document collections, and the latter produces
hierarchical summaries for large collections. Papers on this work:
Paper 1 and Paper
2. See also the
SUMMA demo.
Applications of Open IE: Open Information Extraction is a
domain-independent knowledge representation language that differs
from linguistic formalisms such as semantic role labeling and from
domain-specific ontologies. We explore the various
applications that Open IE enables. We recently released OREO,
rapidly retargetable software for mapping open extractions to a domain
ontology. In this recent
paper, we show that the Open IE representation outperforms dependency parses
and semantic role labels for learning useful word vector
representations via deep learning. Earlier, in this paper,
we used Open IE to automatically induce domain-independent event
schemas.
Commonsense Knowledge Extraction: Automatically creating
corpora of commonsense knowledge by reasoning over information extracted
from the Web. We automatically learned selectional preferences
and meta-properties of relations present in natural language text. We also
built a large repository of relational n-grams, a semantic
analog of the n-grams corpus, which we used to induce event schemas
completely automatically. All results from this project are publicly
available: a set of
functional relations, a selectional
preference demo, and a relational n-grams corpus.
NLP over Microblogs: Micro-blogging sites such as Twitter have
exploded in popularity in recent years. Tweets often carry the
most up-to-date information and "buzz" on a vast spectrum of topics;
however, their sheer number creates a huge information overload. We recently
released a suite of NLP
tools for tweets. We are currently designing automated information
extraction systems over Twitter. A recent
paper and a demo of an
automatically generated calendar of events.
Large-scale Probabilistic Planning:
Solving large Markov Decision Processes by combining several optimal
as well as approximate techniques. We hope to alleviate the memory
bottleneck in solving large MDPs and to scale to large, industry-sized
probabilistic planning problems. Some significant papers on this
work: Paper 1 and
Paper 2.
Our planner, Glutton, was the runner-up in the 2011
International Probabilistic Planning
Competition.
Half-Open Information Extraction:
Open Information Extraction, while a scalable paradigm, suffers from the drawback
that it does not normalize its extractions to a domain schema. Our recent work
explores the middle ground between completely open and completely closed variants of IE
to leverage the benefits of both. An article on
this work.
Formal Inference in Translation Graphs: Developing probabilistic
inference techniques to formalize inference in translation graphs, that is, graphs
formed by combining all available dictionaries between all possible
languages in the world. An efficient and high-quality inference procedure
will enable the system to produce good translations from a sense in one
language to several languages, even when there is no available dictionary
between the exact pair of languages.
A journal paper on this work
and the AAAI Nectar version.
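The structure of a translation graph, and why inference over it is hard, can be seen in a small, hypothetical sketch: dictionary entries become edges between (language, word) nodes, and a breadth-first search collects target-language words reachable from a source word even when no direct dictionary exists. The function names and the toy entries are assumptions; plain path-following like this is only a naive baseline, not the probabilistic inference developed in this work.

```python
from collections import defaultdict, deque

def build_graph(entries):
    """Merge bilingual dictionary entries into one undirected graph whose
    nodes are (language, word) pairs."""
    graph = defaultdict(set)
    for a, b in entries:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def candidate_translations(graph, source, target_lang, max_hops=3):
    """Breadth-first search from `source`; return each target-language word
    reachable within `max_hops` edges, mapped to its hop distance."""
    dist = {source: 0}
    queue = deque([source])
    found = {}
    while queue:
        node = queue.popleft()
        lang, word = node
        if lang == target_lang and node != source:
            found[word] = dist[node]
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return found

# Toy data (assumed): only English-French and English-German entries,
# so French-to-German translation must pivot through English.
ENTRIES = [
    (("en", "spring"), ("fr", "printemps")),  # the season
    (("en", "spring"), ("fr", "ressort")),    # the coil
    (("en", "spring"), ("de", "Frühling")),   # the season
    (("en", "spring"), ("de", "Feder")),      # the coil
]

graph = build_graph(ENTRIES)
print(candidate_translations(graph, ("fr", "printemps"), "de"))
```

Both "Frühling" and "Feder" are returned at distance 2, although only "Frühling" preserves the seasonal sense of "printemps". Naive path-following cannot tell them apart because the pivot word "spring" is ambiguous; distinguishing senses along such paths is exactly what a principled inference procedure over the graph has to do.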
Open Information Extraction over News:
A relation-independent question-answering system over thousands of current
news articles. We apply TextRunner
information extraction technology as well as news-specific heuristics to
construct a massive knowledge base of current events. This information can
be queried by asking specific questions or by keyword search.
Hybridizing Planners: A fast but
suboptimal planner may be
hybridized with a slow but optimal one to yield a high-quality, anytime
planner that solves problems in intermediate time. We developed
HybPlan, a planner that hybridizes GPT and MBP for probabilistic planning.
Concurrent Probabilistic Temporal Planning: Developing
high-quality and efficient techniques to solve MDPs that model probabilistic planning
problems involving durative and concurrent actions.
Publications
A complete list of publications can be found
here.
Software, Demos and Data
A complete list of released software, demos and data can be found
here.