Deterministic Execution in Multithreaded Applications
I am working with Prof. Sorav Bansal on Dthreads. Multithreaded programming is very difficult to get right. A key problem is non-determinism, which complicates debugging, testing and reproducing errors in multithreaded applications. One way to simplify multithreaded programming is to enforce deterministic execution.
Dthreads, a re-implementation of the pthreads library, was introduced in a paper as an efficient deterministic multithreading system for unmodified C/C++ applications. It not only prevents semantic errors such as race conditions and deadlocks, but can also improve performance by eliminating false sharing of cache lines.
The aim of our project is to improve the efficiency of Dthreads and to analyze its performance overhead compared to the pthreads library. One way to reduce the overhead is to use a per-lock token instead of a single global token. Further, load balancing (by inserting manual synchronisation points) can lead to a substantial improvement in performance.
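The token idea can be illustrated with a small sketch. This is not how Dthreads is implemented (the real system works on unmodified C/C++ programs); it only shows, under simplifying assumptions, how a single global token passed in a fixed round-robin order makes the order of synchronised operations independent of the scheduler. The names `TokenRing` and `run` are hypothetical.

```python
import threading


class TokenRing:
    """A single global token handed between threads in a fixed order (illustrative only)."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.holder = 0  # thread id that currently owns the token
        self.cond = threading.Condition()

    def acquire(self, tid):
        # Block until the token reaches this thread.
        with self.cond:
            while self.holder != tid:
                self.cond.wait()

    def release(self, tid):
        # Pass the token to the next thread in round-robin order.
        with self.cond:
            self.holder = (tid + 1) % self.n
            self.cond.notify_all()


def run(n_threads=3, rounds=2):
    """Each thread performs one 'synchronised operation' per round while holding the token."""
    ring = TokenRing(n_threads)
    log = []

    def worker(tid):
        for r in range(rounds):
            ring.acquire(tid)
            log.append((r, tid))  # only the token holder reaches this line
            ring.release(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every call to `run()` produces the same log, whatever order the operating system schedules the threads in. With a single global token, all synchronised operations are serialised through one ring; a per-lock variant would presumably keep one such ring per lock, so that threads using unrelated locks do not wait on each other.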
Check out our mid-term presentation here.
Internship at Directi, Mumbai: Knowledge Graph based Keyword Update
I worked at Media.net, a subdivision of Directi that primarily deals with displaying ad keywords on webpages. My role was to build an automated system that, given a keyword, identifies the product the keyword is talking about and replaces it with the latest version of that product on the market, keeping the structure of the keyword intact (e.g. "buy new iphone 3G" changes to "buy new iphone 5").
I created an entity knowledge graph of electronic gadgets and automobiles based on data available on Wikipedia. Using this knowledge graph, I built a system that automatically updates a given keyword. The presentation describes the project in more detail. In the end, the system achieved good keyword replacement efficiency.
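A minimal sketch of the keyword update step, assuming the knowledge graph has already been reduced to a mapping from a product family to its versions ordered from oldest to newest. The mapping below is hypothetical toy data, and `update_keyword` is an illustrative name, not the production system.

```python
import re

# Hypothetical toy knowledge graph: product family -> versions, oldest to newest.
KNOWLEDGE_GRAPH = {
    "iphone": ["3G", "3GS", "4", "4S", "5"],
    "ipad": ["2", "3", "4"],
}


def update_keyword(keyword, graph=KNOWLEDGE_GRAPH):
    """Replace an outdated product version in the keyword with the newest one.

    The rest of the keyword is left untouched, so its structure is preserved.
    """
    for family, versions in graph.items():
        latest = versions[-1]
        # Longer version names first, so "3GS" is tried before "3G".
        alternatives = "|".join(
            re.escape(v) for v in sorted(versions, key=len, reverse=True)
        )
        pattern = re.compile(
            r"\b(%s)(\s+)(%s)\b" % (re.escape(family), alternatives),
            re.IGNORECASE,
        )
        # Keep the family name and spacing as typed; swap only the version.
        updated, count = pattern.subn(
            lambda m: m.group(1) + m.group(2) + latest, keyword
        )
        if count:
            return updated
    return keyword  # no known product found; leave the keyword unchanged


print(update_keyword("buy new iphone 3G"))
```

A real system would also need entity disambiguation and a graph that is refreshed as new products are released; this sketch covers only the replacement step.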
Based on my internship performance, I was offered a full-time position as a Software Developer at Directi, Mumbai.
IRD Unit, IIT Delhi: A Generic System for Automated Detection of Activities in Surveillance Videos
Motivation: In recent years, there has been a growing need for safety and security. This has led to rapid investment in building and deploying surveillance systems, which in turn has resulted in a huge increase in the volume of surveillance data that has to be analyzed. In such a scenario, manual surveillance is very time-consuming as well as error-prone. This has led to the development of automatic video surveillance systems that can analyze the surveillance data and tag unusual activities in it. A subtask of tagging unusual activities is detecting meaningful activities in videos. This is the main theme of our project: activity detection in videos.
Approach: We focussed on a very simple framework for activity detection, which models an activity in terms of transitions between certain predefined regions of interest. We used a Haar detector (based on the Viola-Jones algorithm) for person detection in videos. To recognize a detected person, we used a color histogram of the person's clothing and applied the Bhattacharyya metric for clustering. Finally, we built an extensible hierarchical finite state machine structure to detect individual, group and global activities in a multi-camera setting.
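As an illustration of the histogram comparison step, here is a minimal sketch of a Bhattacharyya-based distance between two color histograms. It assumes the common variant sqrt(1 - BC), where BC is the Bhattacharyya coefficient of the normalized histograms; the exact variant, bin layout and matching rule used in the project are not stated above, and `closest_person` is a hypothetical helper.

```python
import math


def normalize(hist):
    """Scale bin counts so that they sum to 1."""
    total = float(sum(hist))
    return [h / total for h in hist]


def bhattacharyya_distance(hist_a, hist_b):
    """Distance in [0, 1]: 0 for identical distributions, 1 for disjoint ones."""
    p, q = normalize(hist_a), normalize(hist_b)
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    bc = min(bc, 1.0)  # guard against floating-point overshoot
    return math.sqrt(1.0 - bc)


def closest_person(hist, gallery):
    """Return the label in `gallery` (label -> histogram) nearest to `hist`."""
    return min(gallery, key=lambda label: bhattacharyya_distance(hist, gallery[label]))


# Toy 4-bin histograms (hypothetical data): mostly-red vs mostly-blue clothing.
gallery = {"person_a": [80, 10, 5, 5], "person_b": [5, 5, 10, 80]}
print(closest_person([70, 20, 5, 5], gallery))
```

In practice the histograms would be computed over the clothing region of each detection, typically in a color space that is less sensitive to lighting changes.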
Achievement: I was awarded the Summer Undergraduate Research Award for this project.
Automated Grading System (Machine Learning)
This project was done as part of the Machine Learning course. We took the project idea from Kaggle.com. The project aims to build a machine learning system for automated scoring of essays written by students. We took all data for training and testing from Kaggle.
We built a linear regression model with polynomial basis functions to predict the score of a given essay. We used features such as word count, sentence count, verb count, noun count, adjective count, adverb count (using the NLTK library in Python), number of spelling mistakes (using the enchant library in Python), etc. Probably the most important feature that we used was domain information content, which tries to capture the semantics and information content of an essay.
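The model can be sketched in a few lines of plain Python. This is a minimal illustration, assuming per-feature polynomial terms without cross terms and a least-squares fit via the normal equations; the tiny ridge term is added here only for numerical stability and is not something the project description mentions. All function names are hypothetical, and the demo data is synthetic rather than essay data.

```python
def poly_features(x, degree):
    """Expand [x1, ..., xk] into [1, x1, x1^2, ..., xk, xk^2, ...] up to `degree`."""
    feats = [1.0]
    for v in x:
        for d in range(1, degree + 1):
            feats.append(float(v) ** d)
    return feats


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def fit(X, y, degree=2, ridge=1e-8):
    """Least-squares weights for the polynomial basis (normal equations)."""
    Phi = [poly_features(x, degree) for x in X]
    k = len(Phi[0])
    A = [[sum(p[i] * p[j] for p in Phi) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(p[i] * t for p, t in zip(Phi, y)) for i in range(k)]
    return solve(A, b)


def predict(w, x, degree=2):
    return sum(wi * fi for wi, fi in zip(w, poly_features(x, degree)))


# Synthetic check: recover y = 1 + 2x + 3x^2 from four points.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 6.0, 17.0, 34.0]
w = fit(X, y, degree=2)
print(round(predict(w, [4.0]), 3))
```

For essays, each row of `X` would hold the extracted features (word count, sentence count, and so on), ideally rescaled first so that high powers of large counts do not dominate the fit.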
Optimal File Distribution in a P2P Network (Approximation Algorithms)
This project was done under the guidance of Prof. Naveen Garg. We worked on the problem of distributing a file, initially located at a server, among a set of n peers. This problem is a simplified version of the general peer-to-peer file sharing problem.
The figure shows the connectivity among the peers. There is a source peer, which initially has the file, and there are n other peers. Each peer has an upload and a download capacity. We assume there is a gateway in between that has infinite capacity. The aim is to distribute the file in the minimum amount of time.
We worked on the symmetric heterogeneous case, i.e. when each peer's upload and download capacities are the same but the capacities may differ among the peers. We came up with a greedy algorithm with an approximation factor of 5 and proved its correctness.