Projects

Research Projects

Performance and Power Prediction for Concurrent Execution on GPUs
This is an extension of the work published in ISPASS 2020. In this work, we develop a performance and power predictor for concurrent execution on GPUs. Rather than relying on cumbersome GPU profiling, we use only the execution statistics of standalone applications on CPUs. In addition, we use the performance and power of the standalone applications on the GPU as one of the features. To quantify contention at the shared resources, we use a fairness metric. Instead of calculating the fairness metric for every possible bag of applications, we rely on three fairness values per application, each obtained with respect to one of three representative microbenchmarks. At runtime, one of these values is chosen depending on the memory behavior of the other applications running concurrently. This work has been accepted for publication in ACM Transactions on Architecture and Code Optimization (TACO), 2022.
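The runtime selection step can be illustrated with a small sketch. It is not the predictor from the paper: it assumes that memory behavior is summarized by a single number per application (misses per kilo-instruction, MPKI, from a standalone run), that the three microbenchmarks are labeled compute-bound, mixed, and memory-bound, and that the thresholds LOW_MPKI and HIGH_MPKI are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical MPKI thresholds separating the three microbenchmark classes.
LOW_MPKI = 1.0
HIGH_MPKI = 10.0


@dataclass
class AppProfile:
    name: str
    mpki: float  # assumed memory-intensity feature from a standalone run
    # Precomputed fairness of this app when co-run with each (hypothetical)
    # representative microbenchmark.
    fairness_vs_compute: float
    fairness_vs_mixed: float
    fairness_vs_memory: float


def select_fairness(app: AppProfile, co_runners: Sequence[AppProfile]) -> float:
    """Pick one of the app's three precomputed fairness values based on the
    average memory intensity of the applications it runs alongside."""
    if not co_runners:
        return 1.0  # running alone: no contention (assumed convention)
    avg_mpki = sum(c.mpki for c in co_runners) / len(co_runners)
    if avg_mpki < LOW_MPKI:
        return app.fairness_vs_compute
    if avg_mpki < HIGH_MPKI:
        return app.fairness_vs_mixed
    return app.fairness_vs_memory


a = AppProfile("a", mpki=0.5, fairness_vs_compute=0.9,
               fairness_vs_mixed=0.7, fairness_vs_memory=0.4)
b = AppProfile("b", mpki=20.0, fairness_vs_compute=0.8,
               fairness_vs_mixed=0.6, fairness_vs_memory=0.3)
print(select_fairness(a, [b]))  # b is memory-bound, so a's memory-bound value
```

The point of the design is that only three fairness values per application are stored, so the cost of the lookup does not grow with the number of possible application bags.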

PredStereo: An Accurate Real-time Stereo Vision System
In this work, we characterize the accuracy of both traditional and CNN-based stereo vision algorithms and show that the tilt towards CNN-based algorithms is unjustified. Especially in a self-driving scenario, where depth estimates are expected to be highly accurate, we cannot rely on just one class of algorithms. We show that what is needed is an ensemble-based system that chooses between these algorithms at runtime, and we develop such a system that operates in real time. This work is published in the proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2022.
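A runtime choice between algorithm classes can be sketched as follows. This is a hypothetical illustration, not PredStereo's selector: it assumes each candidate has a measured per-frame latency and a per-frame error predictor driven by cheap scene features. The feature name `textureless_fraction`, the linear error models, and the latencies are made-up values for illustration only, not measurements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


@dataclass
class StereoAlgorithm:
    name: str
    latency_ms: float  # assumed per-frame runtime on the target platform
    predict_error: Callable[[Features], float]  # hypothetical error model


def choose_algorithm(candidates: Sequence[StereoAlgorithm],
                     features: Features,
                     deadline_ms: float) -> StereoAlgorithm:
    """Among algorithms that meet the frame deadline, pick the one with the
    lowest predicted error; if none meets it, fall back to the fastest."""
    feasible = [alg for alg in candidates if alg.latency_ms <= deadline_ms]
    if not feasible:
        return min(candidates, key=lambda alg: alg.latency_ms)
    return min(feasible, key=lambda alg: alg.predict_error(features))


# Illustrative (not measured) models: the traditional method degrades faster
# on textureless scenes than the CNN-based one.
algorithms = [
    StereoAlgorithm("traditional", 20.0,
                    lambda f: 2.0 + 8.0 * f["textureless_fraction"]),
    StereoAlgorithm("cnn", 30.0,
                    lambda f: 4.0 + 1.0 * f["textureless_fraction"]),
]
print(choose_algorithm(algorithms, {"textureless_fraction": 0.6}, 33.0).name)
```

The deadline check comes first so that the selector never trades away real-time operation for accuracy.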

Game Theory-based Parameter-Tuning for Path Planning of UAVs
In this work, we solve the problem of automatic parameter tuning for path planning of UAVs. Traditionally, this problem has been solved using optimization-based approaches. We use a multi-layer perceptron model to learn the constraints of the optimization problem. However, optimization-based approaches do not converge when the search space is large and the problem is non-linear. We therefore convert the optimization problem into a game-theoretic one, which is significantly faster to solve and can be used at runtime. This work is published in the proceedings of the IEEE International Conference on VLSI Design, 2021.

Accelerating CNN Inference on ASICs: A Survey
As you might have noticed, the field of deep learning is growing very rapidly, and these networks are getting more computationally intensive with each passing day! With the end of Dennard scaling and the slowing of Moore's law, there is limited headroom left in general-purpose processors. Thus, custom hardware architectures are coming to the rescue. We have written a survey paper on accelerating the inference phase of convolutional neural networks on custom hardware (ASICs). The survey is comprehensive and covers a broad range of optimizations used to accelerate these networks. This work is published in the Journal of Systems Architecture (JSA), Vol. 113, Feb. 2021.

VisSched: An Auction-based Scheduler for Vision Workloads on Heterogeneous Processors
In this work, we start by characterizing vision workloads on server-class processors and identifying their distinctive phase behavior. We then develop an auction-theoretic scheduling scheme for such workloads on a multicore architecture. The scheme exploits the phase behavior of the vision workloads to simplify scheduling decisions; it is starvation-free and provides theoretical optimality guarantees for the resulting schedules. This work was presented as a poster at DAC 2020 and as a full paper at ESWEEK CASES 2020, and is published in IEEE TCAD, Vol. 39, Issue 11.
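To make the idea of auction-based core allocation concrete, here is a minimal sketch of one scheduling round. It is not the VisSched mechanism and carries none of its optimality guarantees. It assumes that each core is auctioned in turn to the unassigned tasks, and that a task's bid is a hypothetical phase-dependent speedup on that core type plus an aging term, so a task that keeps losing eventually outbids the others. The phase names, the `SPEEDUP` table, and `AGING` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical per-phase speedup of each core type relative to a baseline core.
SPEEDUP: Dict[str, Dict[str, float]] = {
    "feature_extraction": {"big": 2.0, "little": 1.0},
    "matching": {"big": 1.2, "little": 1.0},
}
AGING = 0.5  # hypothetical bid increase per round spent waiting


@dataclass
class Task:
    name: str
    phase: str           # current execution phase (hypothetical labels)
    wait_slots: int = 0  # consecutive rounds without a core


def bid(task: Task, core_type: str) -> float:
    """A task's valuation of a core: phase-dependent speedup plus aging."""
    return SPEEDUP[task.phase][core_type] + AGING * task.wait_slots


def run_auction_round(tasks: List[Task],
                      cores: List[Tuple[int, str]]) -> Dict[int, str]:
    """Auction each (core_id, core_type) to the highest bidder among
    unassigned tasks; losers' wait counters grow, winners' reset."""
    unassigned = list(tasks)
    allocation: Dict[int, str] = {}
    for core_id, core_type in cores:
        if not unassigned:
            break
        winner = max(unassigned, key=lambda t: bid(t, core_type))
        allocation[core_id] = winner.name
        unassigned.remove(winner)
    winners = set(allocation.values())
    for t in tasks:
        t.wait_slots = 0 if t.name in winners else t.wait_slots + 1
    return allocation


tasks = [Task("A", "feature_extraction"), Task("B", "matching"), Task("C", "matching")]
cores = [(0, "big"), (1, "little")]
print(run_auction_round(tasks, cores))  # C loses this round and ages
print(run_auction_round(tasks, cores))  # C's higher bid now wins the little core
```

The aging term is one simple way to obtain starvation freedom in a sketch like this; the phase-dependent speedup is what lets phase behavior drive the allocation.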

Performance Prediction for Multi-Application Concurrency on GPUs
In this work, we developed a decision-tree-based performance predictor for multi-application concurrency on GPUs. We establish that GPUs are not the right choice when it comes to scheduling multiple applications concurrently. The predictor estimates how performance degrades as the number of concurrent applications increases. It relies on CPU execution statistics and on the fairness of the bag-of-tasks schedule on multicore processors. This work is published in the proceedings of IEEE ISPASS 2020.

Super Resolution on Reconfigurable Arrays
In this work, we implemented a convolutional neural network on a Virtex-6 FPGA to convert SD video to HD video in real time. The optimizations included reducing computation time by exploiting inherent parallelism, performing efficient matrix multiplication using a Toeplitz representation, and reducing the memory footprint by using a rotating buffer instead of a full-size input buffer. We also designed a custom protocol, built on top of the OCP protocol, to fetch image data in parallel from four DDR3 banks.
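The two optimizations named above can be illustrated in software. This NumPy sketch is not the FPGA design. It assumes a single-channel image, a single kernel, stride 1, and no padding (the "valid" cross-correlation used in CNNs). `conv_matrix` unrolls the kernel into a doubly block Toeplitz matrix so that convolution becomes one matrix-vector product, and `conv2d_streaming` keeps only `kh` input rows in a rotating buffer rather than the full image. All function names are illustrative.

```python
from collections import deque
from typing import Iterable, Iterator

import numpy as np


def conv2d_direct(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Reference 'valid' 2D cross-correlation (what CNN layers compute)."""
    kh, kw = kernel.shape
    ho, wo = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((ho, wo))
    for i in range(ho):
        for j in range(wo):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def conv_matrix(kernel: np.ndarray, h: int, w: int) -> np.ndarray:
    """Unroll the kernel into a doubly block Toeplitz matrix T such that
    T @ image.ravel() equals the valid cross-correlation, flattened."""
    kh, kw = kernel.shape
    ho, wo = h - kh + 1, w - kw + 1
    t = np.zeros((ho * wo, h * w))
    for i in range(ho):
        for j in range(wo):
            for u in range(kh):
                for v in range(kw):
                    t[i * wo + j, (i + u) * w + (j + v)] = kernel[u, v]
    return t


def conv2d_streaming(rows: Iterable[np.ndarray],
                     kernel: np.ndarray) -> Iterator[np.ndarray]:
    """Stream image rows through a rotating buffer of kh rows and yield one
    output row as soon as enough input rows are available."""
    kh, kw = kernel.shape
    buf = deque(maxlen=kh)  # the oldest row is evicted automatically
    for row in rows:
        buf.append(np.asarray(row, dtype=float))
        if len(buf) == kh:
            window = np.stack(list(buf))  # shape (kh, W)
            width = window.shape[1]
            yield np.array([np.sum(window[:, j:j + kw] * kernel)
                            for j in range(width - kw + 1)])


image = np.arange(30, dtype=float).reshape(5, 6)
kernel = np.array([[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]])
via_toeplitz = (conv_matrix(kernel, 5, 6) @ image.ravel()).reshape(3, 4)
via_stream = np.vstack(list(conv2d_streaming(image, kernel)))
```

The Toeplitz form turns irregular sliding-window work into a dense matrix product that maps well onto parallel multiply-accumulate hardware, while the rotating buffer bounds on-chip storage to `kh` rows regardless of image height.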

FPGA Cluster-based Parallel Architecture for Cryptanalysis
In this work, we developed a high-speed communication network consisting of four Virtex-6 FPGAs that communicated over multi-gigabit transceiver (MGT) links. In addition, we developed Python and C wrappers for the PCIe bus drivers to perform DMA transfers from Linux kernel memory to FPGA BRAM. This work was presented at the National Workshop on Cryptology 2014.