Deep Learning Inferences on Embedded Platforms

Slot: AC Tue and Fri 2-3:30 pm

Location: Bharti 106

Tentative evaluation plan: 50% course project, 50% end-semester exam

[1] Background

If all mobile devices are connected to the Internet, why not run all heavy computations in the cloud? What is the point of running deep learning inference on mobile or embedded platforms? We will discuss some motivating examples where the cost of network connectivity, latency or energy makes local computation on the mobile device preferable to remote execution in the cloud. We will also spend some time understanding what exactly runs on the mobile device in typical usage scenarios (inference tasks using pre-trained models), and the deepnet layer details for some typical computations.

[2] Metrics and trade-offs

While accuracy of the inference task is an important metric to maximize, it may trade off against other metrics on resource-constrained embedded platforms. Is the latency of each inference too high for a real-time mobile application that a user is interacting with, or for a road traffic application that must detect or prevent accidents? Is the trained deepnet model used in the inference too large to fit in the embedded platform's RAM? Does the inference task drain the mobile battery too fast? We will discuss metrics such as accuracy, latency, memory and power requirements, and the trade-offs among them. The goal of the course is to see how different research communities are innovating to better handle these trade-offs. [tradeoff] [deepiot]
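
As a rough illustration of how two of these metrics can be measured, the sketch below times a toy fully connected layer and estimates its parameter memory. Everything in it is an assumption made for illustration: the layer, its size, the helper names (`dense_layer`, `measure_latency_ms`, `model_size_bytes`) and the choice of 4 bytes per parameter. Measurements on a real phone would use the profiling tools of the chosen framework instead.

```python
import statistics
import time


def dense_layer(weights, bias, x):
    """Toy fully connected layer with ReLU, in pure Python (illustrative only)."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def measure_latency_ms(infer, x, warmup=3, runs=20):
    """Return (median, worst) latency in milliseconds over repeated runs."""
    for _ in range(warmup):  # warm-up runs are discarded
        infer(x)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


def model_size_bytes(weights, bias, bytes_per_param=4):
    """Parameter memory, assuming 32-bit floats unless told otherwise."""
    n_params = sum(len(row) for row in weights) + len(bias)
    return n_params * bytes_per_param


# Hypothetical 64-input, 32-output layer with constant weights.
weights = [[0.01] * 64 for _ in range(32)]
bias = [0.0] * 32
x = [1.0] * 64

median_ms, worst_ms = measure_latency_ms(
    lambda v: dense_layer(weights, bias, v), x
)
size = model_size_bytes(weights, bias)
print(f"median {median_ms:.3f} ms, worst {worst_ms:.3f} ms, {size} bytes")
```

Reporting both the median and the worst case matters for interactive or safety-related applications, where an occasional slow inference can be as harmful as a slow average.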

[3] Architecture

If the proliferation of extensive datasets enabled deepnets to learn from examples, hardware advances like GPUs have been an equally important enabling factor. We will discuss how computer architecture researchers are devising new architectural designs for embedded deepnets, which change the hardware platform on which the inference tasks execute. [eyeriss] [eie] [scnn] [survey]

[4] Systems

Mobile systems researchers create a software interface between the architecture researchers, who design the actual hardware on which inference tasks run, and the ML researchers, who design the computations each inference task needs. We will discuss traditional systems optimization techniques in the context of embedded deepnets, such as scheduling (e.g. (i) pipelining different inference tasks to reduce latency, (ii) spreading computations across the CPU, GPU and other co-processors on the mobile platform, and (iii) using cloud computing when the network is available) and caching (e.g. storing reusable results to reduce computations). [deepmon] [deepeye] [deepx] [mcdnn] [leo]
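
The caching idea can be sketched in a few lines. The example below is a deliberately simplified, exact-match cache around a stand-in inference function; the class name `CachedInference`, the function `toy_model` and the FIFO eviction policy are all assumptions made for illustration, not how any of the cited systems work. Real systems may reuse intermediate results across similar (not just identical) inputs.

```python
import hashlib


class CachedInference:
    """Wrap an inference function and reuse results for repeated inputs."""

    def __init__(self, infer, max_entries=128):
        self.infer = infer
        self.max_entries = max_entries
        self.cache = {}  # dicts preserve insertion order (Python 3.7+)
        self.hits = 0
        self.misses = 0

    def _key(self, x):
        # Exact-match key: hash of the textual form of the input.
        return hashlib.sha1(repr(x).encode("utf-8")).hexdigest()

    def __call__(self, x):
        key = self._key(x)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        result = self.infer(x)
        if len(self.cache) >= self.max_entries:
            # Evict the oldest entry (FIFO), a simple stand-in for LRU.
            self.cache.pop(next(iter(self.cache)))
        self.cache[key] = result
        return result


def toy_model(x):
    """Hypothetical stand-in for an expensive inference call."""
    return sum(v * v for v in x)


cached = CachedInference(toy_model, max_entries=2)
frames = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0], [1.0, 2.0]]
outputs = [cached(f) for f in frames]
print(outputs, cached.hits, cached.misses)
```

The trade-off is memory for computation: a larger `max_entries` saves more repeated inferences but consumes RAM that the model itself may need.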

[5] ML

Machine learning researchers design the actual computations needed in an inference task. There are significant efforts to meet the embedded-system constraints on energy, latency, processing power and RAM size, while maintaining reasonable inference accuracy. We will discuss some of these methods, such as alternative network architectures, sparsification, compression, quantization and pruning, which make it possible to run standard complex inference tasks on embedded platforms. [deeprebirth] [shufflenet] [mobilenets] [sparsification] [quantized] [compression] [pruning]
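
Two of these methods, quantization and pruning, can be illustrated on a small weight vector. The sketch below assumes one common variant of each (8-bit affine quantization and magnitude-based pruning); the function names and the example weights are hypothetical, and the cited papers use more elaborate schemes, often with retraining to recover accuracy.

```python
def quantize_uint8(weights):
    """Affine quantization of floats to integers in [0, 255]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all weights are equal
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo


def dequantize(q, scale, lo):
    """Map 8-bit integers back to approximate float weights."""
    return [v * scale + lo for v in q]


def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights of one small layer.
weights = [-0.8, -0.05, 0.02, 0.4, 0.9, -0.3, 0.01, 0.6]

q, scale, lo = quantize_uint8(weights)
restored = dequantize(q, scale, lo)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

pruned = prune_by_magnitude(weights, sparsity=0.5)
print(q, max_err, pruned)
```

Storing each weight in 8 bits instead of 32 cuts parameter memory by a factor of four, at the cost of a rounding error of at most half a quantization step per weight. Pruning produces zeros that a sparse storage format or sparsity-aware hardware can skip.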

Topic | Dates | #Lectures | Slides
Course motivation and overview | Jan 2 | 1 | [jan2]
Deep learning inference background | Jan 5 - Jan 16 | 4 | [jan5], [jan9], [jan12], [jan16]
Metrics and trade-offs | Jan 19 | 1 | [jan19]
Architecture | Jan 23 - Jan 30, Feb 13 - Feb 16 | 4 | [jan23], [jan30], [feb13]
Systems | Feb 19 - Feb 23 | 2 |
ML | Mar 5 - Mar 19, Mar 30 - Apr 20 | 12 |
Course summary | May 1 | 1 |

Course project

An Android application needs to be implemented using two existing deep learning frameworks for mobiles, one that uses the CPU and another that uses the GPU. The inference accuracy, latency, energy and memory use of the two implementations need to be compared.

Possible mobile deep learning frameworks:

Topic | Dates | Reports and marks
Run frameworks with default applications on phones (Minor 1) | Feb 9 | minor1
Discuss project related issues | Mar 23, Apr 24 |
Have the measurement setup ready, plot some metrics and trade-off graphs (Minor 2) | Mar 26 |
Final projects due | Apr 27 |