Evaluation plan: Assignments 50%, Minor1 20%, Major 30%

Piazza link: col730

Text Books

Course Content


Topic Slides Supplementary Material
Introduction course_content_aug4.pdf Pacheco, GGKK Chapter 1
[Performance] The only reason for parallelism. Quantifying performance improvements analytically. amdahl_gustafson_aug4.pdf amdahl_gustafson_alternateslides.pdf, amdahl_gustafson_problems.pdf, Pacheco Chapter 2.6, GGKK Chapter 5
[NVIDIA GPUs and CUDA] GPU architecture, CUDA programming and optimizations. i. Latency vs. throughput; hardware parts of a GPU (SMs, memory hierarchy); programming model (threads, blocks, grids; allocating memory and copying between DRAM and GPU memory; kernel launch) cudabasics_aug8.pdf
Small sample programs helloworld.cu, cudamem.cu
ii. CUDA optimizations CUDAoptimization1_aug18.pdf, CUDAoptimization2_aug22.pdf, parallelHistogram_aug22.pdf
iii. CUDA streams cudastreams_aug25_aug29.pdf parallelconvolution_aug29.pdf
Sample stream program simplestream.cu, common.h
Kirk and Hwu, nvidia1, nvidia2, nvidia3
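The workflow in item i (allocate device memory, copy input to the GPU, launch a kernel over a grid of blocks, copy the result back) can be sketched with the standard vector-add example below. This is a generic illustration, not taken from the course slides; compile with something like `nvcc vecadd.cu -o vecadd`.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard the partial last block
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes),
          *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;                       // device (GPU DRAM) buffers
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // threads per block
    int blocks = (n + threads - 1) / threads;  // blocks in the grid
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("hc[10] = %f\n", hc[10]);           // 10 + 2*10

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Note how the grid size is rounded up so every element gets a thread, with the `i < n` check discarding the excess threads in the last block.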
[Multicore CPUs] Pthreads and OpenMP i. Pthread and OpenMP programs for basic hello-world messages; computing pi, where the final sum increment forms a critical section (protected with busy waiting and a mutex); and message passing from a source to a destination thread, where the destination thread waits on a semaphore. OpenMP compilation: "gcc -g -Wall -fopenmp -o omp_hello omp_hello.c"; pthread compilation: "gcc -g -Wall -o pth_msgpassing pth_msgpassing.c -lpthread". pth_hello.c, omp_hello.c, pth_pi.c, pth_msgpassing.c, pthread_openmp_sep8.pdf
ii. Caches for faster memory access; cache coherence, necessary for correctness in multi-threaded programs on multi-core CPUs; false sharing of a cache line among threads affects performance and can be handled with padding or thread-private data structures. caches_sep12.pdf, mesi_sep12.pdf, falsesharing_sep14.pdf
iii. How locks and barriers are implemented by operating systems using hardware instruction primitives. hwsynchronizationprimitives_sep19.pdf
iv. How processor cores are connected to each other and to memory: interconnection networks. interconnection_topology_sep14.pdf
v. Examples of concurrency bugs from real-world software. concurrency-bugs-asplospaper.pdf concurrencybugs_sep22.pdf
Pacheco Chapters 4 and 5
[Parallel programming on distributed memory] MPI, Mapreduce i. Point to point and group communication with MPI mpi_sep29.pdf
ii. The MPI standard, which libraries such as OpenMPI, MPICH, and Intel MPI implement mpi40-standard_oct6.pdf
iii. How companies such as Uber and Baidu use MPI for large-scale computations uber_horovod_oct6.pdf, horovod-mpi-oct6.pdf, baidu_allreduce_oct6.pdf
iv. Map-reduce programming model mapreduce_rutgers_oct10.pdf, mapreducepaper_oct10.pdf
Pacheco Chapter 3
[How to model parallel programs] i. Parallel Random Access Machine (PRAM) model, design and time-complexity analysis of PRAM algorithms prammodel_oct13.pdf
ii. Foster design methodology for parallel programs: partition, communication, agglomeration, mapping foster_oct18.pdf
iii. Bulk Synchronous Parallel (BSP) model, comprising a sequence of global supersteps, each with local computation, communication, and barrier synchronization BSP wikipedia. Google Pregel and Apache Hama follow the BSP model of parallel programming.
iv. Work-depth model, Brent's theorem bounding execution time on p processors brent_nov3.pdf
Quinn Chapter 3, GGKK Chapter 3, Kumar Chapter 3.1-3.2 kumar chapter 3.pdf
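Brent's theorem from item iv can be stated compactly in the standard formulation: for a computation with total work $W$ (number of operations) and depth $D$ (longest chain of dependent operations), the running time $T_p$ on $p$ processors satisfies

```latex
% Brent's theorem (work-depth model)
\[
  \frac{W}{p} \;\le\; T_p \;\le\; \frac{W - D}{p} + D \;\le\; \frac{W}{p} + D
\]
% Speedup is therefore near-linear whenever W/p dominates D,
% i.e. when there is enough work per processor to hide the critical path.
```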
[Parallel algorithmic techniques] i. Pointer jumping or recursive doubling pointerjumping-oct20.pdf
ii. Euler tour technique eulertour_oct27.pdf
iii. Tree contraction treecontraction_oct31.pdf
iv. Divide and conquer (count starting ones, polynomial evaluation) divide_conquer_nov3.pdf
v. Divide and conquer (mergesort, quicksort) sequentialsorting_nov7.pdf, parallelsorting_nov7.pdf
Kumar Chapter 7 kumar chapter 7.pdf
[Concurrency in other programming languages] Java and Python javamemorymodel_pythongil_nov10.pdf

Assignments

We will strictly check for plagiarism, both against code available online and within your own class. Offenders will receive a fail grade. Submit all assignments on Moodle. You may take a total of 7 grace days across the three assignments. Use them wisely.