Evaluation plan: Assignments 50%, Midterm 20%, Major 30%
Audit criteria: 30% in exams, 30% in assignments
Attendance policy: Attendance is not mandatory, but that does not mean that not coming to class is mandatory.

Piazza link: col380

Text Books

Course Content


Each topic below lists the lecture slides and the supplementary reading material.
A. [Introduction]
   Slides: course_content.pdf
   Supplementary material: Pacheco, GGKK Chapter 1
B. [Performance]
   The only reason for parallelism; quantifying performance improvements analytically (the key formulas are summarized after this block)
      Slides: performance1.pdf, performance2.pdf, karp_flatt_original.pdf
   The issues with Python and parallelism
      Slides: python.pdf
   Supplementary material: Pacheco Chapter 2.6, GGKK Chapter 5
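
The performance slides build on two standard formulas: Amdahl's law for the speedup achievable when a fraction f of the work is serial, and the Karp-Flatt metric for estimating that serial fraction from a measured speedup. The short C sketch below is my own quick-reference illustration, not part of the course material; the function names and the example value f = 0.05 are made up for the example.

/* Amdahl's law and the Karp-Flatt metric -- quick-reference sketch only. */
#include <stdio.h>

/* Amdahl's law: predicted speedup on p processors when a fraction f
 * of the work is inherently serial. */
double amdahl_speedup(double f, int p) {
    return 1.0 / (f + (1.0 - f) / p);
}

/* Karp-Flatt metric: the experimentally determined serial fraction,
 * computed from a measured speedup s on p processors. */
double karp_flatt(double s, int p) {
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p);
}

int main(void) {
    double f = 0.05;                         /* assumed 5% serial work */
    for (int p = 2; p <= 64; p *= 2) {
        double s = amdahl_speedup(f, p);
        /* Applying Karp-Flatt to an Amdahl-predicted speedup recovers f. */
        printf("p=%2d  predicted speedup=%5.2f  karp_flatt=%.3f\n",
               p, s, karp_flatt(s, p));
    }
    return 0;
}
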
C. [Parallel programming on multithreaded CPUs with shared memory]
   1. Cache basics, the cache coherence protocol for multi-core CPUs, and the parallel performance drop caused by false sharing under cache coherence (a small sketch follows this block)
      Slides: cache_basic.pdf, cache_coherence_MESI.pdf, parallel_performance_falsesharing.pdf
   2. Pthread and OpenMP programming
      Slides: sample_programs.zip, pthread_openmp.pdf
   3. Concurrency bugs
      Slides: concurrency_bugs_paper.pdf, concurrency_bugs_slides.pdf
   4. Hardware synchronization primitives
      Slides: hardware_synchronization_primitives.pdf
   Supplementary material: Pacheco Chapters 4 and 5
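
A minimal sketch of the false-sharing effect from item C.1 (my own illustration, not the code behind parallel_performance_falsesharing.pdf): each thread increments only its own counter, but when the counters sit on the same cache line the coherence protocol keeps invalidating that line; padding each counter to its own line removes the contention. The 64-byte line size, the array bound MAX_THREADS, and the file name are assumptions for the example.

/* False-sharing sketch: per-thread counters on a shared cache line vs padded.
 * Compile with something like: gcc -O2 -fopenmp false_sharing.c
 * Assumes at most MAX_THREADS OpenMP threads and a 64-byte cache line. */
#include <omp.h>
#include <stdio.h>

#define NITER 50000000L
#define MAX_THREADS 64
#define CACHE_LINE 64

volatile long packed[MAX_THREADS];                 /* counters share cache lines */
struct { volatile long c; char pad[CACHE_LINE - sizeof(long)]; } padded[MAX_THREADS];

int main(void) {
    double t0 = omp_get_wtime();
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < NITER; i++)
            packed[id]++;                          /* neighbours invalidate each other's line */
    }
    double t1 = omp_get_wtime();
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < NITER; i++)
            padded[id].c++;                        /* each counter on its own line */
    }
    double t2 = omp_get_wtime();
    printf("packed: %.2f s   padded: %.2f s\n", t1 - t0, t2 - t1);
    return 0;
}
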
D. [Programming NVIDIA GPUs with CUDA]
   1. CUDA overview
      Slides: cuda_overview.pdf; small sample programs: helloworld.cu, cudamem.cu
   2. Example parallelization of applications: (a) maintaining correctness with atomic add, (b) improving performance with memory coalescing, and (c) improving performance by keeping a copy of the histogram bins in shared memory for the threads of each block (a CPU analogue of (a) and (c) is sketched after this block)
      Slides: parallel_histogram.pdf, parallel_saxpy.pdf
   3. Overlapping kernels and memory operations with CUDA streams
      Slides: cudastreams.pdf (up to slide 26 is included in the mid-term)
   Supplementary material: CUDA programming guide, Sections 1.1, 1.2, 1.3, 2.1, 2.2 (excluding 2.2.1), 2.3, 2.4, 3.2.2, 3.2.4, 3.2.8.5, 4.1
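
The techniques in item D.2 are CUDA-specific, but (a) and (c) have a direct CPU analogue. The sketch below is my own OpenMP illustration, not the course's CUDA code: it first builds a histogram with one atomic add per element, then with a private copy of the bins per thread merged once at the end, mirroring the per-block shared-memory copies on the GPU. NBINS and the input size are assumed values.

/* Histogram privatization sketch (CPU/OpenMP analogue of the CUDA version):
 * atomic adds into shared bins vs private per-thread bins merged at the end. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NBINS 256

int main(void) {
    const long n = 1L << 24;
    unsigned char *data = malloc(n);
    for (long i = 0; i < n; i++) data[i] = (unsigned char)(rand() % NBINS);

    long hist_atomic[NBINS] = {0}, hist_priv[NBINS] = {0};

    /* (a) correctness via one atomic add per element -- heavy contention */
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        hist_atomic[data[i]]++;
    }

    /* (c) privatized bins: each thread fills its own copy, then merges once */
    #pragma omp parallel
    {
        long local[NBINS] = {0};
        #pragma omp for nowait
        for (long i = 0; i < n; i++)
            local[data[i]]++;
        for (int b = 0; b < NBINS; b++) {
            #pragma omp atomic
            hist_priv[b] += local[b];          /* NBINS atomics per thread, not n */
        }
    }

    int mismatch = 0;
    for (int b = 0; b < NBINS; b++) mismatch += (hist_atomic[b] != hist_priv[b]);
    printf("histograms %s\n", mismatch ? "differ (bug!)" : "match");
    free(data);
    return 0;
}
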
E. [Parallel programming on distributed memory systems using the Message Passing Interface (MPI)]
   Some example programs are listed below; the books have a more comprehensive list.
   1. Basic program: mpi_hello.c
   2. Point-to-point communication: mpi_send-standard.c, mpi_send-standard-large.c, mpi_send-nonblocking-wait.c, mpi_send-nonblocking-waitany.c, mpi_send-nonblocking-waitall.c
   3. Collective communication: mpi_scatter.c, mpi_scatterv.c, mpi_gather.c, mpi_random_sum.c
      Compile with: mpicc -g -Wall -o mpi_hello.o mpi_hello.c
      Run with: mpirun -np 4 mpi_hello.o
      (a minimal MPI sketch follows this block)
   4. Interconnection networks: GGKK Chapter 2 (specifically 2.4.2, 2.4.3, 2.4.4, 2.4.5), nd-mesh.pdf
   5. Efficient algorithms for collective communications: GGKK Chapter 4 (up to Section 4.4)
   Supplementary material:
   1. Pacheco Chapter 3; code samples from Chapter 3: pacheco_book_code
   2. The Art of HPC.pdf, Chapters 3 and 4 (focus on the C APIs, not the Python and other bindings)
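
A minimal sketch matching the compile/run commands above (my own version; the distributed mpi_hello.c may differ). Besides printing each rank, it demonstrates one collective, MPI_Reduce, by summing the ranks at process 0.

/* Minimal MPI program: each process prints its rank and the world size,
 * then the ranks are summed at process 0 with a collective.
 * Compile: mpicc -g -Wall -o mpi_hello.o mpi_hello.c
 * Run:     mpirun -np 4 mpi_hello.o */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, sum = 0;
    MPI_Init(&argc, &argv);                     /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);       /* total number of processes */
    printf("Hello from process %d of %d\n", rank, size);

    /* Collective: sum every process's rank into sum on process 0. */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("Sum of ranks = %d\n", sum);

    MPI_Finalize();                             /* shut down MPI */
    return 0;
}
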
F. [Parallel algorithm design]
   1. Parallelizing graph algorithms with OpenMP and MPI
      Slides: BFS.pdf, PRIM_Dijkstra.pdf
      Supplementary material: GGKK (10.1, 10.2, 10.3, 10.7.2)
   2. PRAM model
      Slides: PRAM.pdf
   3. Parallelization techniques
      a. parallel reduction: reductions.pdf (an OpenMP reduction sketch follows this block)
      b. pointer jumping: pointerjumping.pdf
      c. Eulerian tours: eulertour.pdf
   Supplementary material: GGKK Chapter 10, Pacheco Chapter 6
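
A minimal sketch of technique 3a, parallel reduction (my own illustration, not the course's code): the classic scheme combines partial results in a balanced tree of O(log p) steps; OpenMP's reduction clause expresses the same idea by giving each thread a private partial sum and combining the partials at the end of the loop. The array size and file name are assumptions for the example.

/* Parallel reduction sketch: summing an array with OpenMP's reduction clause.
 * Compile with something like: gcc -O2 -fopenmp reduction.c */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const long n = 1L << 24;
    double *a = malloc(n * sizeof *a);
    for (long i = 0; i < n; i++) a[i] = 1.0;    /* known answer: n */

    double sum = 0.0;
    /* Each thread accumulates into a private copy of sum; OpenMP combines
     * the per-thread partials when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i];

    printf("sum = %.0f (expected %ld)\n", sum, n);
    free(a);
    return 0;
}
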

Assignments

We will strictly check for plagiarism, both against code available online and against submissions from within your own class. Offenders will receive a fail grade. Submit all assignments on Moodle. You may take a total grace period of 7 days across the three assignments; use it wisely.