Evaluation plan: Assignments 50%, Midterm 20%, Major 30%
Audit criteria: 30% in exams, 30% in assignments
Attendance policy: Attendance is not mandatory, but that does not mean that not coming to class is mandatory.

Piazza link: col380

Text Books

Course Content


Each topic below lists the lecture slides and the supplementary reading material.
A. [Introduction]
   Slides: course_content.pdf
   Supplementary material: Pacheco, GGKK Chapter 1
B. [Performance]
   The only reason for parallelism; quantifying performance improvements analytically (the key formulas are summarized after this block)
      Slides: performance1.pdf, performance2.pdf, karp_flatt_original.pdf
   The issues with Python and parallelism
      Slides: python.pdf
   Supplementary material: Pacheco Chapter 2.6, GGKK Chapter 5
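
The performance slides build on two standard formulas: Amdahl's law for the speedup achievable when a fraction f of the work is serial, and the Karp-Flatt metric for estimating that serial fraction from a measured speedup. The short C sketch below is my own quick-reference illustration, not part of the course material; the function names and the example value f = 0.05 are made up for the example.

/* Amdahl's law and the Karp-Flatt metric -- quick-reference sketch only. */
#include <stdio.h>

/* Amdahl's law: predicted speedup on p processors when a fraction f
 * of the work is inherently serial. */
double amdahl_speedup(double f, int p) {
    return 1.0 / (f + (1.0 - f) / p);
}

/* Karp-Flatt metric: the experimentally determined serial fraction,
 * computed from a measured speedup s on p processors. */
double karp_flatt(double s, int p) {
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p);
}

int main(void) {
    double f = 0.05;                         /* assumed 5% serial work */
    for (int p = 2; p <= 64; p *= 2) {
        double s = amdahl_speedup(f, p);
        /* Applying Karp-Flatt to an Amdahl-predicted speedup recovers f. */
        printf("p=%2d  predicted speedup=%5.2f  karp_flatt=%.3f\n",
               p, s, karp_flatt(s, p));
    }
    return 0;
}
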
C. [Parallel programming on multithreaded CPUs with shared memory]
   1. Cache basics, the cache coherence protocol for multi-core CPUs, and the parallel performance drop caused by false sharing under cache coherence (a small sketch follows this block)
      Slides: cache_basic.pdf, cache_coherence_MESI.pdf, parallel_performance_falsesharing.pdf
   2. Pthread and OpenMP programming
      Slides: sample_programs.zip, pthread_openmp.pdf
   3. Concurrency bugs
      Slides: concurrency_bugs_paper.pdf, concurrency_bugs_slides.pdf
   4. Hardware synchronization primitives
      Slides: hardware_synchronization_primitives.pdf
   Supplementary material: Pacheco Chapters 4 and 5
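
A minimal sketch of the false-sharing effect from item C.1 (my own illustration, not the code behind parallel_performance_falsesharing.pdf): each thread increments only its own counter, but when the counters sit on the same cache line the coherence protocol keeps invalidating that line; padding each counter to its own line removes the contention. The 64-byte line size, the array bound MAX_THREADS, and the file name are assumptions for the example.

/* False-sharing sketch: per-thread counters on a shared cache line vs padded.
 * Compile with something like: gcc -O2 -fopenmp false_sharing.c
 * Assumes at most MAX_THREADS OpenMP threads and a 64-byte cache line. */
#include <omp.h>
#include <stdio.h>

#define NITER 50000000L
#define MAX_THREADS 64
#define CACHE_LINE 64

volatile long packed[MAX_THREADS];                 /* counters share cache lines */
struct { volatile long c; char pad[CACHE_LINE - sizeof(long)]; } padded[MAX_THREADS];

int main(void) {
    double t0 = omp_get_wtime();
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < NITER; i++)
            packed[id]++;                          /* neighbours invalidate each other's line */
    }
    double t1 = omp_get_wtime();
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < NITER; i++)
            padded[id].c++;                        /* each counter on its own line */
    }
    double t2 = omp_get_wtime();
    printf("packed: %.2f s   padded: %.2f s\n", t1 - t0, t2 - t1);
    return 0;
}
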
D. [Programming NVIDIA GPUs with CUDA]
   1. CUDA overview
      Slides: cuda_overview.pdf; small sample programs: helloworld.cu, cudamem.cu
   2. Example parallelization of applications: (a) maintaining correctness with atomic add, (b) improving performance with memory coalescing, and (c) improving performance by keeping a copy of the histogram bins in shared memory for the threads of each block (a CPU analogue of (a) and (c) is sketched after this block)
      Slides: parallel_histogram.pdf, parallel_saxpy.pdf
   3. Overlapping kernels and memory operations with CUDA streams
      Slides: cudastreams.pdf (up to slide 26 is included in the mid-term)
   Supplementary material: CUDA programming guide, Sections 1.1, 1.2, 1.3, 2.1, 2.2 (excluding 2.2.1), 2.3, 2.4, 3.2.2, 3.2.4, 3.2.8.5, 4.1
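
The techniques in item D.2 are CUDA-specific, but (a) and (c) have a direct CPU analogue. The sketch below is my own OpenMP illustration, not the course's CUDA code: it first builds a histogram with one atomic add per element, then with a private copy of the bins per thread merged once at the end, mirroring the per-block shared-memory copies on the GPU. NBINS and the input size are assumed values.

/* Histogram privatization sketch (CPU/OpenMP analogue of the CUDA version):
 * atomic adds into shared bins vs private per-thread bins merged at the end. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NBINS 256

int main(void) {
    const long n = 1L << 24;
    unsigned char *data = malloc(n);
    for (long i = 0; i < n; i++) data[i] = (unsigned char)(rand() % NBINS);

    long hist_atomic[NBINS] = {0}, hist_priv[NBINS] = {0};

    /* (a) correctness via one atomic add per element -- heavy contention */
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        hist_atomic[data[i]]++;
    }

    /* (c) privatized bins: each thread fills its own copy, then merges once */
    #pragma omp parallel
    {
        long local[NBINS] = {0};
        #pragma omp for nowait
        for (long i = 0; i < n; i++)
            local[data[i]]++;
        for (int b = 0; b < NBINS; b++) {
            #pragma omp atomic
            hist_priv[b] += local[b];          /* NBINS atomics per thread, not n */
        }
    }

    int mismatch = 0;
    for (int b = 0; b < NBINS; b++) mismatch += (hist_atomic[b] != hist_priv[b]);
    printf("histograms %s\n", mismatch ? "differ (bug!)" : "match");
    free(data);
    return 0;
}
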
E. [Parallel programming on distributed memory systems using the Message Passing Interface (MPI)]
   Some example programs are listed below; the books have a more comprehensive list.
   1. Basic program: mpi_hello.c
   2. Point-to-point communication: mpi_send-standard.c, mpi_send-standard-large.c, mpi_send-nonblocking-wait.c, mpi_send-nonblocking-waitany.c, mpi_send-nonblocking-waitall.c
   3. Collective communication: mpi_scatter.c, mpi_scatterv.c, mpi_gather.c, mpi_random_sum.c
      Compile with: mpicc -g -Wall -o mpi_hello.o mpi_hello.c
      Run with: mpirun -np 4 mpi_hello.o
      (a minimal MPI sketch follows this block)
   4. Interconnection networks: GGKK Chapter 2 (specifically 2.4.2, 2.4.3, 2.4.4, 2.4.5), nd-mesh.pdf
   5. Efficient algorithms for collective communications: GGKK Chapter 4 (up to Section 4.4)
   Supplementary material:
   1. Pacheco Chapter 3; code samples from Chapter 3: pacheco_book_code
   2. The Art of HPC.pdf, Chapters 3 and 4 (focus on the C APIs, not the Python and other bindings)
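
A minimal sketch matching the compile/run commands above (my own version; the distributed mpi_hello.c may differ). Besides printing each rank, it demonstrates one collective, MPI_Reduce, by summing the ranks at process 0.

/* Minimal MPI program: each process prints its rank and the world size,
 * then the ranks are summed at process 0 with a collective.
 * Compile: mpicc -g -Wall -o mpi_hello.o mpi_hello.c
 * Run:     mpirun -np 4 mpi_hello.o */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, sum = 0;
    MPI_Init(&argc, &argv);                     /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);       /* total number of processes */
    printf("Hello from process %d of %d\n", rank, size);

    /* Collective: sum every process's rank into sum on process 0. */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("Sum of ranks = %d\n", sum);

    MPI_Finalize();                             /* shut down MPI */
    return 0;
}
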
F. [Parallel algorithm design]
   1. Parallelizing graph algorithms with OpenMP and MPI
      Slides: BFS.pdf, PRIM_Dijkstra.pdf
      Supplementary material: GGKK (10.1, 10.2, 10.3, 10.7.2)
   2. PRAM model
      Slides: PRAM.pdf
   3. Parallelization techniques
      a. parallel reduction: reductions.pdf (an OpenMP reduction sketch follows this block)
      b. pointer jumping: pointerjumping.pdf
      c. Eulerian tours: eulertour.pdf
   Supplementary material: GGKK Chapter 10, Pacheco Chapter 6
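
A minimal sketch of technique 3a, parallel reduction (my own illustration, not the course's code): the classic scheme combines partial results in a balanced tree of O(log p) steps; OpenMP's reduction clause expresses the same idea by giving each thread a private partial sum and combining the partials at the end of the loop. The array size and file name are assumptions for the example.

/* Parallel reduction sketch: summing an array with OpenMP's reduction clause.
 * Compile with something like: gcc -O2 -fopenmp reduction.c */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const long n = 1L << 24;
    double *a = malloc(n * sizeof *a);
    for (long i = 0; i < n; i++) a[i] = 1.0;    /* known answer: n */

    double sum = 0.0;
    /* Each thread accumulates into a private copy of sum; OpenMP combines
     * the per-thread partials when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i];

    printf("sum = %.0f (expected %ld)\n", sum, n);
    free(a);
    return 0;
}
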

Assignments

We will strictly check for plagiarism, both against code available online and against submissions from within your own class. Offenders will receive a fail grade. Submit all assignments on Moodle. You may take a total grace period of 7 days across the three assignments; use it wisely.