S.No.
|
Topics
|
Lectures
|
Instructor
|
References/Notes
|
1
|
Introduction:
Parallel Computations and Architectures
|
01-03
|
SDR
|
[A. Grama, A. Gupta, G. Karypis, V. Kumar]
Chap 2, 4 (the relevant portions, related to what was covered in class)
|
|
Microprocessors, Microcontrollers, Models of Computers: the von
Neumann and Harvard models, multi-tasking, time-sharing,
multiprogramming. Processes and threads (a bare-bones introduction)
|
22 Jul (Mon) {lecture#01}
|
SDR
|
|
|
Store-and-forward routing, linear rings, wrap-around meshes,
hypercubes. (Cut-through routing is not in the course.) Ideal time
complexity, and pseudo-code for the non-ideal case (processors
not equally powerful, links not all of the same bandwidth).
Basic Communication operations: One-to-all broadcast on a linear ring.
Ideal time complexity.
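A minimal simulation sketch (not from the lecture) of one-to-all broadcast on a wrap-around linear ring, assuming unit-cost hops and that a node holding the message can forward to both neighbours in one step; the farthest node is ⌊p/2⌋ hops away.

```python
# One-to-all broadcast on a bidirectional wrap-around ring (sketch).
# Counts store-and-forward steps until every node holds the message.

def ring_broadcast_steps(p, source=0):
    has_msg = {source}
    steps = 0
    while len(has_msg) < p:
        # Every node holding the message forwards to both neighbours.
        frontier = set()
        for node in has_msg:
            frontier.add((node - 1) % p)
            frontier.add((node + 1) % p)
        has_msg |= frontier
        steps += 1
    return steps

print(ring_broadcast_steps(8))   # 4: farthest node is p//2 hops away
```

If each node may use only one link per step (single-port), the step count roughly doubles; the both-directions assumption here is only for illustration.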
|
25 Jul (Thu) {lecture#02}
|
SDR
|
|
|
Basic Communication operations: One-to-all broadcast on a linear ring.
Pseudo-code for the non-ideal case. The case of a wrap-around
mesh: the ideal case. The case of a hypercube: the ideal case,
and pseudo-code for the non-ideal case.
Introduction to Flynn's classification of computers:
SISD, SIMD, MIMD.
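The hypercube case can be sketched as recursive doubling (a minimal illustration, assuming ideal, equal-cost links): in step i, every node that already holds the message sends it to the neighbour whose address differs in bit i, so all p = 2^d nodes are reached in d = log2(p) steps.

```python
# One-to-all broadcast on a d-dimensional hypercube (recursive
# doubling sketch). Node addresses are d-bit integers; neighbours
# differ in exactly one bit.

def hypercube_broadcast_steps(d, source=0):
    p = 1 << d
    has_msg = {source}
    steps = 0
    for bit in range(d):
        # Everyone with the message sends across dimension `bit`.
        has_msg |= {node ^ (1 << bit) for node in has_msg}
        steps += 1
    assert len(has_msg) == p      # all nodes reached after d steps
    return steps

print(hypercube_broadcast_steps(3))   # 8 nodes -> 3 steps
```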
|
29 Jul (Mon) {lecture#03}
|
SDR
|
|
2
|
Theory of Pipelining
|
04-06
|
SDR
|
[P. Kogge]
|
|
Introduction to linear pipelines (data, instruction). Difference
between static and dynamic pipelines. (Only static pipelines will
be in the course). Linear pipeline metrics. The basic pipeline
scheduling problem.
Greedy and non-greedy approaches to pipeline scheduling.
Examples with reservation tables A and B (as in the book).
The depressing worst-case bound: exponential.
|
01 Aug (Thu) {lecture#04}
|
SDR
|
|
|
The MAL Lemma (Lemma 3-1), and its (easy) proof.
The need for a better (more compact) representation, than a reservation table:
Collision Vectors. State diagrams (modified state diagrams, actually).
FSM modelling of the pipeline scheduling problem.
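The collision-vector construction can be sketched in a few lines (the reservation table below is a made-up example, not the book's tables A or B): a latency l is forbidden iff some stage is reused l cycles apart, and bit l of the collision vector records that.

```python
# Forbidden latencies and the collision vector from a reservation
# table (sketch). Row = pipeline stage, column = clock cycle,
# 1 = stage busy. This 3-stage, 6-cycle table is hypothetical.

table = [
    [1, 0, 0, 0, 0, 1],   # stage S1 busy at t=0 and t=5
    [0, 1, 0, 1, 0, 0],   # stage S2 busy at t=1 and t=3
    [0, 0, 1, 0, 1, 0],   # stage S3 busy at t=2 and t=4
]

def forbidden_latencies(table):
    forbidden = set()
    for row in table:
        times = [t for t, busy in enumerate(row) if busy]
        # Latency l collides iff some stage is reused l cycles apart.
        forbidden |= {b - a for a in times for b in times if b > a}
    return forbidden

def collision_vector(table):
    m = len(table[0]) - 1
    forb = forbidden_latencies(table)
    # Bit i (i = m..1, left to right) is 1 iff latency i is forbidden.
    return ''.join('1' if i in forb else '0' for i in range(m, 0, -1))

print(forbidden_latencies(table))   # {2, 5}
print(collision_vector(table))      # '10010'
```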
|
05 Aug (Mon) {lecture#05}
|
SDR
|
[Early start: 07:30am]
(to make up for the 19 Sep (Thu) scheduled class to be missed)
|
|
Lemma 3-2: finding simple cycles suffices. Lemma 3-3: the upper
bound. Proof of both. The combined lemmas, and the best and worst
case scenarios.
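Lemma 3-2 in action can be sketched as an exhaustive search over the simple cycles of the modified state diagram (fine for these tiny state spaces). The forbidden set {2, 5} with m = 5 is a made-up example; states are collision vectors as bitmasks, and latency m+1 restarts from the initial vector.

```python
# MAL by enumerating simple cycles of the modified state diagram
# (sketch of Lemma 3-2: searching simple cycles suffices).

forbidden, m = {2, 5}, 5                     # hypothetical example
CV = sum(1 << i for i in forbidden)          # initial collision vector

def successors(state):
    # Permissible latencies: any clear bit 1..m; latency m+1 restarts.
    moves = [(l, (state >> l) | CV) for l in range(1, m + 1)
             if not state & (1 << l)]
    moves.append((m + 1, CV))
    return moves

def min_avg_latency(state, path, lat):
    best = float('inf')
    for l, nxt in successors(state):
        if nxt in path:                       # closed a simple cycle
            i = path.index(nxt)
            cyc = lat[i:] + (l,)
            best = min(best, sum(cyc) / len(cyc))
        else:
            best = min(best,
                       min_avg_latency(nxt, path + (nxt,), lat + (l,)))
    return best

mal = min_avg_latency(CV, (CV,), ())
print(mal)   # 7/3 here, achieved by the cycle (1, 3, 3)
```

For this example the combined lemmas bracket the answer: 2 (max marks in any row) <= MAL = 7/3 <= 3 (ones in the collision vector, plus one).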
|
08 Aug (Thu) {lecture#06}
|
SDR
|
[Early start: 07:30am]
(to make up for the 19 Sep (Thu) scheduled class to be missed)
|
3
|
RISC Pipelining
|
06-07
|
SDR
|
[Patterson's paper]
|
|
RISC Pipelining: history. The trend towards CISC. Factors leading to RISC.
|
08 Aug (Thu) {lecture#06} (contd.)
|
SDR
|
|
|
RISC pipeline hazards: structural, data and control.
The Delayed Branch and Optimal Delayed Branch.
|
19 Aug (Mon) {lecture#07}
|
SDR
|
[Early start: 07:30am]
(to make up for the 19 Sep (Thu) scheduled class to be missed)
|
4
|
RISC Pipelining: Deeper Issues
|
07-11
|
SDR
|
[Hennessy-Patterson]
|
|
Introduction to software issues for RISC pipelines. The role of
compilers and assemblers, and their optimisations.
|
22 Aug (Thu) {lecture#08}
|
SDR
|
[Early start: 07:30am]
(to make up for the 10 Oct (Thu) scheduled class to be missed)
|
|
Compiler and Assembler optimisations (contd). Details on
dependences. Introduction to static (compile-time) pipeline
scheduling and loop unrolling.
|
23 Aug (Fri) {lecture#09}
|
SDR
|
[Early start: 07:30am]
(to make up for the 10 Oct (Thu) scheduled class to be missed)
|
---
|
Minor 1
|
27 Aug (Tue)
|
---
|
---
|
|
Loop unrolling and pipeline scheduling: in isolation, and in
combination. Example, issues.
Concluding RISC Pipelining: A mere mention of branch
predictors.
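The unroll-plus-schedule idea can be sketched as follows (in Python for illustration only; the lectures work at the assembly level): unrolling by 4 exposes four independent updates per iteration for the scheduler and amortises the loop overhead, while a remainder loop handles trip counts not divisible by 4.

```python
# Loop unrolling sketch (hypothetical saxpy-style loop, not an
# example from class). The unrolled body has 4 independent updates;
# the remainder loop cleans up when len(x) % 4 != 0.

def saxpy_unrolled(a, x, y):
    n = len(x)
    i = 0
    while i + 4 <= n:            # unrolled body
        y[i]     += a * x[i]
        y[i + 1] += a * x[i + 1]
        y[i + 2] += a * x[i + 2]
        y[i + 3] += a * x[i + 3]
        i += 4
    while i < n:                 # remainder loop
        y[i] += a * x[i]
        i += 1
    return y

print(saxpy_unrolled(2.0, [1, 2, 3, 4, 5], [0, 0, 0, 0, 0]))
# [2.0, 4.0, 6.0, 8.0, 10.0]
```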
|
29 Aug (Thu) {lecture#10}
|
SDR
|
[Early start: 07:30am]
(to make up for the 10 Oct (Thu) scheduled class to be missed)
|
5
|
Caches
|
11-15
|
SDR
|
[Hennessy-Patterson]
Cache Basics: Appendix B: Review of Memory Hierarchy.
Sections B.1 and B.2
Multiprocessor Cache Coherence: Chapter 5: Thread-Level
Parallelism.
Sections 5.1 and 5.2
Some lecture notes [internal links]: [a], [b], [c], [d], [e], [f] (IITD only)
|
|
---
|
02 Sep (Mon) {lecture#xx}
|
---
|
(No Class)
|
|
Caches: reiteration of the von Neumann and Harvard Architectures.
Caches for general purpose processors and DSPs. Cache blocks and
memory blocks, and address spaces. The four basic cache
questions. Cache organisation: direct mapped, fully associative
and set associative. Cache block replacement.
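The address side of the four cache questions can be sketched numerically (the cache parameters below are made-up): with block size B bytes and S sets, an address splits into tag | index | block offset.

```python
# Address breakdown for a direct-mapped cache (sketch; a set-
# associative cache uses the same split with fewer sets). Sizes
# are hypothetical powers of two.

def split_address(addr, block_bytes, num_sets):
    offset_bits = block_bytes.bit_length() - 1   # log2 for powers of 2
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# 32-byte blocks, 128 sets: 5 offset bits, 7 index bits.
print(split_address(0x12345, 32, 128))   # (18, 26, 5)
```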
|
05 Sep (Thu) {lecture#11}
|
SDR
|
[Early start: 07:30am]
(to make up for the 02 Sep (Mon) scheduled class to be missed)
|
|
Write Back and Write Through policies. Write Allocate and No Write
Allocate strategies.
|
09 Sep (Mon) {lecture#12}
|
SDR
|
[Early start: 07:30am]
(to make up for the 02 Sep (Mon) scheduled class to be missed)
|
|
Two Cache numericals.
1. The two-cache representative numerical. Calculating the
average miss rate and average memory access time
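A representative two-level worked example (the numbers here are made-up, not the ones done in class): average memory access time composes the L1 hit time with the miss rates and penalties of the levels below it.

```python
# AMAT for a two-level cache hierarchy (hypothetical numbers).
#   AMAT = L1_hit + L1_miss_rate * (L2_hit + L2_local_miss * mem)

l1_hit, l1_miss = 1.0, 0.05     # cycles; per-access L1 miss rate
l2_hit, l2_miss = 10.0, 0.20    # cycles; L2 *local* miss rate
mem_penalty = 100.0             # cycles to main memory

amat = l1_hit + l1_miss * (l2_hit + l2_miss * mem_penalty)
global_miss_rate = l1_miss * l2_miss   # fraction reaching memory

print(amat)               # 1 + 0.05 * (10 + 0.2 * 100) = 2.5 cycles
print(global_miss_rate)   # 0.01
```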
|
12 Sep (Thu) {lecture#13}
|
SDR
|
[Early start: 07:30am]
(to make up for the 02 Sep (Mon) scheduled class to be missed)
|
|
Multiprocessor Cache Coherence. Directory Protocols, and
Symmetric Multi-Processor system protocols (the second is of
practical importance, in multi-core processors). The Write
Invalidate Protocol (the one used in most multi-core processors).
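A stripped-down write-invalidate sketch (MSI-style; it follows the Hennessy-Patterson snooping protocol only loosely, with the bus reduced to a list of caches): a write gains exclusive ownership by invalidating every other copy, and a read miss downgrades a Modified owner to Shared.

```python
# Snooping write-invalidate protocol, heavily simplified (sketch).
# Each cache tracks one line in state M (Modified), S (Shared),
# or I (Invalid).

class Cache:
    def __init__(self, bus):
        self.state = 'I'
        bus.append(self)

    def read(self, bus):
        if self.state == 'I':            # read miss
            for other in bus:
                if other is not self and other.state == 'M':
                    other.state = 'S'    # owner writes back, downgrades
            self.state = 'S'

    def write(self, bus):
        if self.state != 'M':            # need exclusive ownership
            for other in bus:
                if other is not self:
                    other.state = 'I'    # invalidate all other copies
            self.state = 'M'

bus = []
c0, c1 = Cache(bus), Cache(bus)
c0.read(bus); c1.read(bus)     # both lines Shared
c1.write(bus)                  # c1 -> Modified, c0 invalidated
print(c0.state, c1.state)      # I M
```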
|
16 Sep (Mon) {lecture#14}
|
SDR
|
[Early start: 07:30am]
(to make up for the 23 Sep (Mon) scheduled class to be missed)
|
|
---
|
19 Sep (Thu) {lecture#xx}
|
---
|
(No Class)
|
|
---
|
23 Sep (Mon) {lecture#xx}
|
---
|
(No Class)
|
---
|
Minor 2
|
27 Sep (Fri)
|
---
|
---
|
|
The Write Invalidate Protocol (contd).
|
30 Sep (Mon) {lecture#15}
|
SDR
|
[Early start: 07:30am]
(to make up for the 23 Sep (Mon) scheduled class to be missed)
|
|
---
|
10 Oct (Thu) {lecture#xx}
|
---
|
(No Class)
|
6
|
Vector Processors, SIMD Multimedia
Extensions, GPU Architectures
|
16-22
|
SDR
|
[Hennessy-Patterson]
Chapter 4: Data-Level Parallelism in Vector, SIMD, and GPU
Architectures.
|
|
The three basic types of SIMD architectures:
1. Vector Processors (Cray)
2. SIMD Multimedia Instruction Set Extensions (Intel)
3. GPUs (NVIDIA)
DAXPY as a common basic operation.
Chaining, Lanes, Convoy, Chime.
8 important vector processor/SIMD questions/concepts.
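DAXPY (double-precision y = a*x + y), the running example for all three SIMD styles, written as a plain scalar loop (a Python stand-in for the Fortran/assembly versions in Hennessy-Patterson; a vector machine would execute it as a handful of vector instructions grouped into convoys, one chime each).

```python
# DAXPY: y = a*x + y, element by element (scalar reference version).

def daxpy(a, x, y):
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y

print(daxpy(3.0, [1.0, 2.0], [10.0, 20.0]))   # [13.0, 26.0]
```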
|
14 Oct (Mon) {lecture#16}
|
SDR
|
[Early start: 07:30am]
(to make up for the 23 Sep (Mon) scheduled class to be missed)
|
|
SIMD Concept 1: Lanes for > 1 element per cycle
SIMD Concept 2: Vector length registers, handling `non-64' loops (trip counts not a multiple of the maximum vector length)
SIMD Concept 3: Conditionals (`IF') inside code to be vectorised
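Concepts 2 and 3 can be sketched together (vector "operations" here are plain Python slices, purely for illustration): strip mining walks the loop in chunks of the maximum vector length, the vector length register covers the odd-sized final strip, and a mask register handles the `IF' inside the body.

```python
# Strip mining + vector mask sketch. MVL = 64 mimics Cray-style
# machines; the function and its arguments are hypothetical.

MVL = 64

def masked_scale(x, threshold, factor):
    """Scale x[i] by `factor` wherever x[i] > threshold."""
    n = len(x)
    out = list(x)
    for start in range(0, n, MVL):
        vl = min(MVL, n - start)                 # vector length register
        chunk = out[start:start + vl]
        mask = [v > threshold for v in chunk]    # vector mask register
        out[start:start + vl] = [v * factor if m else v
                                 for v, m in zip(chunk, mask)]
    return out

print(masked_scale([1, 5, 2, 8], 3, 10))   # [1, 50, 2, 80]
```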
|
17 Oct (Thu) {lecture#17}
|
SDR
|
|
|
SIMD Concept 4: Memory banks
For bandwidth to vector load/store units.
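Why banks matter can be shown with a tiny interleaving sketch (8 banks is a made-up figure): consecutive words live in consecutive banks, so a unit-stride vector load spreads across all banks, while a stride equal to the bank count hammers a single bank and serialises.

```python
# Interleaved memory banks sketch: which banks does a strided
# vector access touch?

NUM_BANKS = 8   # hypothetical

def banks_touched(start, stride, n_elems):
    return {(start + i * stride) % NUM_BANKS for i in range(n_elems)}

print(len(banks_touched(0, 1, 16)))   # 8: unit stride uses every bank
print(len(banks_touched(0, 8, 16)))   # 1: stride 8 conflicts on one bank
```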
|
21 Oct (Mon) {lecture#18}
|
SDR
|
|
|
SIMD Concept 5: Multi-Dimensional Matrices
Matrix Multiplication.
A cursory mention of SIMD Concepts 6 and 7: Gather-Scatter
(Handling Sparse Matrices) and Compilers for Vector
Architectures
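Gather-scatter in one small sketch (the sparse row below is made-up): a gather loads x[idx[i]] through an index vector, scatter is the inverse store, and together they let a sparse row times dense vector product run as dense vector operations.

```python
# Gather-scatter sketch for sparse matrices (illustrative only).

def gather(x, idx):
    return [x[i] for i in idx]          # vector indexed load

def scatter(dest, idx, vals):
    for i, v in zip(idx, vals):         # vector indexed store
        dest[i] = v
    return dest

# Sparse row (nonzeros at columns 0, 3, 5) dotted with dense x:
vals, cols = [2.0, 4.0, 1.0], [0, 3, 5]
x = [1.0, 9.0, 9.0, 2.0, 9.0, 3.0]
dot = sum(v * g for v, g in zip(vals, gather(x, cols)))
print(dot)   # 2*1 + 4*2 + 1*3 = 13.0
```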
|
24 Oct (Thu) {lecture#19}
|
SDR
|
|
|
II. SIMD Instruction Set Extensions for Multimedia
3 main omissions compared to Vector Processors (Cray).
History and generations.
Why have they been successful in spite of some obvious limitations?
|
31 Oct (Thu) {lecture#20}
|
SDR
|
|
|
III. GPUs (NVIDIA)
An example with a `grid', a thread block, and individual SIMD threads.
Memory at different levels.
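The grid/thread-block/SIMD-thread hierarchy boils down to one index computation, emulated here in Python (the names mirror CUDA's blockIdx, blockDim, threadIdx; a real kernel would run the body once per thread in parallel).

```python
# CUDA-style grid indexing, emulated sequentially: each "thread"
# computes one element of y = a*x + y. The guard handles the last,
# partially-filled thread block.

def launch_daxpy(a, x, y, block_dim=256):
    n = len(x)
    grid_dim = (n + block_dim - 1) // block_dim   # ceil(n / block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # global thread id
            if i < n:                               # guard, as in CUDA
                y[i] = a * x[i] + y[i]
    return y

y = launch_daxpy(2.0, [1.0] * 300, [0.0] * 300)   # needs a 2-block grid
print(y[0], y[299], len(y))
```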
|
04 Nov (Mon) {lecture#21}
|
SDR
|
|
|
III. GPUs (NVIDIA) (contd.)
CUDA coding, closing notes.
|
07 Nov (Thu) {lecture#22}
|
SDR
|
|
7
|
Warehouse-Scale Computers
|
23-23
|
SDR
|
[Hennessy-Patterson] Chapter 5
|
|
Warehouse-Scale Computers: Clouds: Amazon, Google, Microsoft
|
11 Nov (Mon) {lecture#23}
|
SDR
|
|
8
|
Towards Basic Dynamic Scheduling: Scoreboarding, Tomasulo's Algorithm
|
24-24
|
SDR
|
[Hennessy-Patterson] Appendix C
|
|
Introducing a basic 5-stage pipeline, and the purpose of its stages
|
11 Nov (Mon) {lecture#24}
|
SDR
|
|
|
---
|
14 Nov (Thu) {lecture#xx}
|
---
|
(No Class)
|