S.No.
|
Topics
|
Lectures
|
Instructor
|
References/Notes
|
1
|
Introduction:
Parallel Computations and Architectures
|
01-03
|
SDR
|
[A. Grama, A. Gupta, G. Karypis, V. Kumar]
Chap 2, 4 (the relevant portions, related to what was covered in class)
|
|
Microprocessors, Microcontrollers, Models of Computers: the von
Neumann and Harvard models, multi-tasking, time-sharing,
multiprogramming. Processes and threads (a bare-bones introduction)
|
22 Jul (Mon) {lecture#01}
|
SDR
|
|
|
Store-and-forward routing, linear rings, wrap-around meshes,
hypercubes. (Cut-through routing is not in the course.) Ideal time
complexity, and pseudo-code for the non-ideal case (processors
not equally powerful, links not all of the same bandwidth).
Basic Communication operations: One-to-all broadcast on a linear ring.
Ideal time complexity.
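A minimal simulation sketch (not from the lecture) of one-to-all broadcast on a wrap-around linear ring, assuming unit-cost hops and that a node holding the message can forward to both neighbours in one step; the farthest node is ⌊p/2⌋ hops away.

```python
# One-to-all broadcast on a bidirectional wrap-around ring (sketch).
# Counts store-and-forward steps until every node holds the message.

def ring_broadcast_steps(p, source=0):
    has_msg = {source}
    steps = 0
    while len(has_msg) < p:
        # Every node holding the message forwards to both neighbours.
        frontier = set()
        for node in has_msg:
            frontier.add((node - 1) % p)
            frontier.add((node + 1) % p)
        has_msg |= frontier
        steps += 1
    return steps

print(ring_broadcast_steps(8))   # 4: farthest node is p//2 hops away
```

If each node may use only one link per step (single-port), the step count roughly doubles; the both-directions assumption here is only for illustration.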
|
25 Jul (Thu) {lecture#02}
|
SDR
|
|
|
Basic Communication operations: One-to-all broadcast on a linear ring.
Pseudo-code for the non-ideal case. The case of a wrap-around
mesh: the ideal case. The case of a hypercube: the ideal case,
and pseudo-code for the non-ideal case.
Introduction to Flynn's classification of computers:
SISD, SIMD, MIMD.
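The hypercube case can be sketched as recursive doubling (a minimal illustration, assuming ideal, equal-cost links): in step i, every node that already holds the message sends it to the neighbour whose address differs in bit i, so all p = 2^d nodes are reached in d = log2(p) steps.

```python
# One-to-all broadcast on a d-dimensional hypercube (recursive
# doubling sketch). Node addresses are d-bit integers; neighbours
# differ in exactly one bit.

def hypercube_broadcast_steps(d, source=0):
    p = 1 << d
    has_msg = {source}
    steps = 0
    for bit in range(d):
        # Everyone with the message sends across dimension `bit`.
        has_msg |= {node ^ (1 << bit) for node in has_msg}
        steps += 1
    assert len(has_msg) == p      # all nodes reached after d steps
    return steps

print(hypercube_broadcast_steps(3))   # 8 nodes -> 3 steps
```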
|
29 Jul (Mon) {lecture#03}
|
SDR
|
|
2
|
Theory of Pipelining
|
04-06
|
SDR
|
[P. Kogge]
|
|
Introduction to linear pipelines (data, instruction). Difference
between static and dynamic pipelines. (Only static pipelines will
be in the course). Linear pipeline metrics. The basic pipeline
scheduling problem.
Greedy and non-greedy approaches to pipeline scheduling.
Examples with reservation tables A and B (as in the book).
The depressing worst-case bound: exponential.
|
01 Aug (Thu) {lecture#04}
|
SDR
|
|
|
The MAL Lemma (Lemma 3-1), and its (easy) proof.
The need for a better (more compact) representation, than a reservation table:
Collision Vectors. State diagrams (modified state diagrams, actually).
FSM modelling of the pipeline scheduling problem.
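The collision-vector construction can be sketched in a few lines (the reservation table below is a made-up example, not the book's tables A or B): a latency l is forbidden iff some stage is reused l cycles apart, and bit l of the collision vector records that.

```python
# Forbidden latencies and the collision vector from a reservation
# table (sketch). Row = pipeline stage, column = clock cycle,
# 1 = stage busy. This 3-stage, 6-cycle table is hypothetical.

table = [
    [1, 0, 0, 0, 0, 1],   # stage S1 busy at t=0 and t=5
    [0, 1, 0, 1, 0, 0],   # stage S2 busy at t=1 and t=3
    [0, 0, 1, 0, 1, 0],   # stage S3 busy at t=2 and t=4
]

def forbidden_latencies(table):
    forbidden = set()
    for row in table:
        times = [t for t, busy in enumerate(row) if busy]
        # Latency l collides iff some stage is reused l cycles apart.
        forbidden |= {b - a for a in times for b in times if b > a}
    return forbidden

def collision_vector(table):
    m = len(table[0]) - 1
    forb = forbidden_latencies(table)
    # Bit i (i = m..1, left to right) is 1 iff latency i is forbidden.
    return ''.join('1' if i in forb else '0' for i in range(m, 0, -1))

print(forbidden_latencies(table))   # {2, 5}
print(collision_vector(table))      # '10010'
```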
|
05 Aug (Mon) {lecture#05}
|
SDR
|
[Early start: 07:30am]
(to make up for the 19 Sep (Thu) scheduled class to be missed)
|
|
Lemma 3-2: finding simple cycles suffices. Lemma 3-3: the upper
bound. Proof of both. The combined lemmas, and the best and worst
case scenarios.
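Lemma 3-2 in action can be sketched as an exhaustive search over the simple cycles of the modified state diagram (fine for these tiny state spaces). The forbidden set {2, 5} with m = 5 is a made-up example; states are collision vectors as bitmasks, and latency m+1 restarts from the initial vector.

```python
# MAL by enumerating simple cycles of the modified state diagram
# (sketch of Lemma 3-2: searching simple cycles suffices).

forbidden, m = {2, 5}, 5                     # hypothetical example
CV = sum(1 << i for i in forbidden)          # initial collision vector

def successors(state):
    # Permissible latencies: any clear bit 1..m; latency m+1 restarts.
    moves = [(l, (state >> l) | CV) for l in range(1, m + 1)
             if not state & (1 << l)]
    moves.append((m + 1, CV))
    return moves

def min_avg_latency(state, path, lat):
    best = float('inf')
    for l, nxt in successors(state):
        if nxt in path:                       # closed a simple cycle
            i = path.index(nxt)
            cyc = lat[i:] + (l,)
            best = min(best, sum(cyc) / len(cyc))
        else:
            best = min(best,
                       min_avg_latency(nxt, path + (nxt,), lat + (l,)))
    return best

mal = min_avg_latency(CV, (CV,), ())
print(mal)   # 7/3 here, achieved by the cycle (1, 3, 3)
```

For this example the combined lemmas bracket the answer: 2 (max marks in any row) <= MAL = 7/3 <= 3 (ones in the collision vector, plus one).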
|
08 Aug (Thu) {lecture#06}
|
SDR
|
[Early start: 07:30am]
(to make up for the 19 Sep (Thu) scheduled class to be missed)
|
3
|
RISC Pipelining
|
06-07
|
SDR
|
[Patterson's paper]
|
|
RISC Pipelining: history. The trend towards CISC. Factors leading to RISC.
|
08 Aug (Thu) {lecture#06} (contd.)
|
SDR
|
|
|
RISC pipeline hazards: structural, data and control.
The Delayed Branch and Optimal Delayed Branch.
|
19 Aug (Mon) {lecture#07}
|
SDR
|
[Early start: 07:30am]
(to make up for the 19 Sep (Thu) scheduled class to be missed)
|
4
|
RISC Pipelining: Deeper Issues
|
07-11
|
SDR
|
[Hennessy-Patterson]
|
|
Introduction to software issues for RISC pipelines. The role of
compilers and assemblers, and their optimisations.
|
22 Aug (Thu) {lecture#08}
|
SDR
|
[Early start: 07:30am]
(to make up for the 10 Oct (Thu) scheduled class to be missed)
|
|
Compiler and Assembler optimisations (contd). Details on
dependences. Introduction to static (compile-time) pipeline
scheduling and loop unrolling.
|
23 Aug (Fri) {lecture#09}
|
SDR
|
[Early start: 07:30am]
(to make up for the 10 Oct (Thu) scheduled class to be missed)
|
---
|
Minor 1
|
27 Aug (Tue)
|
---
|
---
|
|
Loop unrolling and pipeline scheduling: in isolation, and in
combination. Example, issues.
Concluding RISC Pipelining: A mere mention of branch
predictors.
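The unroll-plus-schedule idea can be sketched as follows (in Python for illustration only; the lectures work at the assembly level): unrolling by 4 exposes four independent updates per iteration for the scheduler and amortises the loop overhead, while a remainder loop handles trip counts not divisible by 4.

```python
# Loop unrolling sketch (hypothetical saxpy-style loop, not an
# example from class). The unrolled body has 4 independent updates;
# the remainder loop cleans up when len(x) % 4 != 0.

def saxpy_unrolled(a, x, y):
    n = len(x)
    i = 0
    while i + 4 <= n:            # unrolled body
        y[i]     += a * x[i]
        y[i + 1] += a * x[i + 1]
        y[i + 2] += a * x[i + 2]
        y[i + 3] += a * x[i + 3]
        i += 4
    while i < n:                 # remainder loop
        y[i] += a * x[i]
        i += 1
    return y

print(saxpy_unrolled(2.0, [1, 2, 3, 4, 5], [0, 0, 0, 0, 0]))
# [2.0, 4.0, 6.0, 8.0, 10.0]
```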
|
29 Aug (Thu) {lecture#10}
|
SDR
|
[Early start: 07:30am]
(to make up for the 10 Oct (Thu) scheduled class to be missed)
|
5
|
Caches
|
11-15
|
SDR
|
[Hennessy-Patterson]
Cache Basics: Appendix B: Review of Memory Hierarchy.
Sections B.1 and B.2
Multiprocessor Cache Coherence: Chapter 5: Thread-Level
Parallelism.
Sections 5.1 and 5.2
Some lecture notes [internal links]: [a], [b], [c], [d], [e], [f] (IITD only)
|
|
---
|
02 Sep (Mon) {lecture#xx}
|
---
|
(No Class)
|
|
Caches: reiteration of the von Neumann and Harvard Architectures.
Caches for general purpose processors and DSPs. Cache blocks and
memory blocks, and address spaces. The four basic cache
questions. Cache organisation: direct mapped, fully associative
and set associative. Cache block replacement.
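The address side of the four cache questions can be sketched numerically (the cache parameters below are made-up): with block size B bytes and S sets, an address splits into tag | index | block offset.

```python
# Address breakdown for a direct-mapped cache (sketch; a set-
# associative cache uses the same split with fewer sets). Sizes
# are hypothetical powers of two.

def split_address(addr, block_bytes, num_sets):
    offset_bits = block_bytes.bit_length() - 1   # log2 for powers of 2
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# 32-byte blocks, 128 sets: 5 offset bits, 7 index bits.
print(split_address(0x12345, 32, 128))   # (18, 26, 5)
```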
|
05 Sep (Thu) {lecture#11}
|
SDR
|
[Early start: 07:30am]
(to make up for the 02 Sep (Mon) scheduled class to be missed)
|
|
Write Back and Write Through policies. Write Allocate and No Write
Allocate strategies.
|
09 Sep (Mon) {lecture#12}
|
SDR
|
[Early start: 07:30am]
(to make up for the 02 Sep (Mon) scheduled class to be missed)
|
|
Two Cache numericals.
1. The two-cache representative numerical. Calculating the
average miss rate and average memory access time
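A representative two-level worked example (the numbers here are made-up, not the ones done in class): average memory access time composes the L1 hit time with the miss rates and penalties of the levels below it.

```python
# AMAT for a two-level cache hierarchy (hypothetical numbers).
#   AMAT = L1_hit + L1_miss_rate * (L2_hit + L2_local_miss * mem)

l1_hit, l1_miss = 1.0, 0.05     # cycles; per-access L1 miss rate
l2_hit, l2_miss = 10.0, 0.20    # cycles; L2 *local* miss rate
mem_penalty = 100.0             # cycles to main memory

amat = l1_hit + l1_miss * (l2_hit + l2_miss * mem_penalty)
global_miss_rate = l1_miss * l2_miss   # fraction reaching memory

print(amat)               # 1 + 0.05 * (10 + 0.2 * 100) = 2.5 cycles
print(global_miss_rate)   # 0.01
```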
|
12 Sep (Thu) {lecture#13}
|
SDR
|
[Early start: 07:30am]
(to make up for the 02 Sep (Mon) scheduled class to be missed)
|
|
Multiprocessor Cache Coherence. Directory Protocols, and
Symmetric Multi-Processor system protocols (the second is of
practical importance, in multi-core processors). The Write
Invalidate Protocol (the one used in most multi-core processors).
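A stripped-down write-invalidate sketch (MSI-style; it follows the Hennessy-Patterson snooping protocol only loosely, with the bus reduced to a list of caches): a write gains exclusive ownership by invalidating every other copy, and a read miss downgrades a Modified owner to Shared.

```python
# Snooping write-invalidate protocol, heavily simplified (sketch).
# Each cache tracks one line in state M (Modified), S (Shared),
# or I (Invalid).

class Cache:
    def __init__(self, bus):
        self.state = 'I'
        bus.append(self)

    def read(self, bus):
        if self.state == 'I':            # read miss
            for other in bus:
                if other is not self and other.state == 'M':
                    other.state = 'S'    # owner writes back, downgrades
            self.state = 'S'

    def write(self, bus):
        if self.state != 'M':            # need exclusive ownership
            for other in bus:
                if other is not self:
                    other.state = 'I'    # invalidate all other copies
            self.state = 'M'

bus = []
c0, c1 = Cache(bus), Cache(bus)
c0.read(bus); c1.read(bus)     # both lines Shared
c1.write(bus)                  # c1 -> Modified, c0 invalidated
print(c0.state, c1.state)      # I M
```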
|
16 Sep (Mon) {lecture#14}
|
SDR
|
[Early start: 07:30am]
(to make up for the 23 Sep (Mon) scheduled class to be missed)
|
|
---
|
19 Sep (Thu) {lecture#xx}
|
---
|
(No Class)
|
|
---
|
23 Sep (Mon) {lecture#xx}
|
---
|
(No Class)
|
---
|
Minor 2
|
27 Sep (Fri)
|
---
|
---
|
|
The Write Invalidate Protocol (contd).
|
30 Sep (Mon) {lecture#15}
|
SDR
|
[Early start: 07:30am]
(to make up for the 23 Sep (Mon) scheduled class to be missed)
|
|
---
|
10 Oct (Thu) {lecture#xx}
|
---
|
(No Class)
|
6
|
Vector Processors, SIMD Multimedia
Extensions, GPU Architectures
|
16-22
|
SDR
|
[Hennessy-Patterson]
Chapter 4: Data-Level Parallelism in Vector, SIMD, and GPU
Architectures.
|
|
The three basic types of SIMD architectures:
1. Vector Processors (Cray)
2. SIMD Multimedia Instruction Set Extensions (Intel)
3. GPUs (NVIDIA)
DAXPY as a common basic operation.
Chaining, Lanes, Convoy, Chime.
8 important vector processor/SIMD questions/concepts.
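DAXPY (double-precision y = a*x + y), the running example for all three SIMD styles, written as a plain scalar loop (a Python stand-in for the Fortran/assembly versions in Hennessy-Patterson; a vector machine would execute it as a handful of vector instructions grouped into convoys, one chime each).

```python
# DAXPY: y = a*x + y, element by element (scalar reference version).

def daxpy(a, x, y):
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y

print(daxpy(3.0, [1.0, 2.0], [10.0, 20.0]))   # [13.0, 26.0]
```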
|
14 Oct (Mon) {lecture#16}
|
SDR
|
[Early start: 07:30am]
(to make up for the 23 Sep (Mon) scheduled class to be missed)
|
|
SIMD Concept 1: Lanes for > 1 element per cycle
SIMD Concept 2: Vector length registers, handling `non-64' loops (trip counts not a multiple of the maximum vector length)
SIMD Concept 3: Conditionals (`IF') inside code to be vectorised
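Concepts 2 and 3 can be sketched together (vector "operations" here are plain Python slices, purely for illustration): strip mining walks the loop in chunks of the maximum vector length, the vector length register covers the odd-sized final strip, and a mask register handles the `IF' inside the body.

```python
# Strip mining + vector mask sketch. MVL = 64 mimics Cray-style
# machines; the function and its arguments are hypothetical.

MVL = 64

def masked_scale(x, threshold, factor):
    """Scale x[i] by `factor` wherever x[i] > threshold."""
    n = len(x)
    out = list(x)
    for start in range(0, n, MVL):
        vl = min(MVL, n - start)                 # vector length register
        chunk = out[start:start + vl]
        mask = [v > threshold for v in chunk]    # vector mask register
        out[start:start + vl] = [v * factor if m else v
                                 for v, m in zip(chunk, mask)]
    return out

print(masked_scale([1, 5, 2, 8], 3, 10))   # [1, 50, 2, 80]
```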
|
17 Oct (Thu) {lecture#17}
|
SDR
|
|
|
SIMD Concept 4: Memory banks
For bandwidth to vector load/store units.
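Why banks matter can be shown with a tiny interleaving sketch (8 banks is a made-up figure): consecutive words live in consecutive banks, so a unit-stride vector load spreads across all banks, while a stride equal to the bank count hammers a single bank and serialises.

```python
# Interleaved memory banks sketch: which banks does a strided
# vector access touch?

NUM_BANKS = 8   # hypothetical

def banks_touched(start, stride, n_elems):
    return {(start + i * stride) % NUM_BANKS for i in range(n_elems)}

print(len(banks_touched(0, 1, 16)))   # 8: unit stride uses every bank
print(len(banks_touched(0, 8, 16)))   # 1: stride 8 conflicts on one bank
```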
|
21 Oct (Mon) {lecture#18}
|
SDR
|
|
|
SIMD Concept 5: Multi-Dimensional Matrices
Matrix Multiplication.
A cursory mention of SIMD Concepts 6 and 7: Gather-Scatter
(Handling Sparse Matrices) and Compilers for Vector
Architectures
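Gather-scatter in one small sketch (the sparse row below is made-up): a gather loads x[idx[i]] through an index vector, scatter is the inverse store, and together they let a sparse row times dense vector product run as dense vector operations.

```python
# Gather-scatter sketch for sparse matrices (illustrative only).

def gather(x, idx):
    return [x[i] for i in idx]          # vector indexed load

def scatter(dest, idx, vals):
    for i, v in zip(idx, vals):         # vector indexed store
        dest[i] = v
    return dest

# Sparse row (nonzeros at columns 0, 3, 5) dotted with dense x:
vals, cols = [2.0, 4.0, 1.0], [0, 3, 5]
x = [1.0, 9.0, 9.0, 2.0, 9.0, 3.0]
dot = sum(v * g for v, g in zip(vals, gather(x, cols)))
print(dot)   # 2*1 + 4*2 + 1*3 = 13.0
```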
|
24 Oct (Thu) {lecture#19}
|
SDR
|
|
|
II. SIMD Instruction Set Extensions for Multimedia
3 main omissions compared to Vector Processors (Cray).
History and generations.
Why have they been successful in spite of some obvious limitations?
|
31 Oct (Thu) {lecture#20}
|
SDR
|
|
|
III. GPUs (NVIDIA)
An example with a `grid', a thread block, and individual SIMD threads.
Memory at different levels.
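The grid/thread-block/SIMD-thread hierarchy boils down to one index computation, emulated here in Python (the names mirror CUDA's blockIdx, blockDim, threadIdx; a real kernel would run the body once per thread in parallel).

```python
# CUDA-style grid indexing, emulated sequentially: each "thread"
# computes one element of y = a*x + y. The guard handles the last,
# partially-filled thread block.

def launch_daxpy(a, x, y, block_dim=256):
    n = len(x)
    grid_dim = (n + block_dim - 1) // block_dim   # ceil(n / block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # global thread id
            if i < n:                               # guard, as in CUDA
                y[i] = a * x[i] + y[i]
    return y

y = launch_daxpy(2.0, [1.0] * 300, [0.0] * 300)   # needs a 2-block grid
print(y[0], y[299], len(y))
```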
|
04 Nov (Mon) {lecture#21}
|
SDR
|
|
|
III. GPUs (NVIDIA) (contd.)
CUDA coding, closing notes.
|
07 Nov (Thu) {lecture#22}
|
SDR
|
|
7
|
Warehouse-Scale Computers
|
23-23
|
SDR
|
[Hennessy-Patterson] Chapter 5
|
|
Warehouse-Scale Computers: Clouds: Amazon, Google, Microsoft
|
11 Nov (Mon) {lecture#23}
|
SDR
|
|
8
|
Towards Basic Dynamic Scheduling: Scoreboarding, Tomasulo's Algorithm
|
24-24
|
SDR
|
[Hennessy-Patterson] Appendix C
|
|
Introducing a basic 5-stage pipeline, and the purpose of its stages
|
11 Nov (Mon) {lecture#24}
|
SDR
|
|
|
---
|
14 Nov (Thu) {lecture#xx}
|
---
|
(No Class)
|