COL 870: Special Topics in Machine Learning (Reinforcement Learning)

General Information

Instructor: Parag Singla (email: parags AT cse.iitd.ac.in)

Class Timings/Venue:

Slot AD. Time: Tue, Fri, 3:30 pm - 5:00 pm.
Venue: LHC 308.

Office Hours:

Piazza Sign-up: Click Here
(Code is as announced in class)

Teaching Assistants:
Vishal Sharma (vishal.sharma AT cse)
Lovish Madaan (lovish.madaan.cs515 AT cse)

Paper Presentation Schedule, Paper Assignment and Teams Details:
Access Here

Announcements

[July 22, 2019]: First Class!

Course Objective and Content

This course will cover the basics of Reinforcement Learning, including MDPs, Dynamic Programming, Monte Carlo Methods, TD Learning, and On-Policy and Off-Policy Control. We will first look at solution methods for the tabular representation and then move on to function approximation. We will also cover Policy Gradient and Actor-Critic Methods. The later part of the course will cover some advanced techniques for Deep Reinforcement Learning.

Week-wise Schedule

Week	Dates	Topic	Book Chapters	Supplementary Material
1	July 23, July 26	Introduction, Basics	Ch. 1 (S&B)	Atari Simulator
2	July 30, Aug 2	Markov Decision Processes (MDPs)	Ch. 3 (S&B)
3	Aug 6, Aug 9	Dynamic Programming	Ch. 4 (S&B)
4	Aug 13, Aug 16	Monte Carlo Methods (Prediction)	Ch. 5 (S&B)
5	Aug 20, -	TD Methods (Prediction)	Ch. 6 (S&B)
5	-, Aug 30	Monte Carlo, TD Method (Control)	Ch. 5, 6 (S&B)
6	Sep 3, Sep 7 (Sat)	Monte Carlo, TD Method (Control - cont.)	Ch. 5, 6 (S&B)
7	-, Sep 13	N-step TD, Eligibility Traces	Ch. 7, 12 (S&B)
7	Sep 17, Sep 20	Mid-term Exam, Model-Based RL	Ch. 8 (S&B)
8	Sep 24	Model-Based RL	Ch. 8 (S&B)
8	Sep 28 (Saturday)	(Action-)Value Function Approximation	Ch. 9 (S&B)
9	Oct 1, -	(Action-)Value Function Approximation	Ch. 9, 10 (S&B)
9	-, Oct 11	(Action-)Value Function Approximation	Ch. 9, 10 (S&B)
10	Oct 15, Oct 18	Value Function Approximation, Policy Gradient	Ch. 10, 13 (S&B)
11	Oct 22, Oct 25	Policy Gradient, Misc. Topics	Ch. 13 (S&B), Ch. 10.3, Ch. 2 (S&B)
12,13	Oct 29, Oct 31, Nov 1	Student Presentations
13,14	Nov 5, Nov 6, Nov 8	Student Presentations

NOTE: (a) No class on Sep 6 (Fri). (b) Mid-term exam: Sep 17. (c) Make-up class: Sep 28 (during the original minor exam time).

Class Notes (Date-Wise)
july23.pdf july26.pdf july30.pdf aug2.pdf aug6.pdf aug9.pdf aug13.pdf aug16.pdf aug20.pdf aug30.pdf sep3.pdf sep7.pdf sep13.pdf sep20.pdf sep24.pdf sep28 (class held in SIT; no digital copy of notes) oct1.pdf oct11.pdf oct15.pdf oct18.pdf oct22.pdf oct25.pdf

References

Reinforcement Learning (Second Edition). Richard Sutton and Andrew Barto. MIT Press. 2018. Online Version
Algorithms for Reinforcement Learning. Csaba Szepesvari. Morgan and Claypool. 2010. Online Version

Background Reading

Markov Chains (Wikipedia): Click Here

Additional Resources

Course by David Silver: Click Here

Assignment Submission Instructions

You are free to discuss the problems with other students in the class. However, the final solution/code that you submit should be the result of your individual effort.
Required code should be submitted through the course Moodle page.
Honor Code: Any case of copying will be awarded a zero on the assignment. Additional penalties will be imposed based on the severity of the copying. Copying cases may also be escalated to the Department/DISCO.
Late policy: Each assignment may come with a set of buffer days, call it 'X' buffer days. There is no penalty if your submission stays within the limit of the X buffer days. Any delay beyond X days will result in a substantial penalty (20% for each late day in submission).
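The late policy above can be read as simple arithmetic. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, lateness is counted in whole days, the 20% is taken as a percentage of the score earned on that assignment, and the penalty is capped at 100%. None of these details are confirmed by the policy text, so check with the instructor for the actual rule.

```python
PENALTY_PERCENT_PER_DAY = 20  # from the late policy: 20% for each late day


def late_penalty_percent(days_late: int, buffer_days: int) -> int:
    """Percentage deducted for a submission that is `days_late` days late.

    Assumptions (not stated in the policy): whole days only, and the
    penalty is capped at 100%.
    """
    if days_late < 0 or buffer_days < 0:
        raise ValueError("days_late and buffer_days must be non-negative")
    # Only days beyond the X buffer days are penalized.
    penalized_days = max(0, days_late - buffer_days)
    return min(100, PENALTY_PERCENT_PER_DAY * penalized_days)


def score_after_penalty(raw_score: float, days_late: int, buffer_days: int) -> float:
    """Apply the penalty as a percentage of the raw score (an assumption)."""
    penalty = late_penalty_percent(days_late, buffer_days)
    return raw_score * (100 - penalty) / 100
```

For example, with X = 2 buffer days, a submission 2 days late loses nothing under this reading, while one that is 3 days late has one penalized day and loses 20%.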

Assignments

Assignment 1. Deadline: Tuesday, October 1 (+ 2 buffer days), Midnight. To be done individually.

Grading Policy (Tentative)

Assignments: 21.5%

Ass1: 7.5%
Ass2: 14% (was 15% earlier)

Paper Reviewing and Presentation: 16% (was 20% earlier)

Paper Reviewing: 6% (was 10% earlier)
Paper Presentation: 10%

Mid-term: 22.5%

Major: 40% (was 35% earlier)
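As a rough illustration of how the tentative weights above combine, here is a minimal sketch. The component keys and the `course_total` function are hypothetical names chosen for this example, and each component score is assumed to be given as a fraction between 0 and 1; the weights are the revised per-component values listed above and may still change.

```python
from typing import Dict

# Revised per-component weights (in percent) from the tentative grading policy.
WEIGHTS: Dict[str, float] = {
    "assignment_1": 7.5,
    "assignment_2": 14.0,
    "paper_reviewing": 6.0,
    "paper_presentation": 10.0,
    "mid_term": 22.5,
    "major": 40.0,
}


def course_total(scores: Dict[str, float]) -> float:
    """Weighted course total out of 100.

    `scores` maps each component name to the fraction earned (0.0 to 1.0).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

The per-component weights sum to 100, so a student earning half the marks in every component would get a total of 50 under this sketch.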