COL 870: Special Topics in Machine Learning (Reinforcement Learning)
General Information
Instructor: Parag Singla (email: parags AT cse.iitd.ac.in)
Class Timings/Venue:
- Slot AD. Time: Tue, Fri, 3:30 pm - 5:00 pm.
- Venue: LHC 308.
Office Hours:
Piazza Sign-up:
Click Here
(Code is as announced in class)
Teaching Assistants:
Vishal Sharma (vishal.sharma AT cse)
Lovish Madaan (lovish.madaan.cs515 AT cse)
Paper Presentation Schedule, Paper Assignment and Teams Details:
Access Here
Announcements
- [July 22, 2019]: First Class!
This course will cover the basics of Reinforcement Learning, including
MDPs, Dynamic Programming, Monte Carlo Methods, TD Learning, and
On-Policy and Off-Policy Control. We will first look at solution
methods for the tabular representation and then move on to
function approximation. We will also cover Policy Gradient and
Actor-Critic Methods. The later part of the course will cover some
Advanced Techniques for Deep Reinforcement Learning.
Week-wise Schedule
Week | Dates | Topic | Book Chapters | Supplementary Material |
1 | July 23, July 26 | Introduction, Basics | Ch. 1 (S&B) | Atari Simulator |
2 | July 30, Aug 2 | Markov Decision Processes (MDPs) | Ch. 3 (S&B) | |
3 | Aug 6, Aug 9 | Dynamic Programming | Ch. 4 (S&B) | |
4 | Aug 13, Aug 16 | Monte Carlo Methods (Prediction) | Ch. 5 (S&B) | |
5 | Aug 20, - | TD Methods (Prediction) | Ch. 6 (S&B) | |
5 | -, Aug 30 | Monte Carlo, TD Method (Control) | Ch. 5, 6 (S&B) | |
6 | Sep 3, Sep 7 (Sat) | Monte Carlo, TD Method (Control - cont.) | Ch. 5, 6 (S&B) | |
7 | -, Sep 13 | N-step TD, Eligibility Traces | Ch. 7, 12 (S&B) | |
7 | Sep 17, Sep 20 | Mid-term Exam, Model-Based RL | Ch. 8 (S&B) | |
8 | Sep 24 | Model-Based RL | Ch. 8 (S&B) | |
8 | Sep 28 (Saturday) | (Action-)Value Function Approximation | Ch. 9 (S&B) | |
9 | Oct 1, - | (Action-)Value Function Approximation | Ch. 9, 10 (S&B) | |
9 | -, Oct 11 | (Action-)Value Function Approximation | Ch. 9, 10 (S&B) | |
10 | Oct 15, Oct 18 | Value Function Approximation, Policy Gradient | Ch. 10, 13 (S&B) | |
11 | Oct 22, Oct 25 | Policy Gradient, Misc. Topics | Ch. 13 (S&B), Ch. 10.3, Ch. 2 (S&B) | |
12,13 | Oct 29, Oct 31, Nov 1 | Student Presentations | | |
13,14 | Nov 5, Nov 6, Nov 8 | Student Presentations | | |
NOTE: (a) No Class on: Sep 6 (Fri). (b) Mid-term Exam: Sep 17.
(c) Make-up Class: Sep 28 (during the original minor exam time).
july23.pdf
july26.pdf
july30.pdf
aug2.pdf
aug6.pdf
aug9.pdf
aug13.pdf
aug16.pdf
aug20.pdf
aug30.pdf
sep3.pdf
sep7.pdf
sep13.pdf
sep20.pdf
sep24.pdf
sep28 (class held in SIT; no digital copy of notes)
oct1.pdf
oct11.pdf
oct15.pdf
oct18.pdf
oct22.pdf
oct25.pdf
References
- Reinforcement Learning (Second Edition).
Richard Sutton and Andrew Barto. MIT Press. 2018.
Online Version
- Algorithms for Reinforcement Learning.
Csaba Szepesvari. Morgan and Claypool. 2010.
Online Version
Background Reading
Additional Resources
Assignment Submission Instructions
- You are free to discuss the problems with other students in the class. However, the final solution/code that you
produce must be the result of your own individual effort.
- Required code should be submitted using the Moodle page.
- Honor Code: Any case of copying will be awarded a zero on the assignment. Additional
penalties will be imposed based on the severity of the copying. Any case of copying may be
escalated to the Department/DISCO.
- Late policy: You may be allowed a set of buffer days
for each assignment, call it 'X' buffer days. There is no penalty
if your submission stays within the limit of the X buffer days.
Any delay beyond the X days will result in a substantial penalty
(20% for each late day in submission).
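As a rough illustration of the late policy above, the arithmetic can be sketched in Python. This is only one possible reading and not the official grading script: it assumes that days are counted from the stated deadline, that the 20% is deducted from the assignment score for each day beyond the buffer, and that the deduction is capped at 100%. The function names (late_penalty_fraction, penalised_score) are hypothetical.

```python
def late_penalty_fraction(days_late: int, buffer_days: int,
                          per_day: float = 0.20) -> float:
    """Fraction of the score deducted under the assumed reading of the policy.

    days_late   -- whole days after the deadline (0 means on time)
    buffer_days -- the 'X' buffer days allowed for this assignment
    per_day     -- assumed deduction per late day beyond the buffer (20%)
    """
    if days_late < 0 or buffer_days < 0:
        raise ValueError("days_late and buffer_days must be non-negative")
    # Only days beyond the buffer are penalised.
    penalised_days = max(0, days_late - buffer_days)
    # Assumption: the deduction cannot exceed the full score.
    return min(1.0, per_day * penalised_days)


def penalised_score(raw_score: float, days_late: int, buffer_days: int,
                    per_day: float = 0.20) -> float:
    """Score remaining after the assumed late deduction."""
    fraction = late_penalty_fraction(days_late, buffer_days, per_day)
    return raw_score * (1.0 - fraction)


if __name__ == "__main__":
    # Example with 2 buffer days: submissions 0 to 6 days after the deadline.
    for d in range(7):
        print(d, "days late ->", round(penalised_score(100.0, d, 2), 2))
```

Under these assumptions, a submission within the buffer keeps its full score, and each further day removes another 20% until nothing remains.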
Assignments
- Assignment 1. Deadline: Tuesday, October 1 (+ 2 buffer days), Midnight.
To be done individually.
Grading Policy (Tentative)
Assignments: 22.5%
- Ass1: 7.5%
- Ass2: 14% (was 15% earlier)
Paper Reviewing and Presentation: 16% (was 20% earlier)
- Paper Reviewing: 6% (was 10% earlier)
- Paper Presentation: 10%
Mid-term: 22.5%
Major: 40% (was 35% earlier)