COL 870: Special Topics in Machine Learning (Reinforcement Learning)

General Information

Instructor: Parag Singla (email: parags AT

Class Timings/Venue:
  • Slot AD. Time: Tue, Fri, 3:30 pm to 5:00 pm.
  • Venue: LHC 308.

Office Hours:

Piazza Sign-up: Click Here
(Code is as announced in class)

Teaching Assistants:
Vishal Sharma (vishal.sharma AT cse)
Lovish Madaan (lovish.madaan.cs515 AT cse)

Paper Presentation Schedule, Paper Assignments, and Team Details:
Access Here


  • [July 22, 2019]: First Class!

Course Objective and Content

This course will cover the basics of Reinforcement Learning, including MDPs, Dynamic Programming, Monte Carlo Methods, TD Learning, and On-Policy and Off-Policy Control. We will first look at solution methods for tabular representations, followed by methods that use function approximation. We will also cover Policy Gradient and Actor-Critic Methods. The later part of the course will cover some advanced techniques for Deep Reinforcement Learning.

Week-wise Schedule

| Week | Dates | Topic | Book Chapters | Supplementary Material |
|---|---|---|---|---|
| 1 | July 23, July 26 | Introduction, Basics | Ch. 1 (S&B) | Atari Simulator |
| 2 | July 30, Aug 2 | Markov Decision Processes (MDPs) | Ch. 3 (S&B) | |
| 3 | Aug 6, Aug 9 | Dynamic Programming | Ch. 4 (S&B) | |
| 4 | Aug 13, Aug 16 | Monte Carlo Methods (Prediction) | Ch. 5 (S&B) | |
| 5 | Aug 20, - | TD Methods (Prediction) | Ch. 6 (S&B) | |
| 5 | -, Aug 30 | Monte Carlo, TD Methods (Control) | Ch. 5, 6 (S&B) | |
| 6 | Sep 3, Sep 7 (Sat) | Monte Carlo, TD Methods (Control, cont.) | Ch. 5, 6 (S&B) | |
| 7 | -, Sep 13 | N-step TD, Eligibility Traces | Ch. 7, 12 (S&B) | |
| 7 | Sep 17, Sep 20 | Mid-term Exam, Model-Based RL | Ch. 8 (S&B) | |
| 8 | Sep 24 | Model-Based RL | Ch. 8 (S&B) | |
| 8 | Sep 28 (Sat) | (Action-)Value Function Approximation | Ch. 9 (S&B) | |
| 9 | Oct 1, - | (Action-)Value Function Approximation | Ch. 9, 10 (S&B) | |
| 9 | -, Oct 11 | (Action-)Value Function Approximation | Ch. 9, 10 (S&B) | |
| 10 | Oct 15, Oct 18 | Value Function Approximation, Policy Gradient | Ch. 10, 13 (S&B) | |
| 11 | Oct 22, Oct 25 | Policy Gradient, Misc. Topics | Ch. 13, Ch. 10.3, Ch. 2 (S&B) | |
| 12, 13 | Oct 29, Oct 31, Nov 1 | Student Presentations | | |
| 13, 14 | Nov 5, Nov 6, Nov 8 | Student Presentations | | |
NOTE: (a) No class on Sep 6 (Fri). (b) Mid-term exam: Sep 17. (c) Make-up class: Sep 28 (Sat), held in the slot originally scheduled for the minor exam.

Class Notes (Date-Wise)

july23.pdf, july26.pdf, july30.pdf, aug2.pdf, aug6.pdf, aug9.pdf, aug13.pdf, aug16.pdf, aug20.pdf, aug30.pdf, sep3.pdf, sep7.pdf, sep13.pdf, sep20.pdf, sep24.pdf, sep28 (class held in SIT; no digital copy of notes), oct1.pdf, oct11.pdf, oct15.pdf, oct18.pdf, oct22.pdf, oct25.pdf


Textbooks

  1. Reinforcement Learning: An Introduction (Second Edition). Richard S. Sutton and Andrew G. Barto (S&B). MIT Press. 2018. Online Version
  2. Algorithms for Reinforcement Learning. Csaba Szepesvári. Morgan and Claypool. 2010. Online Version

Background Reading

Additional Resources

Assignment Submission Instructions

  1. You are free to discuss the problems with other students in the class. However, the final solution/code that you submit must be the result of your own individual effort.
  2. Required code should be submitted via the course Moodle page.
  3. Honor Code: Any case of copying will be awarded a zero on the assignment. Additional penalties will be imposed based on the severity of the copying. Copying cases may also be escalated to the Department/DISCO.
  4. Late policy: Each assignment may come with a set of buffer days; call it 'X' buffer days. There is no penalty if your submission stays within the X buffer days. Any delay beyond X days will result in a substantial penalty of 20% for each additional late day.
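The late-policy arithmetic can be sketched as below. This is a minimal illustration, not official course tooling: it assumes the 20% per excess day is taken off the assignment's maximum marks, that lateness is counted in whole days, that the penalty grows linearly and is capped at 100%, and that a score never drops below zero. The function names are hypothetical.

```python
def late_penalty_fraction(days_late: int, buffer_days: int, rate_per_day: float = 0.20) -> float:
    """Fraction of full marks deducted (assumed reading of the late policy)."""
    if days_late < 0 or buffer_days < 0:
        raise ValueError("days_late and buffer_days must be non-negative")
    # Days that fall inside the X buffer days cost nothing.
    excess_days = max(0, days_late - buffer_days)
    # Assumption: 20% of full marks per excess day, linear, capped at 100%.
    return min(1.0, excess_days * rate_per_day)


def apply_late_policy(raw_score: float, max_score: float, days_late: int, buffer_days: int) -> float:
    """Score after the assumed late deduction, clamped at zero."""
    penalty = late_penalty_fraction(days_late, buffer_days) * max_score
    return max(0.0, raw_score - penalty)


if __name__ == "__main__":
    # Hypothetical case: 2 buffer days, submitted 3 days after the deadline.
    print(late_penalty_fraction(3, 2))
    print(apply_late_policy(80, 100, days_late=3, buffer_days=2))
```

Under this reading, with 2 buffer days, a submission 3 days late exceeds the buffer by one day and loses 20% of full marks, so 80/100 would become 60/100. If the penalty is instead applied to the obtained score, only the line computing `penalty` would change.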


Assignments

  • Assignment 1. Deadline: Tuesday, October 1 (+2 buffer days), midnight. To be done individually.

Grading Policy (Tentative)

Assignments: 22.5%
  • Assignment 1: 7.5%
  • Assignment 2: 14% (was 15% earlier)
Paper Reviewing and Presentation: 16% (was 20% earlier)
  • Paper Reviewing: 6% (was 10% earlier)
  • Paper Presentation: 10%
Major Exam: 40% (was 35% earlier)