COL772: Natural Language Processing -- Autumn 2020

Start	End	Slides	Required Readings	Recommended Readings
Sep 29	Oct 9	Introduction	J&M Ch 1	Recent Advances in NLP
Oct 9	Nov 7	Regular Languages and Finite State Automata	SLP3 Ch 2
Nov 7	Nov 7	Morphology with Finite State Transducers	J&M Ch 3
Oct 13	Oct 13	Classical Text Categorization: Naive Bayes, Logistic Regression	Notes (Sections 1-4); SLP3 Ch 4	Gender in Job Postings; Improvements to Multinomial Naive Bayes; Performance Measures; Error Correcting Output Codes
Nov 3	Nov 3	Sentiment Mining and Lexicon Generation	Survey (Sections 1-4.5); Tutorial (Sections 1-5); SLP3 Ch 19	Semantic Orientation of Adjectives; Unsupervised Classification of Reviews
Nov 6	Nov 13	Generative vs. Max Entropy Models	Max Entropy Tutorial	Intro to Max Entropy Models
Nov 13	Nov 13	Vector Spaces in Information Retrieval	SLP3 Ch 6.1-6.6	LSA and PLSA; Detailed Tutorial on LDA
Nov 13	Nov 17	An Intro to Deep Learning for NLP
Oct 23	Oct 23	Representation Discovery for Words	Goldberg 8.1-8.4, 10, 11; SLP3 6.7-6.11	Embeddings vs. Factorization; Contextual Embeddings; Trends and Future Directions on Word Embeddings
Oct 27	Oct 27	N-gram Features with CNNs	Goldberg 13	Practitioner's Guide to CNNs
Nov 17	Nov 24	RNNs for Variable Length Sequences	Goldberg 14.1-14.3.1, 14.4-14.5; Goldberg 15, 16.1.1, 16.2.2; Understanding LSTMs; Deriving LSTMs; Pooling in RNNs	RNNs and Vanishing Gradients (Section 4.3)
Nov 17	Nov 28	Assignment 1.1	Resources
Nov 24	Nov 27	Attention & Transformer	Goldberg 17.1, 17.2, 17.4; Attention is All You Need; The Illustrated Transformer	Reformer; Longformer
Nov 27	Nov 27	Tricks for Training RNNs	Deep Learning for NLP Best Practices
Nov 30	Dec 15	Assignment 1.2
Dec 15	Dec 15	N-Gram Language Models	SLP3 Ch 3; Goldberg 9.1-9.3
Dec 18	Dec 18	Neural & Pre-Trained Language Models	Goldberg 9.4-9.5; SLP3 Ch 10; BERT Paper	ELMo Paper; GPT2 Paper
Dec 22	Dec 29	Statistical Natural Language Parsing	SLP3 Ch 12.1-12.5, 13.1-13.2, 14.1-14.6; Lecture Notes on PCFGs; Lecture Notes on Lexicalized PCFGs
Dec 29	Dec 29	Neural Models over Tree Structures	Goldberg 18
Jan 2	Jan 16	Assignment 2		Resources
Jan 2	Jan 2	Machine Translation	SLP3 Ch 11; MBART
Jan 2	Jan 2	NLP in Low Data Setting		T5, GPT3; Constraints in Deep Learning

Textbook and Readings

Yoav Goldberg, Neural Network Methods for Natural Language Processing,
Morgan and Claypool (2017) (required).

Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition
(under development).

Grading

Assignments: 30%; Project: 20%; Minors: 20%; Final: 30%; Class participation, online discussions: extra credit.
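
As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes each component is already expressed as a percentage (0-100); the function name and the dictionary keys are hypothetical, and extra credit is left out because its size is not specified here.

```python
# Component weights in percent, taken from the grading breakdown above.
WEIGHTS = {"assignments": 30, "project": 20, "minors": 20, "final": 30}


def weighted_total(percent_scores):
    """Combine per-component percentages (0-100) into a course percentage.

    Hypothetical helper: extra credit for participation is not modeled,
    since the course page does not say how much it is worth.
    """
    missing = set(WEIGHTS) - set(percent_scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * percent_scores[name] for name in WEIGHTS) / 100


example = {"assignments": 80, "project": 90, "minors": 70, "final": 60}
print(weighted_total(example))
```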

Course Administration and Policies

Subscribe to the class discussion group on Piazza. (access code: col772)
All programming assignments are to be done individually. You may discuss the subject matter with other students in the class, but all solutions, code, and writeups must be your own. In your writeup, mention the names of any students with whom you discussed the projects. You are expected to maintain the utmost level of academic integrity in the course.
Programming assignments may be handed in up to a week late, at a penalty of 10% of the maximum grade per day.
The project is to be done in a group of two. You may obtain special written permission if you wish to do the project in a group of any other size (even one). Except for unusual circumstances, all team members will get the same grade.
There is no late policy for the project submission. The project must be submitted by the deadline.
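
To make the late penalty for programming assignments concrete, here is a minimal sketch of the arithmetic. Several details are assumptions rather than stated policy: lateness is counted in whole days, a submission more than seven days late scores zero, and the penalized score is never negative. The function name is hypothetical.

```python
def late_penalized_score(raw_score, max_grade, days_late):
    """Apply a penalty of 10% of the maximum grade per day late.

    Assumptions (not stated on the course page): days_late is a whole
    number of days, anything beyond 7 days scores zero, and the result
    is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 7:
        return 0.0  # assumed: not accepted after one week
    penalty = max_grade * 10 * days_late / 100
    return max(0.0, raw_score - penalty)


# An assignment graded 80 out of 100, handed in 3 days late,
# loses 3 * 10% of 100 = 30 points.
print(late_penalized_score(80, 100, 3))
```

Note that the penalty is a fraction of the maximum grade, not of the score earned, so a weak submission can reach zero well before the one-week limit.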

Cheating Vs. Collaborating Guidelines

As adapted from Dan Weld's guidelines.

Collaboration is a very good thing. On the other hand, cheating is considered a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you get a zero on the assignment, and additionally you risk losing your position as a student in the department and the institute. The department's policy on cheating is to report any cases to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.

The Kyunki Saas Bhi Kabhi Bahu Thi Rule: This rule says that you are free to meet with fellow student(s) and discuss assignments with them. Writing on a board or shared piece of paper is acceptable during the meeting; however, you should not take any written (electronic or otherwise) record away from the meeting. This applies when the assignment is supposed to be an individual effort or whenever two teams discuss common problems they are each encountering (inter-group collaboration). After the meeting, engage in a half hour of mind-numbing activity (like watching an episode of Kyunki Saas Bhi Kabhi Bahu Thi) before starting to work on the assignment. This will ensure that you are able to reconstruct what you learned from the meeting, by yourself, using your own brain.

The Right to Information Rule: To ensure that all collaboration is on the level, you must always write the name(s) of your collaborators on your assignment. This also applies when two groups collaborate.
