COL772: Natural Language Processing -- Autumn 2021

Start	End	Slides	Required Readings	Recommended Readings
Aug 10	Aug 17	Introduction	J&M Ch 1	Recent Advances in NLP
Aug 17	Aug 21	Regular Languages and Finite State Automata	SLP3 Ch 2
Aug 21	Aug 24	Finite State Transducers	J&M Ch 3
Aug 24	Aug 31	Classical Text Categorization: Naive Bayes, Logistic Regression	Notes (Sections 1-4) SLP3 Ch 4	Gender in Job Postings Useful Things about ML Performance Measures
Aug 28	Sep 11	Assignment 1	Resources
Aug 31	Sep 3	Sentiment Mining and Lexicon Learning	Survey (Sections 1-4.5) Tutorial (Sections 1-5) SLP3 Ch 19	Semantic Orientation of Adjectives Unsupervised Classification of Reviews
Sep 7	Sep 7	Vector Spaces in Information Retrieval	SLP3 Ch 6.1-6.6	LSA and PLSA Detailed Tutorial on LDA
Sep 10	Sep 10	An Intro to Deep Learning for NLP	Goldberg 2,4,5
Sep 14	Sep 30	Assignment 2	Resources
Sep 14	Sep 17	Representation Discovery: Word2Vec & GloVe	Goldberg 8.1-8.4, 10, 11 SLP3 6.7-6.11 Embeddings vs. Factorization	Contextual Embeddings Trends and Future Directions on Word Embeddings
Sep 17	Sep 17	N-gram Features with CNNs	Goldberg 13	Practitioner's Guide to CNNs
Sep 24	Sep 28	RNNs for Variable Length Sequences	Goldberg 14.1-14.3.1,14.4-14.5 Goldberg 15, 16.1.1, 16.2.2 Understanding LSTMs Deriving LSTMs Pooling in RNNs	RNNs and Vanishing Gradients (Section 4.3)
Sep 28	Sep 28	Tricks for Training RNNs	Deep Learning for NLP Best Practices
Sep 30	Oct 11	Assignment 3
Oct 1	Oct 1	Attention & Transformer	Goldberg 17.1, 17.2, 17.4 Attention is All You Need The Illustrated Transformer	Reformer
Oct 5	Oct 5	N-Gram Language Models	SLP3 Ch 3 Goldberg 9.1-9.3
Oct 12	Oct 12	Neural & Pre-Trained Language Models	Goldberg 9.4-9.5 SLP3 Ch 10 BERT Paper	ELMo Paper GPT2 Paper
Oct 12	Oct 24	Assignment 3 (Part B)
Oct 22	Oct 22	Advanced Pre-training for Language Models	BART T5 Pre-training tasks in ERNIE 2.0 (Section 4)	XLNet ALBERT
Oct 23	Oct 23	GPT3 & Beyond: Few-Shot Learning, Prompt Learning	GPT3 Adapter Tuning	GPT3 Explained Prefix Tuning
Oct 23	Oct 26	Multilingual NLP	Sentence Piece Tokenization mBART XLM-R BLEU	Typology of Languages
Oct 26	Nov 12	Assignment 4
Oct 26	Oct 29	Neural CRF and Learning with Constraints for Sequence Labeling	Goldberg 19.1-19.3, 19.4.2 Bidirectional LSTM-CRF Models Learning with Constraints
Oct 29	Nov 10	Statistical Natural Language Parsing	SLP3 Ch 12.1-12.5, 13.1-13.2, 14.1-14.3 Lecture Notes on PCFGs Lecture Notes on Lexicalized PCFGs
Nov 11	Nov 11	Fairness & Ethics in NLP
Nov 11	Nov 11	Recursive Neural Networks	Goldberg 18
Nov 11	Nov 11	Wrap Up

Textbook and Readings

Yoav Goldberg Neural Network Methods for Natural Language Processing,
Morgan and Claypool (2017) (required).

Dan Jurafsky and James Martin Speech and Language Processing, 3rd Edition
(under development).

Grading

Assignments: 50%; Midterm: 20%; Final: 30%; Class participation, online discussions: extra credit.

Course Administration and Policies

Subscribe to the class discussion group on Piazza. (access code: col772)
All programming assignments are to be done individually. You may discuss the subject matter with other students in the class, but all solutions, code, and writeups must be your own. In your writeup, mention the names of any students with whom you discussed the projects. You are expected to maintain the utmost level of academic integrity in the course.
Programming assignments may be handed in up to a week late, at a penalty of 10% of the maximum grade per day.
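As a rough illustration of the late-submission arithmetic above, here is a minimal sketch. It assumes that a partial day counts as a full day and that submissions more than a week late receive no credit. Neither assumption is stated in the policy, and the function name and parameters are hypothetical.

```python
import math


def late_penalty_score(raw_score: float, max_score: float, days_late: float) -> float:
    """Return the score after a penalty of 10% of max_score per day late."""
    if days_late <= 0:
        return raw_score  # on time: no penalty
    # Assumption (not in the policy): a partial day counts as a full day.
    whole_days = math.ceil(days_late)
    # Assumption (not in the policy): nothing is accepted after one week.
    if whole_days > 7:
        return 0.0
    penalty = 0.10 * max_score * whole_days
    # The score is never allowed to go below zero.
    return max(0.0, raw_score - penalty)
```

Under these assumptions, an assignment graded 80 out of 100 and handed in two days late would lose 20 points.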

Cheating Vs. Collaborating Guidelines

As adapted from Dan Weld's guidelines.

Collaboration is a very good thing. On the other hand, cheating is considered a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you get a zero in the assignment, and additionally you risk losing your position as a student in the department and the institute. The department's policy on cheating is to report any cases to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.

The Kyunki Saas Bhi Kabhi Bahu Thi Rule: This rule says that you are free to meet with fellow student(s) and discuss assignments with them. Writing on a board or shared piece of paper is acceptable during the meeting; however, you should not take any written (electronic or otherwise) record away from the meeting. This applies when the assignment is supposed to be an individual effort or whenever two teams discuss common problems they are each encountering (inter-group collaboration). After the meeting, engage in a half hour of mind-numbing activity (like watching an episode of Kyunki Saas Bhi Kabhi Bahu Thi) before starting to work on the assignment. This will ensure that you are able to reconstruct what you learned from the meeting, by yourself, using your own brain.

The Right to Information Rule: To ensure that all collaboration is on the level, you must always write the name(s) of your collaborators on your assignment. This also applies when two groups collaborate.
