COL772: Natural Language Processing -- Spring 2023

Start	End	Slides	Required Readings	Recommended Readings
Jan 3	Jan 10	Introduction	J&M Ch 1	Recent Advances in NLP
Jan 13	Jan 17	Regular Languages and Finite State Automata	SLP3 Ch 2	Regular Expressions Demo
Jan 17	Jan 20	Finite State Transducers	J&M Ch 3
Jan 20	Jan 27	Classical Text Categorization: Naive Bayes, Logistic Regression	Notes (Sections 1-4); SLP3 Ch 4	Gender in Job Postings; Useful Things about ML; Performance Measures
Jan 27	Feb 3	Sentiment Mining and Lexicon Learning	Survey (Sections 1-4.5); Tutorial (Sections 1-5); SLP3 Ch 19	Semantic Orientation of Adjectives; Unsupervised Classification of Reviews
Jan 31	Feb 13	Assignment 1	Resources
Feb 3	Feb 3	Vector Spaces in Information Retrieval	SLP3 Ch 6.1-6.6	LSA and PLSA; Detailed Tutorial on LDA
Feb 3	Feb 7	An Intro to Deep Learning for NLP	Goldberg 2,4,5
Feb 7	Feb 17	Representation Discovery: Word2Vec & GloVe	Goldberg 8.1-8.4, 10, 11; SLP3 6.7-6.11; Embeddings vs. Factorization	Contextual Embeddings; Trends and Future Directions on Word Embeddings
Feb 17	Feb 21	N-gram Features with CNNs	Goldberg 13	Practitioner's Guide to CNNs
Feb 21	Feb 28	RNNs for Variable Length Sequences	Goldberg 14.1-14.3.1, 14.4-14.5; Goldberg 15, 16.1.1, 16.2.2; Understanding LSTMs; Deriving LSTMs; Pooling in RNNs	RNNs and Vanishing Gradients (Section 4.3)
Feb 28	Mar 3	Tricks for Training RNNs	Deep Learning for NLP Best Practices
Mar 3	Mar 17	Attention & Transformer	Goldberg 17.1, 17.2, 17.4; Attention is All You Need; The Illustrated Transformer	Reformer
Mar 3	Mar 28	Assignment 2	Resources
Mar 14	Mar 14	Introduction to PyTorch and HPC Cluster
Mar 17	Mar 21	Neural CRF for Sequence Labeling	Goldberg 19.1-19.3, 19.4.2; Bidirectional LSTM-CRF Models
Mar 21	Mar 24	N-Gram Language Models	SLP3 Ch 3; Goldberg 9.1-9.3
Mar 24	Apr 1	Neural & Pre-Trained Language Models	Goldberg 9.4-9.5; SLP3 Ch 10; BERT Paper	ELMo Paper; GPT2 Paper
Apr 1	Apr 11	Advanced Pre-Training Techniques	BART; T5; Pre-training tasks in ERNIE 2.0 (Section 4)	XLNet; ALBERT
Apr 9	Apr 29	Assignment 3
Apr 11	Apr 19	GPT3 & Beyond: Few-Shot Learning, Prompt Learning	GPT3; Adapter Tuning; Language Models of 2022	GPT3 Explained; Prefix Tuning
Apr 19	Apr 19	Multilingual NLP	Sentence Piece Tokenization; mBART; XLM-R; BLEU	Typology of Languages
Apr 21	Apr 25	Statistical Natural Language Parsing	SLP3 Ch 12.1-12.5, 13.1-13.2, 14.1-14.3; Lecture Notes on PCFGs; Lecture Notes on Lexicalized PCFGs
Apr 28	Apr 28	Data Augmentation for Low Data NLP
Apr 28	Apr 28	Constrained Training for Low Data NLP	Constraints in Deep Learning
Apr 29	Apr 29	Prompt Engineering in ChatGPT	Prompting Guide
Apr 29	Apr 29	Wrap Up

Textbook and Readings

Yoav Goldberg, Neural Network Methods for Natural Language Processing,
Morgan and Claypool (2017) (required).

Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition
(under development).

Grading

Assignments: 50%; Midterm: 20%; Final: 30%; Class participation, online discussions: extra credit.

Course Administration and Policies

Subscribe to the class discussion group on Piazza. (access code: col772)
All programming assignments are to be done individually. You may discuss the subject matter with other students in the class, but all solutions, code, and writeups must be your own. In your writeup, mention the names of any students with whom you discussed the projects. You are expected to maintain the utmost level of academic integrity in the course.
Programming assignments may be handed in up to a week late, at a penalty of 10% of the maximum grade per day.
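The late penalty above can be sketched as a small calculation. This is only an illustrative reading of the policy, not an official grading script: the function name late_adjusted_score is hypothetical, and it assumes that days are counted as whole days, that the penalty is subtracted from the earned score, that the result is floored at zero, and that anything more than seven days late earns no credit.

```python
def late_adjusted_score(raw_score: float, max_grade: float, days_late: int) -> float:
    """Apply a penalty of 10% of the maximum grade per day late.

    Hypothetical helper: the policy does not say how partial days are
    counted or whether scores can go negative, so both are assumptions here.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 7:
        # Assumption: submissions more than a week late receive no credit.
        return 0.0
    penalty = 0.10 * max_grade * days_late
    # Assumption: the adjusted score is floored at zero.
    return max(0.0, raw_score - penalty)


if __name__ == "__main__":
    # An assignment worth 100 that earned 85 and was handed in 2 days late.
    print(late_adjusted_score(85.0, 100.0, 2))
```

Under these assumptions, an assignment that earned 85 out of 100 and was submitted two days late would be recorded as 65.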

Cheating Vs. Collaborating Guidelines

Adapted from Dan Weld's guidelines.

Collaboration is a very good thing. On the other hand, cheating is considered a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you get a zero on the assignment, and additionally you risk losing your position as a student in the department and the institute. The department's policy on cheating is to report any cases to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.

The Kyunki Saas Bhi Kabhi Bahu Thi Rule: This rule says that you are free to meet with fellow student(s) and discuss assignments with them. Writing on a board or shared piece of paper is acceptable during the meeting; however, you should not take any written (electronic or otherwise) record away from the meeting. This applies when the assignment is supposed to be an individual effort or whenever two teams discuss common problems they are each encountering (inter-group collaboration). After the meeting, engage in a half hour of mind-numbing activity (like watching an episode of Kyunki Saas Bhi Kabhi Bahu Thi) before starting to work on the assignment. This will ensure that you are able to reconstruct what you learned from the meeting, by yourself, using your own brain.

The Right to Information Rule: To ensure that all collaboration is on the level, you must always write the name(s) of your collaborators on your assignment. This also applies when two groups collaborate.
