COL772: Natural Language Processing - Autumn 2021
Tuesday, Friday 2-3:20 pm on Microsoft Teams


Instructor: Mausam
(mausam at cse dot iitd dot ac dot in)
Office hours: by appointment, SIT Building Room 402
TAs (Office hours, by appointment):
Keshav Kolluru, keshavkolluru AT gmail.com
Vishal Saley, Vishal.Vivek.Saley AT cse.iitd.ac.in
Kartikeya Badola, Kartikeya.Badola.ee118 AT ee.iitd.ac.in

Course Contents

NLP concepts: Tokenization, lemmatization, part of speech tagging, noun phrase chunking, named entity recognition, coreference resolution, parsing, information extraction, sentiment analysis, question answering, text classification, document clustering, document summarization, dialog systems, machine translation.

Machine learning concepts: Naive Bayes, Log-linear Models, Conditional Random Fields, Probabilistic Context Free Grammars, Word2vec models, RNN-based neural models, Sequence to sequence neural models, Pre-trained language models.

Schedule

Start End Slides Required Readings Recommended Readings
Aug 10 Aug 17 Introduction J&M Ch 1 Recent Advances in NLP
Aug 17 Aug 21 Regular Languages and Finite State Automata SLP3 Ch 2
Aug 21 Aug 24 Finite State Transducers J&M Ch 3
Aug 24 Aug 31 Classical Text Categorization: Naive Bayes, Logistic Regression Notes (Sections 1-4)
SLP3 Ch 4
Gender in Job Postings
Useful Things about ML
Performance Measures
Aug 28 Sep 11 Assignment 1 Resources

Aug 31 Sep 3 Sentiment Mining and Lexicon Learning Survey (Sections 1-4.5)
Tutorial (Sections 1-5)
SLP3 Ch 19
Semantic Orientation of Adjectives
Unsupervised Classification of Reviews
Sep 7 Sep 7 Vector Spaces in Information Retrieval SLP3 Ch 6.1-6.6
LSA and PLSA
Detailed Tutorial on LDA
Sep 10 Sep 10 An Intro to Deep Learning for NLP Goldberg 2,4,5

Sep 14 Sep 30 Assignment 2 Resources

Sep 14 Sep 17 Representation Discovery: Word2Vec & GloVe Goldberg 8.1-8.4, 10, 11
SLP3 6.7-6.11
Embeddings vs. Factorization
Contextual Embeddings
Trends and Future Directions on Word Embeddings
Sep 17 Sep 17 N-gram Features with CNNs Goldberg 13
Practitioner's Guide to CNNs
Sep 24 Sep 28 RNNs for Variable Length Sequences Goldberg 14.1-14.3.1, 14.4-14.5
Goldberg 15, 16.1.1, 16.2.2
Understanding LSTMs
Deriving LSTMs
Pooling in RNNs
RNNs and Vanishing Gradients (Section 4.3)
Sep 28 Sep 28 Tricks for Training RNNs Deep Learning for NLP Best Practices
Sep 30 Oct 11 Assignment 3

Oct 1 Oct 1 Attention & Transformer Goldberg 17.1, 17.2, 17.4
Attention is All You Need
The Illustrated Transformer
Reformer
Oct 5 Oct 5 N-Gram Language Models SLP3 Ch 3
Goldberg 9.1-9.3
Oct 12 Oct 12 Neural & Pre-Trained Language Models Goldberg 9.4-9.5
SLP3 Ch 10
BERT Paper
ELMo Paper
GPT2 Paper
Oct 12 Oct 24 Assignment 3 (Part B)

Oct 22 Oct 22 Advanced Pre-training for Language Models BART
T5
Pre-training tasks in ERNIE 2.0 (Section 4)
XLNet
ALBERT
Oct 23 Oct 23 GPT3 & Beyond: Few-Shot Learning, Prompt Learning GPT3
Adapter Tuning
GPT3 Explained
Prefix Tuning
Oct 23 Oct 26 Multilingual NLP Sentence Piece Tokenization
mBART
XLM-R
BLEU
Typology of Languages
Oct 26 Nov 12 Assignment 4

Oct 26 Oct 29 Neural CRF and Learning with Constraints for Sequence Labeling Goldberg 19.1-19.3, 19.4.2
Bidirectional LSTM-CRF Models
Learning with Constraints

Oct 29 Nov 10 Statistical Natural Language Parsing SLP3 Ch 12.1-12.5, 13.1-13.2, 14.1-14.3
Lecture Notes on PCFGs
Lecture Notes on Lexicalized PCFGs

Nov 11 Nov 11 Fairness & Ethics in NLP

Nov 11 Nov 11 Recursive Neural Networks Goldberg 18

Nov 11 Nov 11 Wrap Up


Textbook and Readings

Yoav Goldberg, Neural Network Methods for Natural Language Processing,
Morgan and Claypool (2017) (required).

Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition
(under development).

Grading

Assignments: 50%; Midterm: 20%; Final: 30%; Class participation, online discussions: extra credit.

Course Administration and Policies

Cheating vs. Collaborating Guidelines

Adapted from Dan Weld's guidelines.

Collaboration is a very good thing. Cheating, on the other hand, is a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you get a zero on the assignment, and you also risk losing your position as a student in the department and the institute. The department's policy on cheating is to report any cases to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.