COL772: Natural Language Processing - Spring 2023
Tuesday, Friday 2-3:20 pm in LH310


Instructor: Mausam
(mausam at cse dot iitd dot ac dot in)
Office hours: by appointment, SIT Building Room 402
TAs (Office hours, by appointment):
Vishal Saley, csz208845 AT cse.iitd.ac.in
Prayushi Faldu, csy217548 AT cse.iitd.ac.in
Suchith Prabhu, aiz218323 AT scai.iitd.ac.in

Course Contents

NLP concepts: Tokenization, lemmatization, part of speech tagging, noun phrase chunking, named entity recognition, coreference resolution, parsing, information extraction, sentiment analysis, question answering, text classification, document clustering, document summarization, dialog systems, machine translation, multilinguality.

Machine learning concepts: Naive Bayes, Log-linear Models, Conditional Random Fields, Probabilistic Context Free Grammars, Word2vec models, RNN-based neural models, Sequence to sequence neural models, Pre-trained language models, multilingual language models.

Schedule

Start End Slides Required Readings Recommended Readings
Jan 3 Jan 10 Introduction J&M Ch 1 Recent Advances in NLP
Jan 13 Jan 17 Regular Languages and Finite State Automata SLP3 Ch 2 Regular Expressions Demo
Jan 17 Jan 20 Finite State Transducers J&M Ch 3
Jan 20 Jan 27 Classical Text Categorization: Naive Bayes, Logistic Regression Notes (Sections 1-4)
SLP3 Ch 4
Gender in Job Postings
Useful Things about ML
Performance Measures
Jan 27 Feb 3 Sentiment Mining and Lexicon Learning Survey (Sections 1-4.5)
Tutorial (Sections 1-5)
SLP3 Ch 19
Semantic Orientation of Adjectives
Unsupervised Classification of Reviews
Jan 31 Feb 13 Assignment 1 Resources

Feb 3 Feb 3 Vector Spaces in Information Retrieval SLP3 Ch 6.1-6.6
LSA and PLSA
Detailed Tutorial on LDA
Feb 3 Feb 7 An Intro to Deep Learning for NLP Goldberg 2, 4, 5

Feb 7 Feb 17 Representation Discovery: Word2Vec & GloVe Goldberg 8.1-8.4, 10, 11
SLP3 6.7-6.11
Embeddings vs. Factorization
Contextual Embeddings
Trends and Future Directions on Word Embeddings
Feb 17 Feb 21 N-gram Features with CNNs Goldberg 13
Practitioner's Guide to CNNs
Feb 21 Feb 28 RNNs for Variable Length Sequences Goldberg 14.1-14.3.1, 14.4-14.5
Goldberg 15, 16.1.1, 16.2.2
Understanding LSTMs
Deriving LSTMs
Pooling in RNNs
RNNs and Vanishing Gradients (Section 4.3)
Feb 28 Mar 3 Tricks for Training RNNs Deep Learning for NLP Best Practices
Mar 3 Mar 17 Attention & Transformer Goldberg 17.1, 17.2, 17.4
Attention is All You Need
The Illustrated Transformer
Reformer
Mar 3 Mar 28 Assignment 2 Resources

Mar 14 Mar 14 Introduction to PyTorch and HPC Cluster

Mar 17 Mar 21 Neural CRF for Sequence Labeling Goldberg 19.1-19.3, 19.4.2
Bidirectional LSTM-CRF Models

Mar 21 Mar 24 N-Gram Language Models SLP3 Ch 3
Goldberg 9.1-9.3
Mar 24 Apr 1 Neural & Pre-Trained Language Models Goldberg 9.4-9.5
SLP3 Ch 10
BERT Paper
ELMo Paper
GPT2 Paper
Apr 1 Apr 11 Advanced Pre-Training Techniques BART
T5
Pre-training tasks in ERNIE 2.0 (Section 4)
XLNet
ALBERT
Apr 9 Apr 29 Assignment 3

Apr 11 Apr 19 GPT3 & Beyond: Few-Shot Learning, Prompt Learning GPT3
Adapter Tuning
Language Models of 2022
GPT3 Explained
Prefix Tuning
Apr 19 Apr 19 Multilingual NLP Sentence Piece Tokenization
mBART
XLM-R
BLEU
Typology of Languages
Apr 21 Apr 25 Statistical Natural Language Parsing SLP3 Ch 12.1-12.5, 13.1-13.2, 14.1-14.3
Lecture Notes on PCFGs
Lecture Notes on Lexicalized PCFGs

Apr 28 Apr 28 Data Augmentation for Low Data NLP
Apr 28 Apr 28 Constrained Training for Low Data NLP Constraints in Deep Learning
Apr 29 Apr 29 Prompt Engineering in ChatGPT Prompting Guide

Apr 29 Apr 29 Wrap Up


Textbook and Readings

Yoav Goldberg, Neural Network Methods for Natural Language Processing,
Morgan and Claypool (2017) (required).

Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition
(under development).

Grading

Assignments: 50%; Midterm: 20%; Final: 30%; Class participation, online discussions: extra credit.

Course Administration and Policies

Cheating vs. Collaborating Guidelines

As adapted from Dan Weld's guidelines.

Collaboration is a very good thing. On the other hand, cheating is considered a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you get a zero in the assignment, and additionally you risk losing your position as a student in the department and the institute. The department's policy on cheating is to report any cases to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.