COL772/COL7372: Natural Language Processing - Spring 2026
Monday, Thursday 2-3:20 pm in LH408


Instructor: Mausam
(mausam at cse dot iitd dot ac dot in)
Office hours: by appointment, SIT Building Room 402
TAs (Office hours, by appointment):
Vishal Saley, csz208845 AT cse.iitd.ac.in
Bhavesh Gurnani, Bhavesh.Gurnani.cs521 AT cse.iitd.ac.in
Hasan Mustafa, aiy247541 AT scai.iitd.ac.in
Kausik Hira, aiz217586 AT scai.iitd.ac.in
T Karthikeyan, aiz238140 AT scai.iitd.ac.in
Sudipto Ghosh, aiz248311 AT scai.iitd.ac.in

Course Contents

NLP concepts: tokenization, syntax, information extraction, question answering, semantics, text classification, reasoning and textual inference, language modeling, discourse, machine translation, instruction following, knowledge-based NLP, NLP for low-resource languages, text2code.

Machine learning concepts: word2vec, LSTMs, key-value attention, transformers, LLMs, pre-training, fine-tuning, LoRA, instruction fine-tuning, RLHF, LRMs, RLVR, efficient LLMs, RAG, agentic AI.

Schedule

Start End Slides Required Readings Recommended Readings
Jan 5 Jan 8 Introduction History of NLP
Noam Chomsky on ML
Noam Chomsky on AI
Peter Norvig on Chomsky
Jan 8 Jan 12 NLP Tasks NLP Tasks
Phases of NLP
Jan 12 Jan 15 Text Categorization using Classical ML Notes (Sections 1-4)
SLP3 Ch 4
Gender in Job Postings
Useful Things about ML
Performance Measures
Jan 15 Feb 2 Lexical Semantics with Word2Vec & GloVe SLP3 Ch5
Goldberg Ch8
Embeddings vs. Factorization
Issues with Word Embeddings
Equivalence of Embedding & Factorization
Jan 19 Feb 9 Assignment 1 Resources

Feb 2 Feb 5 Sentiment Mining with CNNs Survey (Sections 1-4.5)
Goldberg Ch13
Practitioner's Guide to CNNs
Feb 5 Feb 9 Sequence Labeling with RNNs SLP3 Ch17.1-17.3
SLP3 Ch13.1, Ch13.3-13.6
Understanding LSTMs
Deriving LSTMs
Pooling in RNNs
RNNs and Vanishing Gradients (Section 4.3)
Feb 12 Feb 16 Attention & Transformer Encoder SLP3 Ch8.1-8.4
Chakraborty Ch 6.1, 6.2, 6.4
The Illustrated Transformer
Contextual Embeddings
Attention is All You Need
The Annotated Transformer
The BackStory of Transformer
Feb 16 Mar 9 N-Gram & Neural Language Models SLP3 Ch 3.1-3.5, 8.5, 13.2
Chakraborty Ch 4.1, 4.3, 5.2, 6.3
SLP3 Ch 3.6-3.7
Feb 17 Mar 23 Assignment 2

Feb 19 Mar 9 Dissecting Transformers: Tokenizers
Guest Lecture: Kausik Hira
Chakraborty Ch 2.4
SLP3 Ch 2.4
SentencePiece
Mar 9 Mar 16 Dissecting Transformers: Variants of Attention Chakraborty Ch 6.5, Sparse Attention
Approximate Attention: LinFormer, Performer
Flash Attention, KV Caching, MQA & GQA
Multi-Head Latent Attention
Gated Attention
Mar 14 Mar 14 Dissecting Transformers: Positional Embeddings
Guest Lecture: Sudipto Ghosh
Chakraborty Ch 6.4
Relative Position Embeddings
Mar 14 Mar 19 Dissecting Transformers: Other Components Pre/Post Norm
Learning Rate Scheduling
Attention Residuals
Mar 19 Apr 23 Pre-Training: BERT to Qwen SLP3 7.1-7.3, 7.5
Chakraborty 7
BERT Paper BART
T5 GPT3
Mar 23 Mar 23 Parameter Efficient Fine-Tuning
Guest Lecture: Vishal Saley
LoRA Fine Tuning
LoRA Details
Gradient Accumulation & Checkpointing
Intrinsic Dimensionality
Mar 26 Apr 10 Assignment 3

Mar 30 Apr 2 Natural Language Generation & Decoding Algorithms
Penalties
Token Selection Strategies
Speculative Sampling
FUDGE: Controlled Text Generation
The Curious Case of Neural Text Degeneration
Medusa
Prompt Engineering
Self-study
Prompting Guide

Apr 6 Apr 6 Distillation for LLMs
Guest Lecture: Vishal Saley
On Policy Distillation
MiniLLM
The Magic of LLM Distillation
Apr 9 Apr 12 Alignment using Instruction Tuning and RLHF
FOLLM (Sections 4-4.3)
InstructGPT
RLHF Blog
Illustrating RLHF
PPO
PPO Explained
Apr 10 Apr 24 Assignment 4

Apr 13 Apr 23 Reasoning using RLVR
GRPO
FunSearch
AlphaGeometry
Training GRPO at Scale
Apr 16 Apr 16 Mixture of Experts
Switch Transformer (Sections 1-4, 7)
DeepSeekMoE
Apr 20 Apr 20 Agents
DSPy
Apr 27 Apr 27 Question Answering & Retrieval Augmented Generation
Vector Databases
Retrieval-augmented Generation
FAISS Vector Index
RAG Details
Apr 27 Apr 27 Wrap Up


Textbook and Readings

Dan Jurafsky and James Martin Speech and Language Processing, 3rd Edition
(required).

Tanmoy Chakraborty Introduction to Large Language Models,
Wiley (2025) (required).

Yoav Goldberg Neural Network Methods for Natural Language Processing,
Morgan and Claypool (2017).

Sebastian Raschka Build a Large Language Model from Scratch,
Manning (2024).

Grading

Assignments: 30%; Quiz: 15%; Midterm Assignment: 20%; Final: 35%; Class participation: extra credit.
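
As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function name, the dictionary keys, and the assumption that every component is scored as a percentage are illustrative choices, not course policy. The size of the class-participation bonus is not specified here, so it appears only as a hypothetical parameter that defaults to zero.

```python
# Component weights taken from the grading line above (percent of the total).
WEIGHTS = {
    "assignments": 30,
    "quiz": 15,
    "midterm_assignment": 20,
    "final": 35,
}


def course_total(scores, extra_credit=0.0):
    """Return the weighted course total out of 100.

    scores maps each component name to a percentage in [0, 100].
    extra_credit is a hypothetical bonus in points; the syllabus does not
    say how much class participation is worth, so it defaults to 0.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * scores[name] / 100 for name in WEIGHTS)
    return total + extra_credit


# Made-up scores, purely for illustration.
example = {"assignments": 80, "quiz": 70, "midterm_assignment": 90, "final": 60}
print(course_total(example))
```

With these made-up scores the components contribute 24, 10.5, 18, and 21 points respectively.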

Course Administration and Policies

Cheating Vs. Collaborating Guidelines

Adapted from Dan Weld's guidelines.

Collaboration is a very good thing. On the other hand, cheating is considered a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you get a zero in the assignment, and additionally you risk losing your position as a student in the department and the institute. The department's policy on cheating is to report any cases to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.