COL772: Natural Language Processing - Autumn 2020
Tuesday and Friday, 2-3:20 pm, on Microsoft Teams


Instructor: Mausam
(mausam at cse dot iitd dot ac dot in)
Office hours: by appointment, SIT Building Room 402
TAs (Office hours, by appointment):
Keshav Kolluru, keshavkolluru AT gmail.com
Vipul Kumar Rathore, vipulrathore AT iisc.ac.in

Course Contents

NLP concepts: Tokenization, lemmatization, part of speech tagging, noun phrase chunking, named entity recognition, coreference resolution, parsing, information extraction, sentiment analysis, question answering, text classification, document clustering, document summarization, discourse, machine translation.

Machine learning concepts: Naive Bayes, MaxEnt classifiers, Hidden Markov Models, Conditional Random Fields, Probabilistic Context Free Grammars, Word2vec models, RNN-based neural models, Sequence to sequence neural models, Pre-trained language models.

Schedule

Start End Slides Required Readings Recommended Readings
Sep 29 Oct 9 Introduction J&M Ch 1 Recent Advances in NLP
Oct 9 Nov 7 Regular Languages and Finite State Automata SLP3 Ch 2
Nov 7 Nov 7 Morphology with Finite State Transducers J&M Ch 3
Oct 13 Oct 13 Classical Text Categorization: Naive Bayes, Logistic Regression Notes (Sections 1-4)
SLP3 Ch 4
Gender in Job Postings
Improvements to Multinomial Naive Bayes
Performance Measures
Error Correcting Output Codes
Nov 3 Nov 3 Sentiment Mining and Lexicon Generation Survey (Sections 1-4.5)
Tutorial (Sections 1-5)
SLP3 Ch 19
Semantic Orientation of Adjectives
Unsupervised Classification of Reviews
Nov 6 Nov 13 Generative vs. Max Entropy Models Max Entropy Tutorial Intro to Max Entropy Models
Nov 13 Nov 13 Vector Spaces in Information Retrieval SLP3 Ch 6.1-6.6
LSA and PLSA
Detailed Tutorial on LDA
Nov 13 Nov 17 An Intro to Deep Learning for NLP

Oct 23 Oct 23 Representation Discovery for Words Goldberg 8.1-8.4, 10, 11
SLP3 6.7-6.11
Embeddings vs. Factorization
Contextual Embeddings
Trends and Future Directions on Word Embeddings
Oct 27 Oct 27 N-gram Features with CNNs Goldberg 13
Practitioner's Guide to CNNs
Nov 17 Nov 24 RNNs for Variable Length Sequences Goldberg 14.1-14.3.1, 14.4-14.5
Goldberg 15, 16.1.1, 16.2.2
Understanding LSTMs
Deriving LSTMs
Pooling in RNNs
RNNs and Vanishing Gradients (Section 4.3)
Nov 17 Nov 28 Assignment 1.1 Resources

Nov 24 Nov 27 Attention & Transformer Goldberg 17.1, 17.2, 17.4
Attention is All You Need
The Illustrated Transformer
Reformer
Longformer
Nov 27 Nov 27 Tricks for Training RNNs Deep Learning for NLP Best Practices
Nov 30 Dec 15 Assignment 1.2

Dec 15 Dec 15 N-Gram Language Models SLP3 Ch 3
Goldberg 9.1-9.3
Dec 18 Dec 18 Neural & Pre-Trained Language Models Goldberg 9.4-9.5
SLP3 Ch 10
BERT Paper
ELMo Paper
GPT2 Paper
Dec 22 Dec 29 Statistical Natural Language Parsing SLP3 Ch 12.1-12.5, 13.1-13.2, 14.1-14.6
Lecture Notes on PCFGs
Lecture Notes on Lexicalized PCFGs

Dec 29 Dec 29 Neural Models over Tree Structures Goldberg 18

Jan 2 Jan 16 Assignment 2
Resources
Jan 2 Jan 2 Machine Translation SLP3 Ch 11
MBART

Jan 2 Jan 2 NLP in Low Data Setting T5, GPT3
Constraints in Deep Learning

Textbook and Readings

Yoav Goldberg, Neural Network Methods for Natural Language Processing,
Morgan and Claypool (2017) (required).

Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition
(under development).

Grading

Assignments: 30%; Project: 20%; Minors: 20%; Final: 30%; Class participation, online discussions: extra credit.

Course Administration and Policies

Cheating vs. Collaborating Guidelines

As adapted from Dan Weld's guidelines.

Collaboration is a very good thing. On the other hand, cheating is considered a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you get a zero in the assignment, and additionally you risk losing your position as a student in the department and the institute. The department's policy on cheating is to report any cases to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.