COL772: Natural Language Processing - Spring 2016
Monday, Tuesday, Friday 12-12:50 pm in Bharti 201


Instructor: Mausam
(mausam at cse dot iitd dot ac dot in)
Office hours: by appointment, SIT Building Room 402
TAs (Office hours, by appointment):
Prachi Jain, p6.jain AT gmail.com
Happy Mittal, happy2332 AT gmail.com
Harinder Pal Singh, mcs142123 AT iitd.ac.in

Course Contents

NLP concepts: Tokenization, lemmatization, part of speech tagging, noun phrase chunking, named entity recognition, coreference resolution, parsing, information extraction, sentiment analysis, question answering, text classification, document clustering, document summarization, discourse, machine translation.

Machine learning concepts: Naive Bayes, Hidden Markov Models, Expectation Maximization, Conditional Random Fields, MaxEnt Classifiers, Probabilistic Context Free Grammars, Deep Learning.

Schedule

Start End Slides Required Readings Recommended Readings
Jan 8 Jan 15 Introduction J&M Ch 1 Advances in NLP
Jan 19 Jan 19 Regular Languages and Finite State Automata J&M Ch 2
Jan 19 Jan 22 Morphology with Finite State Transducers J&M Ch 3
Jan 25 Feb 1 Text Categorization using Naive Bayes Notes (Sections 1-4)
SLP3 (Up to Section 7.3)
Gender in Job Postings
Improvements to Multinomial Naive Bayes
Performance Measures
Error Correcting Output Codes
Feb 1 Feb 2 Sentiment Mining Survey (Sections 1-4.5)
Tutorial (Sections 1-5)
Semantic Orientation of Adjectives
Unsupervised Classification of Reviews
Feb 5 Feb 8 Log Linear Models for Classification Notes (Section 2)
SLP3 (7.4-7.6)
Max Entropy models for WSD
Feb 8 Feb 9 Generative vs. Max Entropy Models Max Entropy Tutorial Intro to Max Entropy Models
Feb 9 Feb 9 Other Discriminative Models for Classification
Feb 10 Mar 8 Assignment 1 Format Checker
Feb 23 Feb 25 Language Models J&M Ch 4
SLP3
Empirical Comparison of Smoothing Techniques
Feb 25 Feb 25 Domain Adaptation Paper
Mar 4 Mar 18 Project (Part 1)
Mar 7 Mar 8 Neural Language Models and Representation Discovery Tutorial (Chapters 1-6)
Mar 9 Mar 11 POS Tagging with Hidden Markov Models J&M Ch 6.1-6.5
SLP3 (Ch 8, 9.1-9.4)
Mar 11 Mar 18 Named Entity Recognition with CRFs Notes (Section 4)
Detailed Notes
Non-Local Features and Knowledge in NER
Twitter NER (Sections 1-3.1)
Mar 23 Apr 20 Assignment 2 Format Checker
Mar 28 Mar 28 Brown Clustering Thesis (Chapter 4)
Brown Clustering for NER
Mar 29 Apr 1 Information Extraction J&M Section 22.2
Background Knowledge in IE
Distant Supervision in IE
MultiR
Apr 4 Apr 4 Open Information Extraction ReVerb
OLLIE
Apr 5 Apr 5 Coupled Semi-Supervised Learning in NELL NELL Architecture
NELL Latest Status
Coupled Semi-Supervised Learning
Apr 5 Apr 12 Statistical Natural Language Parsing J&M Ch 12, 14
Lecture Notes on PCFGs
Lecture Notes on Lexicalized PCFGs
Latent Variable models for Parsing
Apr 12 Apr 19 Constrained Conditional Models for Semantic Role Labeling CCMs for SRL
Apr 18 Apr 18 Microsoft Guest Lecture on Dialog Systems
Apr 25 Apr 25 Summarization
Coherent Multi-Document Summarization
Apr 25 Apr 25 Wrap Up Noam Chomsky on ML
Noam Chomsky on AI
Peter Norvig on Chomsky



Textbook and Readings

Dan Jurafsky and James Martin, Speech and Language Processing, 2nd Edition,
Prentice-Hall (2008) (required).

Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition,
(under development).

Grading

Assignments: 30%; Project: 20%; Minors: 20%; Final: 30%; Class participation, online discussions: extra credit.

Course Administration and Policies

Cheating Vs. Collaborating Guidelines

As adapted from Dan Weld's guidelines.

Collaboration is a very good thing. On the other hand, cheating is considered a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you get a zero on the assignment, and you also risk losing your position as a student in the department and the institute. The department's policy on cheating is to report any cases to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.