CSL772: Natural Language Processing - Autumn 2014
Monday and Thursday, 5:00-6:20 pm, in Bharti 201


Instructor: Mausam
(mausam at cse dot iitd dot ac dot in)
Office hours: by appointment, SIT Building Room 402
TAs (Office hours, by appointment):
Yashoteja Prabhu, yashoteja.prabhu AT gmail.com
Ankit Anand, ankit.s.anand AT gmail.com
Neetu Jindal, er.neetu28 AT gmail.com

Course Contents

NLP concepts: Tokenization, lemmatization, part-of-speech tagging, noun phrase chunking, named entity recognition, coreference resolution, parsing, information extraction, sentiment analysis, question answering, text classification, document clustering, document summarization, discourse, and machine translation.
Machine learning concepts: Naive Bayes, Hidden Markov Models, Expectation Maximization, Conditional Random Fields, MaxEnt classifiers, and Probabilistic Context-Free Grammars.

Schedule

Each entry lists the lecture dates, the topic (slides), and the associated readings and materials (required and recommended).

Jul 24 - Jul 31: Introduction
    Readings: J&M Ch 1
Jul 31: Regular Languages and Finite State Automata
    Readings: J&M Ch 2
Jul 31 - Aug 11: Morphology with Finite State Transducers
    Readings: J&M Ch 3
Aug 4 - Aug 11: Text Categorization using Naive Bayes
    Readings: Notes (Sections 1-4), SLP3 (up to Section 6.3), Performance Measures, Error Correcting Output Codes
Aug 8 - Aug 19: Assignment 1
    Materials: Raw Data (Ver 2), Class List, Format Checker, F-Score Calculator
Aug 21 - Aug 25: Sentiment Mining
    Readings: Survey (Sections 1-4.5), Tutorial (Sections 1-5), Semantic Orientation of Adjectives, Unsupervised Classification of Reviews
Aug 25 - Sep 4: Log Linear Models
    Readings: Notes (Section 2), SLP3 (Section 6.4), Max Entropy models for WSD
Aug 25 - Sep 16: Assignment 2
Sep 2 - Sep 23: Project (Part 1)
Sep 4 - Sep 8: Language Models
    Readings: J&M Ch 4, SLP3, Empirical Comparison of Smoothing Techniques
Sep 11: POS Tagging with Hidden Markov Models
    Readings: J&M Ch 6.1-6.5, SLP3 (Ch 7, 8.1-8.4)
Sep 15 - Sep 18: Named Entity Recognition with MEMMs
    Readings: J&M Ch 6.7-6.8 and 13.5; MEMMs (Section 8.5); Non-Local Features and Knowledge in NER; Unsupervised Person Name Recognition
Sep 18: Brown Clustering
    Readings: Thesis (Chapter 4), Brown Clustering for NER
Sep 22: Conditional Random Fields for NER and POS Tagging
    Readings: Notes (Section 4), Detailed Notes, Label Bias (Section 2), Twitter NER (Sections 1-3.1)
Sep 22 - Sep 25: Information Extraction
    Readings: J&M Section 22.2, Background Knowledge in IE, Distant Supervision in IE, MultiR
Sep 29: Guest Lecture by L V Subramaniam
Oct 13: Open Information Extraction
    Readings: ReVerb, OLLIE
Oct 15 - Nov 4: Assignment 3
Oct 16: Document Similarity in Information Retrieval
    Readings: IR Textbook (Chapters 2, 6.2-6.3), LSA and PLSA
Nov 10 - Nov 13: Statistical Natural Language Parsing
    Readings: J&M Ch 12 and 14; Lecture Notes on PCFGs; Lecture Notes on Lexicalized PCFGs; Latent Variable models for Parsing
Nov 17: Other NLP Tasks and Discussion
    Readings: Noam Chomsky on ML, Noam Chomsky on AI, Peter Norvig on Chomsky, Coherent Multi-Document Summarization

Textbook and Readings

Dan Jurafsky and James Martin, Speech and Language Processing, 2nd Edition,
Prentice-Hall (2008) (required).

Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition
(under development).

Grading

Assignments: 30%; Project: 20%; Minor exams: 10% each; Final: 30%; Class participation and online discussions: extra credit.

Course Administration and Policies

Cheating vs. Collaborating Guidelines

As adapted from Dan Weld's guidelines.

Collaboration is a very good thing. On the other hand, cheating is considered a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you will receive a zero on the assignment, and you also risk losing your position as a student in the department and the institute. The department's policy on cheating is to report any cases to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.