COL 764 - Information Retrieval and Web Search (2020-21 Sem 1)



Organization

Credit Structure (L-T-P): 3-0-2 (4 credits)

Course slot: AA (Mon., Thu. 2:00-3:30)

Lecture location: Online. Lectures will be held on Teams, on Mondays.

Teaching Assistants: Vinayak Gupta, Mohit Gupta

Grading scheme

Activity Weight
Mid-term 15%
Major 20%
Assignments (2 x 10) 20%
Project 30%
Term-paper 15%

FAQ

  1. I really want to take this course, but there are no vacancies, what should I do?

    • The course registration limit will not be increased. You may have to wait for someone to drop out (invariably someone will in the next few weeks) and then try to register.
    • Only if you are a Ph.D. student from the CSE / SIT department, send me an email with some details about why you need or want to take this course. We may make an exception and add you to the course.
  2. What background is required to do well in this course?

    • Strong programming proficiency is essential (Java, C++ or Python), along with knowledge of data structures.
    • If you are weak in Probability and Statistics, then you should not take this course. MTL106 (or its equivalent) is recommended.
    • A background in ML/NLP/AI will help, but is not essential for succeeding in this course.
  3. I know I am registered for the course, but I cannot see it in Teams. What should I do?

    • Check if you are using xxx@iitd.ac.in credentials for Teams login where xxx is your LDAP id. (Note the domain carefully).
    • If you have registered on eadmin and your registration request is confirmed, you will be added to the course team automatically. You may have to wait a couple of days for the two to be synchronized. Please do not send emails about it.
  4. Can I sit through the course or audit the course?

    • Only Ph.D. students can sit through the course. They can send me an email so that they can be manually added to the team.
    • Auditing the course is possible, although not encouraged (by me).

NEWS / UPDATES

  • [23 Jan 2021] Wrapped up the course.
  • … a whole bunch of videos are uploaded to Impartus
  • [15 Oct 2020] Probabilistic Retrieval Models - II (BM25) is uploaded to Impartus (use 2x speed)
  • [10 Oct 2020] Evaluating IR systems video is uploaded to Impartus
  • [8 Oct 2020] Assignment 1 is up. Deadline for submissions is 22 Oct 2020
  • [3 Oct 2020] Document representation and Boolean Retrieval video uploaded to Impartus (use 2x speed)
  • [30 Sep 2020] Assignment 0 is up. Deadline for submissions is 4 Oct 2020 night
  • [28 Sep 2020] First lecture slides and recorded video are uploaded to Impartus (accessible via Moodle)


About the Course

Information retrieval, also known as “search”, plays a central role in our modern digital lives. In this course we cover the fundamental concepts of information retrieval as well as some of the recent advances in the field, such as the use of knowledge graphs for retrieval, neural methods for retrieval tasks, issues of fairness and fake news, and the use of succinct data structures in building efficient search systems.

Objectives

Understand and be able to discuss concepts such as document representation methods, information needs, search result effectiveness metrics and web search engine architectures. Implement and use retrieval algorithms, and test them on standard and large-scale data collections. Apply information retrieval and web search methods to solve real-world problems, and appreciate their impact on modern everyday life.

Contents

(Tentative, may slightly deviate to focus on recent advances)

  • Retrieval models (Boolean, vector-space, probabilistic, language-model, Markov random fields, diversity-aware);
  • Design of test collections (TREC, crowd-sourcing) and retrieval effectiveness measures (micro-/macro-F measure, nDCG, BPref);
  • Collection models (multinomial repr.; topic mixtures) and topic modeling (LSA/LSI, LDA);
  • Search engine architecture (crawling, indexing, and web-page ranking);
  • Learning to rank including neural ranking;
  • Knowledge graphs;
  • Responsible IR (e.g., handling bias and fake news, privacy, etc.).

NOTE: The course will involve a significant amount of programming to process large datasets, with a focus on efficiency as well as on the quality of results.
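
To give a flavour of the kind of programming involved, here is a minimal sketch of Boolean retrieval over an inverted index, two of the topics listed above. It is only an illustration and not course material: the function names (build_inverted_index, boolean_and), the whitespace tokenization and the toy documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of doc ids containing it.

    `docs` is a dict of {doc_id: text}; tokenization is a plain whitespace
    split, which is an assumption made to keep the sketch short.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def boolean_and(index, query):
    """Return the sorted doc ids that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Toy collection (hypothetical documents, for illustration only).
docs = {
    1: "information retrieval and web search",
    2: "web crawling and indexing",
    3: "probabilistic retrieval models",
}
index = build_inverted_index(docs)
print(boolean_and(index, "web retrieval"))
```

On a real collection the postings would typically be compressed and intersected by merging sorted lists rather than by building Python sets; the set-based version is used here only for brevity.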

Prerequisites:

  • data structures and algorithms (COL106)
  • probability and statistics (MTL106 or equivalent)
  • comfort with programming in Java/C++/Python, and with linear algebra
  • a background in machine learning and/or NLP is not mandatory (although it will help if you have one)

Textbooks

  1. Introduction to Information Retrieval by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press. I strongly encourage you to own a copy of this book. A high-quality preprint of the book is available from the book website.
  2. Modern Information Retrieval: The Concepts and Technology behind Search by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2010.
  3. Search Engines: Information Retrieval in Practice by Croft, Metzler and Strohman, 2010.
  4. Information Retrieval – Implementing and Evaluating Search Engines by Büttcher, Clarke and Cormack, MIT Press, 2010.

Calendar

Date Topic
28-Sep-2020 Organization and Introduction
05-Oct-2020 Inverted Indexes and Vector-space Models
12-Oct-2020 Binary Independence Model
19-Oct-2020 Relevance Feedback
Srikanta Bedathur
DS Chair of Artificial Intelligence
