htw saar

Information Retrieval

Module name (EN):
Information Retrieval
Degree programme:
Study Programme with validity of corresponding study regulations containing this module.
Applied Informatics, Bachelor, ASPO 01.10.2011
Module code: PIBWI29
Hours per semester week / Teaching method:
The number of hours per week is a combination of lecture (V for German Vorlesung), exercise (U for Übung), practice (P), or project (PA). For example, a course of the form 2V+2U has 2 hours of lecture and 2 hours of exercise per week.
2V+2PA (4 hours per week)
ECTS credits:
European Credit Transfer System: points awarded for successfully completing a course. Each ECTS credit represents a workload of 30 hours.
5
Semester: 5
Mandatory course: no
Language of instruction:
English
Assessment:
Written exam, duration 90 min./project work

[updated 13.10.2024]
Applicability / Curricular relevance:
All study programs (with year of the version of study regulations) containing the course.

DFIW-IRET (P610-0540) Computer Science and Web Engineering, Bachelor, ASPO 01.10.2019, semester 3, mandatory course, informatics specific
KI584 (P610-0253) Computer Science and Communication Systems, Bachelor, ASPO 01.10.2014, semester 5, optional course, informatics specific
KIB-IRET Computer Science and Communication Systems, Bachelor, ASPO 01.10.2021, semester 5, optional course, technical
KIB-IRET Computer Science and Communication Systems, Bachelor, ASPO 01.10.2022, semester 5, optional course, technical
PIBWI29 Applied Informatics, Bachelor, ASPO 01.10.2011, semester 5, optional course, informatics specific
PIB-IRET (P221-0080) Applied Informatics, Bachelor, ASPO 01.10.2022, semester 5, optional course, informatics specific

Suitable for exchange students (learning agreement)
Workload:
Workload required of a student to successfully complete the course. Each ECTS credit represents 30 working hours, comprising contact time, follow-up work on the lecture material, exercises, and exam preparation.

The total workload is distributed over the semester (01.04.-30.09. during the summer term, 01.10.-31.03. during the winter term).
60 class hours (= 45 clock hours) over a 15-week period.
The total student study time is 150 hours (equivalent to 5 ECTS credits).
There are therefore 105 hours available for class preparation and follow-up work and exam preparation.
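The workload figures above follow from simple arithmetic, assuming (as the ratio of 60 class hours to 45 clock hours implies) that one class hour lasts 45 minutes:

```latex
\begin{align*}
4\ \text{h/week} \times 15\ \text{weeks} &= 60\ \text{class hours} \\
60 \times 45\ \text{min} &= 2700\ \text{min} = 45\ \text{clock hours} \\
5\ \text{ECTS} \times 30\ \text{h} &= 150\ \text{h} \\
150\ \text{h} - 45\ \text{h} &= 105\ \text{h for preparation, follow-up work and exam preparation}
\end{align*}
```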
Recommended prerequisites (modules):
None.
Recommended as prerequisite for:
Module coordinator:
Prof. Dr. Klaus Berberich
Lecturer: Prof. Dr. Klaus Berberich

[updated 18.03.2015]
Learning outcomes:
After successfully completing this course, students will know basic information retrieval methods. These
include retrieval models (e.g., the Vector Space Model and the Binary Independence Model), link analysis
(e.g., PageRank), and effectiveness measures (e.g., precision/recall and MAP). Students will be able to
apply and implement these methods in practice. In addition, they will be familiar with readily available
information retrieval systems (e.g., Apache Lucene/Solr).


[updated 13.10.2024]
Module content:
Information Retrieval is pervasive and its applications range from
finding contacts or e-mails on your smartphone to web search engines
that index billions of web pages. This course covers the most
important information retrieval methods. We will look into how
these methods are defined formally, including the mathematics behind
them, but also see how they can be implemented efficiently in
practice. As part of the project work, we will implement a small
search engine from scratch.
 
1. Introduction
- History
- Applications
- Course overview
 
2. Natural language
- Documents and terms
- Stopwords and stemming/lemmatization
- Synonyms, polysemes, compounds
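As an illustration of topic 2, the following minimal Python sketch tokenizes text, removes stopwords, and applies a crude suffix-stripping stemmer. The stopword list, the suffix rules, and the function names `analyze` and `stem` are hypothetical choices for this example; the stemmer is not the Porter stemmer, and lemmatization would additionally require a dictionary.

```python
import re

# Hypothetical, deliberately tiny stopword list (real lists are much longer).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Crude suffix rules, tried in order; an illustration only, NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(term: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def analyze(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric terms, drop stopwords, and stem."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in terms if t not in STOPWORDS]


print(analyze("The searchers are indexing documents"))
# ['searcher', 'index', 'document']
```

Such rule-based stemming conflates word forms cheaply but cannot resolve synonyms or polysemes, which is why those phenomena are treated separately.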
 
3. Retrieval models
- Boolean retrieval
- Vector space model with TF.IDF term weighting
- Language models
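The vector space model with TF.IDF weighting can be sketched as follows, assuming the common variant tf · log(N/df) and cosine similarity; textbooks and courses use several other TF and IDF variants. Documents and queries are assumed to be already tokenized, and `rank` and `cosine` are hypothetical names.

```python
import math
from collections import Counter


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query: list[str], docs: list[list[str]]) -> list[tuple[int, float]]:
    """Score every document against the query; return (doc_id, score), best first."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}

    def weights(terms: list[str]) -> dict[str, float]:
        tf = Counter(terms)
        # tf * idf; query terms absent from the collection get weight 0.
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    q = weights(query)
    scores = [(doc_id, cosine(q, weights(doc))) for doc_id, doc in enumerate(docs)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


docs = [
    ["information", "retrieval", "index"],
    ["web", "search", "index"],
    ["information", "theory"],
]
print(rank(["information", "retrieval"], docs))
```

Here document 0 ranks first, document 2 second (it shares only the more common term "information"), and document 1 scores 0. Under this IDF variant, a term occurring in every document gets weight log(1) = 0 and no longer influences the ranking.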
 
4. Indexing methods
- Inverted index
- Compression (d-gaps, variable-byte encoding)
- Index pruning
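Topic 4 can be illustrated with a small sketch that builds an inverted index, converts a posting list into d-gaps, and compresses the gaps with variable-byte encoding. The byte layout chosen here (7 payload bits per byte, least-significant group first, high bit set when more bytes follow) is one of several conventions in use; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def build_index(docs: list[list[str]]) -> dict[str, list[int]]:
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(list)
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            index[term].append(doc_id)  # IDs are appended in increasing order
    return dict(index)


def to_gaps(postings: list[int]) -> list[int]:
    """Keep the first ID, then store differences between consecutive IDs."""
    return postings[:1] + [b - a for a, b in zip(postings, postings[1:])]


def from_gaps(gaps: list[int]) -> list[int]:
    """Undo to_gaps with a running sum."""
    return list(accumulate(gaps))


def vbyte_encode(numbers: list[int]) -> bytes:
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append((n & 0x7F) | 0x80)  # low 7 bits, "more bytes follow" flag set
            n >>= 7
        out.append(n)  # last byte of this number, flag clear
    return bytes(out)


def vbyte_decode(data: bytes) -> list[int]:
    numbers, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes of the same number follow
        else:
            numbers.append(n)
            n, shift = 0, 0
    return numbers


postings = [3, 130, 131, 1000]
encoded = vbyte_encode(to_gaps(postings))
print(to_gaps(postings), len(encoded))  # [3, 127, 1, 869] 5
```

Gaps are usually much smaller than the IDs themselves, so most fit into a single byte: the four IDs above occupy 5 bytes instead of 16 as 32-bit integers.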
 
5. Query processing
- Holistic methods (DAAT, TAAT)
- Top-k methods (NRA, WAND)
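The difference between term-at-a-time (TAAT) and document-at-a-time (DAAT) evaluation can be sketched in Python, assuming each posting list is sorted by document ID and stores (doc_id, score) pairs whose scores are simply summed. Both functions are exhaustive (holistic); top-k methods such as NRA and WAND, which avoid fully scoring documents that cannot enter the top k, are not shown. The function names, the posting layout, and the scoring scheme are assumptions.

```python
import heapq
from collections import defaultdict

Posting = tuple[int, float]  # (doc_id, score contribution); hypothetical layout


def taat(query_postings: list[list[Posting]], k: int) -> list[Posting]:
    """Term-at-a-time: traverse one posting list completely before the next."""
    accumulators = defaultdict(float)  # one partial score per candidate document
    for postings in query_postings:
        for doc_id, score in postings:
            accumulators[doc_id] += score
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])


def daat(query_postings: list[list[Posting]], k: int) -> list[Posting]:
    """Document-at-a-time: fully score one document before moving to the next."""
    if k <= 0:
        return []
    cursors = [0] * len(query_postings)
    top_k: list[tuple[float, int]] = []  # min-heap of (score, doc_id)
    while True:
        heads = [p[c][0] for p, c in zip(query_postings, cursors) if c < len(p)]
        if not heads:
            break
        doc_id = min(heads)  # next document in ID order
        total = 0.0
        for i, postings in enumerate(query_postings):
            c = cursors[i]
            if c < len(postings) and postings[c][0] == doc_id:
                total += postings[c][1]
                cursors[i] += 1  # advance every list positioned on doc_id
        if len(top_k) < k:
            heapq.heappush(top_k, (total, doc_id))
        elif total > top_k[0][0]:
            heapq.heapreplace(top_k, (total, doc_id))
    return [(doc, score) for score, doc in sorted(top_k, reverse=True)]


postings = [
    [(1, 0.5), (3, 1.2), (7, 0.3)],  # hypothetical list for query term 1
    [(3, 0.4), (5, 0.9), (7, 0.8)],  # hypothetical list for query term 2
]
print(taat(postings, 2))
print(daat(postings, 2))
```

Both calls rank document 3 first and document 7 second. TAAT needs an accumulator for every candidate document, whereas DAAT only keeps one cursor per list and a heap of the current top k.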
 
6. Evaluation
- Cranfield Paradigm
- Benchmark initiatives (TREC, CLEF, NTCIR)
- Traditional effectiveness measures (precision, recall, MAP)
- Non-traditional effectiveness measures (nDCG, ERR)
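The traditional measures and nDCG can be computed as in the following sketch. It assumes binary relevance for precision, recall, and average precision, graded gains for nDCG, and the DCG variant gain / log2(rank + 1); other gain and discount variants exist, and ERR is not shown. Function names and the example judgments are hypothetical.

```python
import math


def precision_recall(ranking: list[str], relevant: set[str]) -> tuple[float, float]:
    hits = sum(1 for doc in ranking if doc in relevant)
    return hits / len(ranking), hits / len(relevant)


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Sum of precision@rank at each relevant hit, divided by the number of relevant docs."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute 0.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP: average precision averaged over queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg(ranking: list[str], gains: dict[str, int], k: int) -> float:
    def dcg(values: list[int]) -> float:
        return sum(g / math.log2(i + 1) for i, g in enumerate(values, start=1))

    actual = dcg([gains.get(doc, 0) for doc in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])  # best possible ordering
    return actual / ideal if ideal > 0 else 0.0


ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
print(precision_recall(ranking, relevant))  # (0.5, 0.6666666666666666)
print(average_precision(ranking, relevant))  # equals (1/1 + 2/3) / 3 = 5/9
print(ndcg(ranking, {"d1": 3, "d3": 2, "d5": 1}, k=3))
```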
 
7. Web retrieval
- Crawling
- Near-duplicate detection
- Link analysis (PageRank, HITS)
- Web spam
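PageRank can be computed by power iteration, as in this minimal sketch. It assumes a damping factor of 0.85, a fixed number of iterations instead of a convergence test, and that the score of pages without out-links (dangling nodes) is spread evenly over all pages; the function name and the link graph are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power iteration over a graph given as page -> list of distinct linked pages."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score of dangling pages (no out-links) is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v, targets in links.items():
            for t in targets:
                # Each page passes its damped score in equal shares to its out-links.
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(links)
print(sorted(scores, key=scores.get, reverse=True))  # ['C', 'A', 'B']
```

C collects links from both other pages and ends up highest; the scores always sum to 1 because every iteration redistributes the full probability mass.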
 
8. Information retrieval systems
- Indri
- Terrier
- Anserini
- Apache Lucene/Solr
- Elasticsearch
 


[updated 13.10.2024]
Recommended or required reading:
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack: Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
 
Reginald Ferber: Information Retrieval: Suchmodelle und Data-Mining Verfahren für Textsammlungen und das Web, dpunkt, 2003.
(Available online at: http://information-retrieval.de/irb/ir.html)
 
W. Bruce Croft, Donald Metzler, Trevor Strohman: Search Engines: Information Retrieval in Practice, Pearson, 2009.
(Available online at: https://ciir.cs.umass.edu/irbook/)
 
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
(Available online at: http://nlp.stanford.edu/IR-book/)
 
 
 


[updated 13.10.2024]
Module offered in:
WS 2020/21, WS 2019/20, WS 2018/19, SS 2018, SS 2017, ...
[Tue Jan 14 19:09:56 CET 2025, CKEY=kir, BKEY=pi, CID=PIBWI29, LANGUAGE=en, DATE=14.01.2025]