
Lecture Series

Behind the Secrets of Large Language Models (SecretLLM)

This course by Prof. Michael Färber and Prof. Simon Razniewski provides a practical and in-depth understanding of large language models that power modern natural language processing systems. Students will explore the architecture, training methodologies, capabilities, and ethical implications of LLMs. The course combines theoretical knowledge with hands-on experience to equip students with the skills necessary to develop, analyze, and apply LLMs in various contexts.

By the end of this course, students will be able to:

  1. Understand the architecture and key components of large language models.
  2. Analyze the training processes, including data collection, model optimization, and fine-tuning.
  3. Evaluate the performance and limitations of LLMs in different NLP tasks.
  4. Apply LLMs to real-world problems, such as text generation, summarization, and translation.
  5. Discuss the ethical considerations and societal impacts of deploying LLMs.

Prerequisites:

  • Introduction to Machine Learning or equivalent
  • Basic understanding of neural networks
  • Programming experience in Python

Potential topics for Komplexpraktikum and similar project modules

The goal of this project is to leverage advanced large language models, such as ChatGPT-like systems, to translate website content from ScaDS.AI and other websites into plain or simplified language (see project Klartext). By providing these simplified versions in addition to the existing German and English ones, the project aims to break down language barriers and enhance public engagement. The project contributes significantly to inclusion by enabling a broader audience, including non-experts and people with varying language proficiencies, to participate in scientific and cultural conversations.
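Purely as an illustration of such a simplification step, the sketch below asks an instruction-tuned LLM to rewrite a text in plain language. The `openai` client, the model name, and the prompt are illustrative assumptions, not the project's actual setup.

```python
# Sketch: simplify website text into plain language with an LLM.
# Assumes the `openai` Python package and an API key; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You rewrite text into plain, easy-to-understand language. "
    "Use short sentences, common words, and keep all facts intact."
)

def simplify(text: str, target_language: str = "German") -> str:
    """Return a plain-language version of `text` in the target language."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable instruction-tuned LLM works
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Rewrite in plain {target_language}:\n\n{text}"},
        ],
        temperature=0.2,  # low temperature for faithful, conservative rewrites
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(simplify("ScaDS.AI erforscht skalierbare Methoden der Künstlichen Intelligenz."))
```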

What are the tasks?

    • Designing and implementing LLM-based methods for text simplification.

    • Evaluating these methods for use cases such as the translation of text into Easy/Plain Language.

What prerequisites do you need?

    • Strong interest in research on natural language processing, especially LLMs.

    • Good programming skills.

This topic is about making scientific texts more understandable. The goal is to automatically rewrite or translate academic articles so they become clear not only to field experts but also to researchers from other disciplines and interested laypeople. This role involves planning and conducting a user study. It’s a unique chance to actively engage in a project that could transform how we interact with scientific knowledge. You’ll gain experience in research methodology and user study design, directly contributing to making science more accessible to a broader audience.

What are the tasks?

    • Designing a comprehensive user study (e.g., selecting and preprocessing the scientific texts for the participants).

    • Collecting and analyzing the participants’ responses (e.g., scientific texts they summarized or simplified in their own words) so that the responses can serve as ground truth (“perfect texts”) for AI models.

    • If you have experience with programming: implement generative AI models (e.g., GPT-based) that automatically summarize or simplify scientific texts (a minimal sketch follows below).
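For the optional programming part, one way to use the collected ground truth is to compare model-generated texts against the participants’ reference texts with a standard overlap metric such as ROUGE. The sketch below assumes the `rouge-score` package; the text pairs are illustrative.

```python
# Sketch: score generated simplifications against participant-written references.
# Assumes the `rouge-score` package; the (reference, candidate) pairs are illustrative.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

pairs = [
    # (ground-truth text from a study participant, model-generated text)
    ("The drug lowers blood pressure in most patients.",
     "The medication reduces blood pressure for most people."),
]

for reference, candidate in pairs:
    scores = scorer.score(reference, candidate)
    print(f"ROUGE-1 F1: {scores['rouge1'].fmeasure:.3f}, "
          f"ROUGE-L F1: {scores['rougeL'].fmeasure:.3f}")
```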

What prerequisites do you need?

    • A passion for making scientific knowledge accessible to a broader audience.

    • Strong interest in research, user study design, and data analysis.

    • Good organizational skills to effectively design and manage a large user study (funded by the chair).

    • Basic programming skills.

 

This topic focuses on a collaboration with Orca Capital, a company specializing in financial markets. Together with Orca Capital, a Munich-based startup has developed a running system that predicts the stock prices of certain companies, e.g. rising/falling prices and volatility, based on a continuous stream of news. This system utilizes deep learning and natural language processing methods, including pretrained language models. The students will work on further developing and enhancing the system, using real-world financial data and industry contacts. Possible enhancements include applying the latest large language models (LLMs) and techniques to make the predictions more explainable (explainable AI).
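The partner system itself is not public; purely as an illustration of news-driven prediction with a pretrained language model, the sketch below maps a headline’s financial sentiment to a toy trading signal. The FinBERT checkpoint and the confidence threshold are illustrative assumptions, not the partner system’s method.

```python
# Sketch: derive a toy trading signal from a news headline with a pretrained model.
# Assumes Hugging Face `transformers`; the FinBERT checkpoint is an illustrative choice.
from transformers import pipeline

sentiment = pipeline("text-classification", model="ProsusAI/finbert")

def news_signal(headline: str) -> str:
    """Map a headline's sentiment (positive/negative/neutral) to a toy signal."""
    result = sentiment(headline)[0]  # e.g. {'label': 'positive', 'score': 0.95}
    if result["score"] < 0.6:        # low confidence: stay out of the market
        return "hold"
    return {"positive": "buy", "negative": "sell", "neutral": "hold"}[result["label"]]

print(news_signal("Chipmaker beats earnings expectations and raises guidance"))
```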

What are the tasks?

    • Developing extensions and improvements of the system, using the latest findings in deep learning and natural language processing.

    • Evaluating the system’s performance and making its predictions more interpretable, integrating methods from the field of explainable AI.

What prerequisites do you need?

    • Good programming skills in Python.

This topic is about advancing AI-based recommendation methods through the integration of large language models and graph message passing networks. The project aims to revolutionize how we predict and understand linkages within academic citation networks.
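As a rough illustration of this combination (not the project’s prescribed approach), the sketch below feeds placeholder LLM text embeddings into a small PyTorch Geometric message-passing encoder and scores candidate citation links by a dot product; all dimensions and the random data are assumptions.

```python
# Sketch: link prediction on a citation graph, combining textual node features
# (e.g. LLM embeddings of titles/abstracts; random placeholders here) with
# message passing. Assumes `torch` and `torch_geometric` are installed.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

num_nodes, text_dim, hidden_dim = 100, 384, 64
x = torch.randn(num_nodes, text_dim)                 # placeholder LLM text embeddings
edge_index = torch.randint(0, num_nodes, (2, 500))   # placeholder citation edges

class LinkPredictor(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(text_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)

    def encode(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

    def decode(self, z, pairs):
        # score a candidate link by the dot product of its two node embeddings
        return (z[pairs[0]] * z[pairs[1]]).sum(dim=-1)

model = LinkPredictor()
z = model.encode(x, edge_index)
candidate_pairs = torch.tensor([[0, 1], [2, 3]]).t()  # edges to score
print(torch.sigmoid(model.decode(z, candidate_pairs)))  # link probabilities
```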

What are the tasks?

    • Implementing and testing algorithms for link prediction, community detection, node classification, and potentially other graph-supervised learning tasks.

    • Exploring the trade-offs between the utilization of textual and structural features in link prediction algorithms, and devising methods to efficiently combine these features.

What prerequisites do you need?

    • A strong interest in machine learning, natural language processing, or graph theory.

    • Proficiency in programming, preferably in Python, with experience in PyTorch or TensorFlow.

    • Eagerness to engage with state-of-the-art research in link prediction and text mining.

This topic is about working on SemOpenAlex, a comprehensive RDF knowledge graph that includes over 26 billion triples related to scientific publications, authors, institutions, journals, and more. This open-access initiative offers data through RDF dump files, a SPARQL endpoint, and the Linked Open Data cloud, enhancing the visibility and accessibility of scientific research.
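As a starting point, the SPARQL endpoint can be queried directly from Python, for example with the SPARQLWrapper package. The endpoint URL and the generic query below are a sketch; the exact schema terms should be checked against the SemOpenAlex documentation.

```python
# Sketch: run a query against the SemOpenAlex SPARQL endpoint.
# Assumes the `SPARQLWrapper` package; the endpoint URL is an assumption and
# should be verified against the current SemOpenAlex documentation.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://semopenalex.org/sparql")  # assumed endpoint URL
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 5
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```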

What are the tasks?

    • Keeping SemOpenAlex up-to-date by updating its schema according to changes in the OpenAlex database and performing periodic updates to the RDF database.

    • Expanding SemOpenAlex, e.g., by introducing author name disambiguation, integrating representations of code repositories like GitHub, and linking to other databases and knowledge graphs such as LinkedPaperWithCode.com, Wikidata, and DBLP.

What prerequisites do you need?

    • Basic understanding of RDF and enthusiasm for semantic web and open data.

    • Programming skills in Python, which are critical for various tasks including database maintenance and development.

This project seeks to expand AutoRDF2GML, an open-source framework acclaimed for converting RDF data into specialized representations ideal for cutting-edge graph machine learning (GML) tasks, including graph neural networks (GNNs). With its automatic extraction of both content-based and topology-based features from RDF knowledge graphs, AutoRDF2GML simplifies the process for those new to RDF and SPARQL, making semantic web data more accessible and usable in real-world applications.
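AutoRDF2GML automates this conversion end to end; the toy sketch below only illustrates the underlying idea of deriving a topology-based feature (node degree) from RDF triples with rdflib, and is explicitly not the framework’s actual API.

```python
# Toy sketch of the idea behind RDF-to-GML conversion: load RDF triples and
# derive a simple topology-based feature (node degree) per resource.
# Assumes `rdflib`; this is NOT the AutoRDF2GML API, and the file is a placeholder.
from collections import Counter
from rdflib import Graph, URIRef

g = Graph()
g.parse("example.ttl", format="turtle")  # placeholder RDF file

degree = Counter()
for s, p, o in g:
    degree[s] += 1            # outgoing edge
    if isinstance(o, URIRef):
        degree[o] += 1        # incoming edge (skip literals)

# Such degrees could become one column of a node-feature matrix for a GNN.
for node, d in degree.most_common(5):
    print(node, d)
```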

What are the tasks?

    • Adapt AutoRDF2GML to process a broader range of RDF knowledge graphs and allow flexible integration of data sources from the Linked Open Data cloud.

    • Redesign the AutoRDF2GML interface to be more intuitive and user-friendly, enabling a seamless experience for both new and experienced users.

    • Boost the framework’s automation capabilities to simplify the setup and execution processes, making it easier to generate and use graph machine learning datasets efficiently.

What prerequisites do you need?

    • Proficiency in Python, with a foundational understanding of RDF, SPARQL, and graph machine learning concepts.

    • An enthusiastic interest in the intersection of semantic web technologies and deep learning.

Lecture Series 2020

Lecture at Leipzig University in the master’s program Data Science

Coordinator: Prof. Dr. E. Rahm (Leipzig University) 

The aim of the lecture series was to give participants an overview of current requirements and solutions for methods, technologies, and applications of Artificial Intelligence and Big Data. The focus was on the areas worked on at ScaDS.AI Dresden/Leipzig, and the speakers included Principal Investigators actively involved in the center. The lecture was offered as part of the module Current Trends in Data Science (5 LP) of the new Data Science study program and was held in German and English. Successful completion of the module required watching the video lectures and solving a practical task in teams of two; the results of the practical tasks were presented by the students in the last two video lectures. Participation in the lecture series was also open to other students, researchers, and interested parties.

Due to the COVID-19 pandemic, the lecture was held via video presentations. The lecture materials were available on the online platform Moodle.

Schedule

Lecture | Lecturer | Content
1 | Prof. Dr. Erhard Rahm | Introduction to ScaDS.AI and the lecture series/module; ScaDS.AI topics of the database group (data integration for knowledge graphs, privacy-preserving data analysis, analysis of dynamic graph data)
2 | Prof. Dr. Stephanie Schiedermair | Data protection and anti-discrimination rules as challenges for AI
3 | Dr. Sebastian Hellmann | Rapid Prototyping of Large Knowledge Graphs and their Applications such as AI
4 | Prof. Dr. Martin Bogdan | How far is it to the singularity?
5 | Prof. Dr. Norbert Siegmund | Validity and Fairness in Machine Learning: A Software Engineering Perspective
6 | Dr. Stefan Franke, Prof. Dr. T. Neumuth | Possibilities and limits of AI in medical research and everyday clinical practice
7 | Dr. Ringo Baumann, Prof. Dr. Gerhard Brewka | Computational Models of Argumentation
8 | Prof. Dr. Peter Stadler | Very Big Data in Computational Biology — Processing and Integration
9 | Prof. Dr. Nihat Ay |
10 | J.Prof. Dr. Martin Potthast | Technologies for Information Retrieval and Summarization
11 | | Presentation of results of the practical exercises via videoconference
12 | | Presentation of results of the practical exercises via videoconference

Lecture Series 2017

Joint lecture at TU Dresden and Leipzig University

Coordinators: Prof. Dr. S. Gumhold (TU Dresden), Prof. Dr. E. Rahm (Leipzig University)

The aim of the lecture series was to give participants an overview of current requirements and solutions for Big Data technologies and applications. The focus was on the areas worked on in the Big Data competence center ScaDS Dresden/Leipzig. Speakers were professors actively involved in ScaDS Dresden/Leipzig. 

The lecture took place in blocks of 2 lectures (each about 1 h) alternating at Leipzig University (lecture hall 8) and at TU Dresden (Willersbau A317). All lectures were streamed via video to the other location on the same day and could be followed in the specified auditorium.

The lecture series was aimed at students of the bachelor’s and master’s programs in computer science, PhD students, and all interested parties. Credit arrangements for students were regulated locally, in line with the requirements of the respective degree programs.

Schedule

The first-named location provided the video stream. The lectures themselves were held in German.

Lecturer | Content
Block 1: 27 April 2017, 15:00: Universität Leipzig, Hörsaal 8; TU Dresden, Willersbau A317
Prof. Rahm | Introduction to the lecture series and ScaDS Dresden/Leipzig
Prof. Rahm | Graph-based Data Integration and Analysis for Big Data
Prof. Scheuermann | Feature-based visual analysis of large scientific data
Presentation/assignment of the practical tasks
Block 2: 11 May 2017, 15:00: TU Dresden, Willersbau A317; Universität Leipzig, Hörsaal 8
Prof. Sbalzarini | The PPML language for distributed scalable processing enables real-time segmentation of large image data
Prof. Lehner | Next-Generation Hardware for Data Management – more a Blessing than a Curse?
Presentation/assignment of the practical tasks
Block 3: 18 May 2017, 15:00: Universität Leipzig, Hörsaal 8; TU Dresden, Willersbau A317
Prof. Stadler | Genome Annotation in the Age of Big Data
Prof. Heyer | Big Data in the Digital Humanities?
Block 4: 1 June 2017, 15:00: TU Dresden, Willersbau A317; Universität Leipzig, Hörsaal 8
Prof. Nagel | Big Data and HPC – Two worlds apart or common future?
Dr. Bussmann | Big Data in Photon Science: Why we do everything once
Block 5: 22 June 2017, 15:00: Universität Leipzig, Hörsaal 8; TU Dresden, Willersbau A317
Prof. Bogdan | Improving the security of virtual machines for Big Data architectures
Prof. Franczyk | Processes meet Big Data – connecting data science and process science
Block 6: 29 June 2017, 15:00: TU Dresden, Willersbau A317; Universität Leipzig, Hörsaal 8
Prof. Gumhold | Scalable Visualization
Prof. Dachselt | Multimodal Exploration of Large Data Sets
Funded by the Federal Ministry of Education and Research and by the Free State of Saxony.