
Lecture Series

Behind the Secrets of Large Language Models (SecretLLM)

This course by Prof. Michael Färber and Prof. Simon Razniewski provides a practical and in-depth understanding of large language models that power modern natural language processing systems. Students will explore the architecture, training methodologies, capabilities, and ethical implications of LLMs. The course combines theoretical knowledge with hands-on experience to equip students with the skills necessary to develop, analyze, and apply LLMs in various contexts.

By the end of this course, students will be able to:

  1. Understand the architecture and key components of large language models.
  2. Analyze the training processes, including data collection, model optimization, and fine-tuning.
  3. Evaluate the performance and limitations of LLMs in different NLP tasks.
  4. Apply LLMs to real-world problems, such as text generation, summarization, and translation.
  5. Discuss the ethical considerations and societal impacts of deploying LLMs.

Prerequisites:

  • Introduction to Machine Learning or equivalent
  • Basic understanding of neural networks
  • Programming experience in Python

Potential topics for Komplexpraktikum and similar project modules

The goal of this project is to leverage advanced large language models, such as ChatGPT-like systems, to translate website content from ScaDS.AI and other websites into plain or simplified language (see project Klartext). By providing these simplified versions in addition to the existing German and English ones, the project aims to break down language barriers and enhance public engagement. The project contributes significantly to inclusion by enabling a broader audience, including non-experts and people with varying language proficiencies, to participate in scientific and cultural conversations.
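Purely as an illustration of such a simplification step, the sketch below asks an instruction-tuned LLM to rewrite a text in plain language. The `openai` client, the model name, and the prompt are illustrative assumptions, not the project's actual setup.

```python
# Sketch: simplify website text into plain language with an LLM.
# Assumes the `openai` Python package and an API key; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You rewrite text into plain, easy-to-understand language. "
    "Use short sentences, common words, and keep all facts intact."
)

def simplify(text: str, target_language: str = "German") -> str:
    """Return a plain-language version of `text` in the target language."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable instruction-tuned LLM works
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Rewrite in plain {target_language}:\n\n{text}"},
        ],
        temperature=0.2,  # low temperature for faithful, conservative rewrites
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(simplify("ScaDS.AI erforscht skalierbare Methoden der Künstlichen Intelligenz."))
```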

What are the tasks?

    • Designing and implementing LLM-based methods for text simplification.

    • Evaluating these methods for use cases such as the translation of text into Easy/Plain Language.

What prerequisites do you need?

    • Strong interest in research on natural language processing, especially LLMs.

    • Good programming skills.

This topic is about making scientific texts more understandable. The goal is to automatically rewrite or translate academic articles so they become clear not only to field experts but also to researchers from other disciplines and interested laypeople. This role involves planning and conducting a user study. It’s a unique chance to actively engage in a project that could transform how we interact with scientific knowledge. You’ll gain experience in research methodology and user study design, directly contributing to making science more accessible to a broader audience.

What are the tasks?

    • Designing a comprehensive user study (e.g., selecting and preprocessing the scientific texts for the participants).

    • Collecting and analyzing the participants’ responses (e.g., scientific texts they summarized or simplified in their own words) so that the responses can serve as ground truth (“perfect texts”) for AI models.

    • If you have experience with programming: implement generative AI models (e.g., GPT-based) that automatically summarize or simplify scientific texts (a minimal sketch follows below).
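For the optional programming part, one way to use the collected ground truth is to compare model-generated texts against the participants’ reference texts with a standard overlap metric such as ROUGE. The sketch below assumes the `rouge-score` package; the text pairs are illustrative.

```python
# Sketch: score generated simplifications against participant-written references.
# Assumes the `rouge-score` package; the (reference, candidate) pairs are illustrative.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

pairs = [
    # (ground-truth text from a study participant, model-generated text)
    ("The drug lowers blood pressure in most patients.",
     "The medication reduces blood pressure for most people."),
]

for reference, candidate in pairs:
    scores = scorer.score(reference, candidate)
    print(f"ROUGE-1 F1: {scores['rouge1'].fmeasure:.3f}, "
          f"ROUGE-L F1: {scores['rougeL'].fmeasure:.3f}")
```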

What prerequisites do you need?

    • A passion for making scientific knowledge accessible to a broader audience.

    • Strong interest in research, user study design, and data analysis.

    • Good organizational skills to effectively design and manage a large user study (funded by the chair).

    • Basic programming skills.

 

This topic focuses on a collaboration with Orca Capital, a company specializing in financial markets. Together with Orca Capital, a Munich-based startup has developed a running system that predicts the stock prices of certain companies, e.g. rising/falling prices and volatility, based on a continuous stream of news. This system utilizes deep learning and natural language processing methods, including pretrained language models. The students will work on further developing and enhancing the system, using real-world financial data and industry contacts. Possible enhancements include applying the latest large language models (LLMs) and techniques to make the predictions more explainable (explainable AI).
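The partner system itself is not public; purely as an illustration of news-driven prediction with a pretrained language model, the sketch below maps a headline’s financial sentiment to a toy trading signal. The FinBERT checkpoint and the confidence threshold are illustrative assumptions, not the partner system’s method.

```python
# Sketch: derive a toy trading signal from a news headline with a pretrained model.
# Assumes Hugging Face `transformers`; the FinBERT checkpoint is an illustrative choice.
from transformers import pipeline

sentiment = pipeline("text-classification", model="ProsusAI/finbert")

def news_signal(headline: str) -> str:
    """Map a headline's sentiment (positive/negative/neutral) to a toy signal."""
    result = sentiment(headline)[0]  # e.g. {'label': 'positive', 'score': 0.95}
    if result["score"] < 0.6:        # low confidence: stay out of the market
        return "hold"
    return {"positive": "buy", "negative": "sell", "neutral": "hold"}[result["label"]]

print(news_signal("Chipmaker beats earnings expectations and raises guidance"))
```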

What are the tasks?

    • Developing extensions and improvements of the system, using the latest findings in deep learning and natural language processing.

    • Evaluating the system’s performance and making its predictions more interpretable, integrating methods from the field of explainable AI.

What prerequisites do you need?

    • Good programming skills in Python.

This topic is about advancing AI-based recommendation methods through the integration of large language models and graph message passing networks. The project aims to revolutionize how we predict and understand linkages within academic citation networks.
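As a rough illustration of this combination (not the project’s prescribed approach), the sketch below feeds placeholder LLM text embeddings into a small PyTorch Geometric message-passing encoder and scores candidate citation links by a dot product; all dimensions and the random data are assumptions.

```python
# Sketch: link prediction on a citation graph, combining textual node features
# (e.g. LLM embeddings of titles/abstracts; random placeholders here) with
# message passing. Assumes `torch` and `torch_geometric` are installed.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

num_nodes, text_dim, hidden_dim = 100, 384, 64
x = torch.randn(num_nodes, text_dim)                 # placeholder LLM text embeddings
edge_index = torch.randint(0, num_nodes, (2, 500))   # placeholder citation edges

class LinkPredictor(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(text_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)

    def encode(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

    def decode(self, z, pairs):
        # score a candidate link by the dot product of its two node embeddings
        return (z[pairs[0]] * z[pairs[1]]).sum(dim=-1)

model = LinkPredictor()
z = model.encode(x, edge_index)
candidate_pairs = torch.tensor([[0, 1], [2, 3]]).t()  # edges to score
print(torch.sigmoid(model.decode(z, candidate_pairs)))  # link probabilities
```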

What are the tasks?

    • Implementing and testing algorithms for link prediction, community detection, node classification, and potentially other graph-supervised learning tasks.

    • Exploring the trade-offs between the utilization of textual and structural features in link prediction algorithms, and devising methods to efficiently combine these features.

What prerequisites do you need?

    • A strong interest in machine learning, natural language processing, or graph theory.

    • Proficiency in programming, preferably in Python, with experience in PyTorch or TensorFlow.

    • Eagerness to engage with state-of-the-art research in link prediction and text mining.

This topic is about working on SemOpenAlex, a comprehensive RDF knowledge graph that includes over 26 billion triples related to scientific publications, authors, institutions, journals, and more. This open-access initiative offers data through RDF dump files, a SPARQL endpoint, and the Linked Open Data cloud, enhancing the visibility and accessibility of scientific research.
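As a starting point, the SPARQL endpoint can be queried directly from Python, for example with the SPARQLWrapper package. The endpoint URL and the generic query below are a sketch; the exact schema terms should be checked against the SemOpenAlex documentation.

```python
# Sketch: run a query against the SemOpenAlex SPARQL endpoint.
# Assumes the `SPARQLWrapper` package; the endpoint URL is an assumption and
# should be verified against the current SemOpenAlex documentation.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://semopenalex.org/sparql")  # assumed endpoint URL
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 5
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```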

What are the tasks?

    • Keeping SemOpenAlex up-to-date by updating its schema according to changes in the OpenAlex database and performing periodic updates to the RDF database.

    • Expanding SemOpenAlex, e.g., by introducing author name disambiguation, integrating representations of code repositories like GitHub, and linking to other databases and knowledge graphs such as LinkedPaperWithCode.com, Wikidata, and DBLP.

What prerequisites do you need?

    • Basic understanding of RDF and enthusiasm for semantic web and open data.

    • Programming skills in Python, which are critical for various tasks including database maintenance and development.

This project seeks to expand AutoRDF2GML, an open-source framework acclaimed for converting RDF data into specialized representations ideal for cutting-edge graph machine learning (GML) tasks, including graph neural networks (GNNs). With its automatic extraction of both content-based and topology-based features from RDF knowledge graphs, AutoRDF2GML simplifies the process for those new to RDF and SPARQL, making semantic web data more accessible and usable in real-world applications.
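AutoRDF2GML automates this conversion end to end; the toy sketch below only illustrates the underlying idea of deriving a topology-based feature (node degree) from RDF triples with rdflib, and is explicitly not the framework’s actual API.

```python
# Toy sketch of the idea behind RDF-to-GML conversion: load RDF triples and
# derive a simple topology-based feature (node degree) per resource.
# Assumes `rdflib`; this is NOT the AutoRDF2GML API, and the file is a placeholder.
from collections import Counter
from rdflib import Graph, URIRef

g = Graph()
g.parse("example.ttl", format="turtle")  # placeholder RDF file

degree = Counter()
for s, p, o in g:
    degree[s] += 1            # outgoing edge
    if isinstance(o, URIRef):
        degree[o] += 1        # incoming edge (skip literals)

# Such degrees could become one column of a node-feature matrix for a GNN.
for node, d in degree.most_common(5):
    print(node, d)
```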

What are the tasks?

    • Adapt AutoRDF2GML to process a broader range of RDF knowledge graphs and allow flexible integration of data sources from the Linked Open Data cloud.

    • Redesign the AutoRDF2GML interface to be more intuitive and user-friendly, enabling a seamless experience for both new and experienced users.

    • Boost the framework’s automation capabilities to simplify the setup and execution processes, making it easier to generate and use graph machine learning datasets efficiently.

What prerequisites do you need?

    • Proficiency in Python, with a foundational understanding of RDF, SPARQL, and graph machine learning concepts.

    • An enthusiastic interest in the intersection of semantic web technologies and deep learning.

Lecture Series 2020

Lecture at Leipzig University in the master’s program Data Science

Coordinator: Prof. Dr. E. Rahm (Leipzig University) 

The aim of the lecture series was to give participants an overview of current requirements and solutions for methods, technologies, and applications of Artificial Intelligence and Big Data. The focus was on the areas worked on at ScaDS.AI Dresden/Leipzig, and the speakers included Principal Investigators actively involved in the center. The lecture was offered as part of the module Current Trends in Data Science (5 LP) of the new Data Science study program and was held in German and English. Successful completion of the module required watching the video lectures and solving a practical task in teams of two; the results of the practical tasks were presented by the students in the last two video lectures. Participation in the lecture series was also open to other students, researchers, and interested parties.

Due to the COVID-19 pandemic, the lecture was held via video presentations. The lecture materials were available on the online platform Moodle.

Schedule

Lecture | Lecturer | Content
1 | Prof. Dr. Erhard Rahm | Introduction to ScaDS.AI and the lecture series/module; ScaDS.AI topics of the database group (data integration for knowledge graphs, privacy-preserving data analysis, analysis of dynamic graph data)
2 | Prof. Dr. Stephanie Schiedermair | Data protection and anti-discrimination rules as challenges for AI
3 | Dr. Sebastian Hellmann | Rapid Prototyping of Large Knowledge Graphs and their Applications such as AI
4 | Prof. Dr. Martin Bogdan | How far is it to the singularity?
5 | Prof. Dr. Norbert Siegmund | Validity and Fairness in Machine Learning: A Software Engineering Perspective
6 | Dr. Stefan Franke, Prof. Dr. T. Neumuth | Possibilities and limits of AI in medical research and everyday clinical practice
7 | Dr. Ringo Baumann, Prof. Dr. Gerhard Brewka | Computational Models of Argumentation
8 | Prof. Dr. Peter Stadler | Very Big Data in Computational Biology — Processing and Integration
9 | Prof. Dr. Nihat Ay |
10 | J.Prof. Dr. Martin Potthast | Technologies for Information Retrieval and Summarization
11 | | Presentation of results of the practical exercises via videoconference
12 | | Presentation of results of the practical exercises via videoconference

Lecture Series 2017

Joint lecture at TU Dresden and Leipzig University

Coordinators: Prof. Dr. S. Gumhold (TU Dresden), Prof. Dr. E. Rahm (Leipzig University)

The aim of the lecture series was to give participants an overview of current requirements and solutions for Big Data technologies and applications. The focus was on the areas worked on in the Big Data competence center ScaDS Dresden/Leipzig. Speakers were professors actively involved in ScaDS Dresden/Leipzig. 

The lecture took place in blocks of 2 lectures (each about 1 h) alternating at Leipzig University (lecture hall 8) and at TU Dresden (Willersbau A317). All lectures were streamed via video to the other location on the same day and could be followed in the specified auditorium.

The lecture series was aimed at students of the bachelor’s and master’s programs in computer science, PhD students, and all interested parties. Credit arrangements for students were regulated locally, in line with the requirements of the respective degree programs.

Schedule

The first-named location provided the video stream. The lectures themselves were held in German.

Lecturer | Content
Block 1: 27 April 2017, 15:00: Universität Leipzig, Hörsaal 8; TU Dresden, Willersbau A317
Prof. Rahm | Introduction to the lecture series and ScaDS Dresden/Leipzig
Prof. Rahm | Graph-based Data Integration and Analysis for Big Data
Prof. Scheuermann | Feature-based visual analysis of large scientific data
Presentation/assignment of the practical tasks
Block 2: 11 May 2017, 15:00: TU Dresden, Willersbau A317; Universität Leipzig, Hörsaal 8
Prof. Sbalzarini | The PPML language for distributed scalable processing enables real-time segmentation of large image data
Prof. Lehner | Next-Generation Hardware for Data Management – more a Blessing than a Curse?
Presentation/assignment of the practical tasks
Block 3: 18 May 2017, 15:00: Universität Leipzig, Hörsaal 8; TU Dresden, Willersbau A317
Prof. Stadler | Genome Annotation in the Age of Big Data
Prof. Heyer | Big Data in the Digital Humanities?
Block 4: 1 June 2017, 15:00: TU Dresden, Willersbau A317; Universität Leipzig, Hörsaal 8
Prof. Nagel | Big Data and HPC – Two worlds apart or common future?
Dr. Bussmann | Big Data in Photon Science: Why we do everything once
Block 5: 22 June 2017, 15:00: Universität Leipzig, Hörsaal 8; TU Dresden, Willersbau A317
Prof. Bogdan | Improving the security of virtual machines for Big Data architectures
Prof. Franczyk | Processes meet Big Data – connecting data science and process science
Block 6: 29 June 2017, 15:00: TU Dresden, Willersbau A317; Universität Leipzig, Hörsaal 8
Prof. Gumhold | Scalable Visualization
Prof. Dachselt | Multimodal Exploration of Large Data Sets
Funded by the Federal Ministry of Education and Research and by the Free State of Saxony.