This course by Prof. Michael Färber and Prof. Simon Razniewski provides a practical and in-depth understanding of large language models that power modern natural language processing systems. Students will explore the architecture, training methodologies, capabilities, and ethical implications of LLMs. The course combines theoretical knowledge with hands-on experience to equip students with the skills necessary to develop, analyze, and apply LLMs in various contexts.
By the end of this course, students will be able to:
Prerequisites:
The goal of this project is to leverage advanced large language models, such as those similar to ChatGPT, to translate website content from ScaDS.AI and other websites into plain or simplified language (see project Klartext). By providing this in addition to the existing German and English versions, the project aims to break down language barriers and enhance public engagement. This project significantly contributes to inclusion by enabling a broader audience, including non-experts and those with varying language proficiencies, to participate in scientific and cultural conversations.
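As a rough illustration of the core step, the sketch below shows how a chat-oriented LLM could be prompted to rewrite a paragraph in plain language via the openai Python client. The model name, prompt wording, and API choice are assumptions for illustration only, not the setup actually used in the Klartext project.

```python
# Minimal sketch: asking a chat-based LLM to rewrite a paragraph in plain language.
# The model name and prompt wording are illustrative assumptions, not the
# project's actual configuration.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def simplify(text: str, target_language: str = "German") -> str:
    """Ask the model to rewrite `text` in plain, easy-to-read language."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any capable chat model works
        messages=[
            {"role": "system",
             "content": f"Rewrite the user's text in plain, simplified {target_language}. "
                        "Use short sentences and everyday words; keep the meaning intact."},
            {"role": "user", "content": text},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content

print(simplify("Die Diffusion von Innovationen wird durch heterogene Netzwerkstrukturen moduliert."))
```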
This topic is about making scientific texts more understandable. The goal is to automatically rewrite or translate academic articles so they become clear not only to field experts but also to researchers from other disciplines and interested laypeople. This role involves planning and conducting a user study. It’s a unique chance to actively engage in a project that could transform how we interact with scientific knowledge. You’ll gain experience in research methodology and user study design, directly contributing to making science more accessible to a broader audience.
This topic focuses on a collaboration with Orca Capital, a company specializing in financial markets. Together with Orca Capital, a Munich-based startup has developed a running system that predicts stock price developments of certain companies (e.g., rising/falling prices and volatility) based on a continuous stream of news. The system uses deep learning and natural language processing methods, including pretrained language models. The students will further develop and enhance the system, working with real-world financial data and industry contacts. Possible enhancements include applying the latest large language models (LLMs) and techniques that make the predictions more explainable (explainable AI).
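The sketch below illustrates only the general idea of turning a news stream into a directional signal, not the actual system developed with Orca Capital: headlines are scored with a pretrained sentiment model from Hugging Face and aggregated per company. The tickers, headlines, and model choice are placeholders.

```python
# Illustrative sketch only: score news headlines with a pretrained sentiment
# model and aggregate them into a naive directional signal per company.
# Tickers and headlines are made up; this is not the system described above.
from collections import defaultdict
from transformers import pipeline

# Default English sentiment model; a finance-specific model could be substituted.
sentiment = pipeline("sentiment-analysis")

news_stream = [
    ("ACME", "ACME beats quarterly earnings expectations"),
    ("ACME", "Regulators open probe into ACME accounting"),
    ("GLOBEX", "GLOBEX announces major share buyback"),
]

scores = defaultdict(float)
for ticker, headline in news_stream:
    result = sentiment(headline)[0]
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    scores[ticker] += signed

for ticker, score in scores.items():
    print(ticker, "UP" if score > 0 else "DOWN", round(score, 2))
```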
This topic is about advancing AI-based recommendation methods through the integration of large language models and graph message passing networks. The project aims to revolutionize how we predict and understand linkages within academic citation networks.
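A minimal sketch of this general recipe, assuming PyTorch Geometric, might look as follows: language-model embeddings of paper abstracts serve as node features, a message-passing GNN refines them, and a dot product scores candidate citation links. All dimensions, class names, and data below are illustrative, not the project's actual code.

```python
# Sketch of the general recipe (illustrative only): LLM embeddings of paper
# abstracts become node features, a message-passing GNN refines them, and a
# dot product scores candidate citation links.
import torch
from torch import nn
from torch_geometric.nn import SAGEConv

class CitationLinkPredictor(nn.Module):
    def __init__(self, in_dim: int = 384, hidden_dim: int = 128):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        return self.conv2(h, edge_index)

    def score(self, h, pairs):
        # dot-product score for candidate (source, target) citation pairs
        return (h[pairs[0]] * h[pairs[1]]).sum(dim=-1)

x = torch.randn(1000, 384)                      # placeholder LLM embeddings, one row per paper
edge_index = torch.randint(0, 1000, (2, 5000))  # known citation edges
candidates = torch.randint(0, 1000, (2, 10))    # candidate links to score

model = CitationLinkPredictor()
h = model(x, edge_index)
print(torch.sigmoid(model.score(h, candidates)))
```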
This topic is about working on SemOpenAlex, a comprehensive RDF knowledge graph that includes over 26 billion triples related to scientific publications, authors, institutions, journals, and more. This open-access initiative offers data through RDF dump files, a SPARQL endpoint, and the Linked Open Data cloud, enhancing the visibility and accessibility of scientific research.
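For example, the public SPARQL endpoint can be queried directly from Python with SPARQLWrapper. The endpoint URL below (https://semopenalex.org/sparql) is an assumption and should be checked against the current SemOpenAlex documentation; the query itself is schema-agnostic and simply lists the most frequent classes in the graph.

```python
# Minimal sketch: querying the SemOpenAlex SPARQL endpoint with SPARQLWrapper.
# The endpoint URL is assumed; check the project page for the current address.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://semopenalex.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?class (COUNT(?s) AS ?n)
    WHERE { ?s a ?class }
    GROUP BY ?class
    ORDER BY DESC(?n)
    LIMIT 10
""")

# Print the ten most frequent classes and their instance counts.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["class"]["value"], row["n"]["value"])
```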
This project seeks to expand AutoRDF2GML, an open-source framework acclaimed for converting RDF data into specialized representations ideal for cutting-edge graph machine learning (GML) tasks, including graph neural networks (GNNs). With its automatic extraction of both content-based and topology-based features from RDF knowledge graphs, AutoRDF2GML simplifies the process for those new to RDF and SPARQL, making semantic web data more accessible and usable in real-world applications.
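The sketch below illustrates only the underlying idea of such a conversion (it is not AutoRDF2GML's actual API): RDF resources are mapped to integer node ids and object-property triples become edges, i.e. the topology-based part of an RDF-to-graph-ML pipeline, shown here with rdflib on a toy graph.

```python
# Illustration of the underlying idea only (NOT AutoRDF2GML's API):
# map RDF resources to integer node ids and object-property triples to edges.
from rdflib import Graph, URIRef

turtle = """
@prefix ex: <http://example.org/> .
ex:paper1 ex:cites ex:paper2 .
ex:paper2 ex:cites ex:paper3 .
ex:paper1 ex:writtenBy ex:author1 .
"""

g = Graph().parse(data=turtle, format="turtle")

node_ids, edges = {}, []
for s, p, o in g:
    if isinstance(o, URIRef):                    # keep object properties only
        for node in (s, o):
            node_ids.setdefault(node, len(node_ids))
        edges.append((node_ids[s], node_ids[o]))  # edge typed by predicate p

print(len(node_ids), "nodes,", len(edges), "edges")
print(edges)
```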
Lecture at Leipzig University in the master’s program Data Science
Coordinator: Prof. Dr. E. Rahm (Leipzig University)
The aim of the lecture series was to give participants an overview of current requirements and solutions for methods, technologies, and applications of Artificial Intelligence and Big Data. The focus was on the areas worked on at ScaDS.AI Dresden/Leipzig, and the speakers included Principal Investigators actively involved in the center. The lecture was offered as part of the module Current Trends in Data Science (5 credit points) of the new Data Science study program. The lectures were given in German and English. Successful completion of the module required watching the video lectures and solving a practical task in teams of two; the results of the practical tasks were presented by the students in the last two sessions. Participation in the lecture series was also open to other students, researchers, and interested parties.
Due to the coronavirus pandemic, the lecture was held as a series of video presentations. The lecture materials could be accessed via the online platform Moodle.
Lecture | Lecturer | Content |
---|---|---|
1 | Prof. Dr. Erhard Rahm | Introduction to ScaDS.AI and lecture series/module, ScaDS.AI topics of database group (data integration for knowledge graphs, privacy-preserving data analysis, analysis of dynamic graph data) |
2 | Prof. Dr. Stephanie Schiedermair | Datenschutz und Diskriminierungsverbote als Herausforderungen für KI |
3 | Dr. Sebastian Hellmann | Rapid Prototyping of Large Knowledge Graphs and their Applications such as AI |
4 | Prof. Dr. Martin Bogdan | Wie weit ist es bis zur Singularität? |
5 | Prof. Dr. Norbert Siegmund | Validity and Fairness in Machine Learning: A Software Engineering Perspective |
6 | Dr. Stefan Franke, Prof. Dr. T. Neumuth | Möglichkeiten und Grenzen der KI in medizinischer Forschung und klinischem Alltag |
7 | Dr. Ringo Baumann, Prof. Dr. Gerhard Brewka | Computational Models of Argumentation |
8 | Prof. Dr. Peter Stadler | Very Big Data in Computational Biology — Processing and Integration |
9 | Prof. Dr. Nihat Ay | |
10 | J.Prof. Dr. Martin Potthast | Technologies for Information Retrieval and Summarization |
11 | | Presentation of Results of Practical Exercises via Videoconference |
12 | | Presentation of Results of Practical Exercises via Videoconference |
Joint lecture at TU Dresden and Leipzig University
Coordinators: Prof. Dr. S. Gumhold (TU Dresden), Prof. Dr. E. Rahm (Leipzig University)
The aim of the lecture series was to give participants an overview of current requirements and solutions for Big Data technologies and applications. The focus was on the areas worked on in the Big Data competence center ScaDS Dresden/Leipzig. Speakers were professors actively involved in ScaDS Dresden/Leipzig.
The lecture took place in blocks of 2 lectures (each about 1 h) alternating at Leipzig University (lecture hall 8) and at TU Dresden (Willersbau A317). All lectures were streamed via video to the other location on the same day and could be followed in the specified auditorium.
The lecture series was aimed at students of the bachelor's and master's programs in computer science, PhD students, and all interested parties. How the lecture could be credited was regulated site-specifically, according to the framework conditions of the respective study programs.
The location named first for each block hosted the lectures and provided the video stream to the other site. Since the lectures were held in German, the schedule below is also given in German.
Lecturer | Content |
---|---|
Block 1: 27. April 2017, 15:00: Universität Leipzig, Hörsaal 8; TU Dresden, Willersbau A317 | |
Prof. Rahm | Einführung in die Ringvorlesung und ScaDS Dresden/Leipzig |
Prof. Rahm | Graph-based Data Integration and Analysis for Big Data |
Prof. Scheuermann | Merkmalsbasierte visuelle Analyse großer wissenschaftlicher Daten |
Vorstellung/Vergabe der praktischen Aufgaben | |
Block 2: 11. Mai 2017, 15:00: TU Dresden, Willersbau A317; Universität Leipzig, Hörsaal 8 | |
Prof. Sbalzarini | The PPML language for distributed scalable processing enables real-time segmentation of large image data |
Prof. Lehner | Next-Generation Hardware for Data Management – more a Blessing than a Curse? |
Vorstellung/Vergabe der praktischen Arbeiten | |
Block 3: 18. Mai 2017, 15:00: Universität Leipzig, Hörsaal 8; TU Dresden, Willersbau A317 | |
Prof. Stadler | Genome Annotation in the Age of Big Data |
Prof. Heyer | Big Data in den Digital Humanities? |
Block 4: 1. Juni 2017, 15:00: TU Dresden, Willersbau A317; Universität Leipzig, Hörsaal 8 | |
Prof. Nagel | Big Data and HPC – Two worlds apart or common future? |
Dr. Bussmann | Big Data in Photon Science: Why we do everything once |
Block 5: 22. Juni 2017, 15:00: Universität Leipzig, Hörsaal 8; TU Dresden, Willersbau A317 | |
Prof. Bogdan | Verbesserung der Sicherheit von Virtuellen Maschinen für Big Data Architekturen |
Prof. Franczyk | Prozesse treffen Big Data – Verbindung zwischen Data Science und Prozess Science |
Block 6: 29. Juni 2017, 15:00: TU Dresden, Willersbau A317; Universität Leipzig, Hörsaal 8 | |
Prof. Gumhold | Scalable Visualization |
Prof. Dachselt | Multimodal Exploration of Large Data Sets |