Contact: Trainings (trainings.scads.ai@tu-dresden.de)

Big Data Processing on HPC

Title: Big Data Processing on HPC
Next Session: 07.11.2024
Speakers: Apurv Deepak Kulkarni, Wenyu Zhang, Norman Koch
Target Group: Users who have a Big Data problem
Language: English
Format: Online Tutorial

Register for our tutorial “Big Data Processing on HPC” by 31.10.2024!

Apache Spark and Apache Flink are two widely used Big Data analytics frameworks. Their APIs let you develop and test an application on a local workstation and later, without changing its source code, distribute the work across many computers once the workstation's resources are no longer sufficient. The course Big Data Processing on HPC focuses on the step from a local workstation to an HPC environment and shows how a typical Big Data analysis workflow can be organized there. Participants will be introduced to running a data pipeline, processing data, and managing the framework configuration in an HPC environment, using Apache Flink and Apache Spark.
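As a rough illustration of the "same source code, different target" idea for Spark, the sketch below builds a `spark-submit` command line in which only the `--master` value changes between a workstation run and a cluster run. The application name `wordcount.py`, the input file, and the host `node001` are hypothetical placeholders and not part of the course material; the helper function is likewise an assumption made for illustration.

```python
import shlex


def spark_submit_command(app_path, master_url="local[*]", app_args=()):
    """Build a spark-submit argument list.

    The application file stays the same; only the master URL decides
    whether Spark runs locally or on a standalone cluster.
    """
    return ["spark-submit", "--master", master_url, app_path, *app_args]


if __name__ == "__main__":
    # Local workstation: use all available cores of this machine.
    local_cmd = spark_submit_command("wordcount.py", app_args=["input.txt"])

    # Cluster run: hypothetical standalone master on host node001.
    cluster_cmd = spark_submit_command(
        "wordcount.py", "spark://node001:7077", ["input.txt"]
    )

    print(shlex.join(local_cmd))
    print(shlex.join(cluster_cmd))
```

For this to work, the application itself should not hard-code a master URL, so that the value passed at submission time takes effect.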

Agenda

Introduction
Distributed Computing with Big Data
HPC Considerations
   1. Data Space
   2. Software
   3. Hardware
Big Data Framework Configuration
   1. Master/Worker
   2. Parallelism
   3. Memory
Hands-On Session
Conclusion/Supplementary
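The configuration topics in the agenda (master/worker, parallelism, memory) can be sketched for Spark as follows. This is a minimal illustration and not the course's method. It assumes a Slurm-managed HPC system where `SLURM_JOB_NUM_NODES`, `SLURM_CPUS_PER_TASK`, and `SLURM_MEM_PER_NODE` (in MB) are set for the job. It also assumes one executor per node, a 10 % memory reserve, and two tasks per core as a rule of thumb. The function name and its parameters are hypothetical.

```python
def spark_conf_from_slurm(env, reserved_percent=10, tasks_per_core=2):
    """Derive a few Spark properties from Slurm job variables.

    Assumptions (not from the course): one executor per node, and
    `reserved_percent` of node memory left for the OS and overhead.
    """
    nodes = int(env["SLURM_JOB_NUM_NODES"])
    cores = int(env["SLURM_CPUS_PER_TASK"])
    mem_mb = int(env["SLURM_MEM_PER_NODE"])  # Slurm reports this in MB

    # Integer arithmetic avoids floating-point rounding surprises.
    executor_mem_mb = mem_mb * (100 - reserved_percent) // 100

    return {
        "spark.executor.instances": str(nodes),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{executor_mem_mb}m",
        "spark.default.parallelism": str(nodes * cores * tasks_per_core),
    }


if __name__ == "__main__":
    # Example values for a hypothetical job: 4 nodes, 8 cores, 64000 MB each.
    example_env = {
        "SLURM_JOB_NUM_NODES": "4",
        "SLURM_CPUS_PER_TASK": "8",
        "SLURM_MEM_PER_NODE": "64000",
    }
    for key, value in spark_conf_from_slurm(example_env).items():
        print(f"{key}={value}")
```

In a real job the mapping passed in would typically be `os.environ`; suitable values depend on the cluster and the workload.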

Handouts

The course material (slides, sample application) will be available.

Prerequisites

Participants should have basic knowledge of Big Data frameworks (e.g. Apache Flink, Apache Spark). Basic HPC knowledge is helpful but not required.

Learning Outcomes

Participants will be able to start and configure a Big Data cluster and run their own applications on HPC.

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.
