Machine Learning on HPC

Title: Machine Learning on HPC – Introduction
Next Session: to be announced
Target Group: HPC Basics / HPC User
Language: English
Format: Online Tutorial

Because Machine Learning applications are so diverse, there can be many reasons to move to an HPC system, e.g. large memory requirements, access to GPUs, or faster computation. This course shows how a typical Machine Learning workflow can be implemented on an HPC system. Depending on your requirements, you can move to the HPC system at different points in the workflow. Machine Learning applications are often developed collaboratively within groups, and the workflow presented in the course takes this into account.

Course Details

Agenda

  • Access to the HPC system (e.g. ssh, JupyterHub)
  • Data transfer and storage of training data, models, source code etc. (e.g. scp, dtcp, user space, workspaces)
  • Setting up the required software environment (e.g. using the module system, virtual environments, containers)
  • Execution/testing/debugging of applications (e.g. batch jobs, interactive jobs)
  • Evaluation and storage of results
  • Simple monitoring to optimize applications (Pika)

Handouts

The course material (slides, sample application) will be made available to participants.

Prerequisites

Participants should have basic knowledge of Python and of using TensorFlow, PyTorch, or R.

Learning Outcomes

Using concrete examples, participants will learn how to implement Machine Learning workflows on an HPC system according to their individual requirements.

Funded by:
Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung)
Free State of Saxony (Freistaat Sachsen)