Title: DeepESDL – Earth System Data Lab
Project duration: 2022 – 2025
Research Area: Applied ML, Big Data ML, Earth and Environmental Sciences
AI is becoming increasingly important in Earth observation: most parts of the Earth system are continuously monitored by sensors, and AI can cope with both the volume of the resulting data and its heterogeneous characteristics. For instance, satellites monitor the atmosphere, land, and ocean with unprecedented accuracy. In the course of DeepESDL, the capabilities of the Earth System Data Lab (ESDL) have been extended to support the application of machine learning (ML) methods to Earth System Data Cubes (ESDCs).
We provide three Python-based best-practice Jupyter Notebooks, built around a generic use case, to showcase how state-of-the-art machine learning libraries can be applied to ESDCs in the DeepESDL Hub environment. Each Jupyter Notebook contains a self-contained workflow with markdown cells, comments, and plots for user-friendly application and guidance, and each is based on one of three well-established open-source ML libraries.
Model tracking is realized with TensorBoard and MLflow. These tools offer science teams an easy-to-use platform for running and scaling their machine learning workloads in a collaborative environment that supports versioning and sharing of parameters, models, artefacts, and results within the team and potentially with external users. MLflow supports MLOps pipelines, in particular by logging and evaluating experiment runs and by storing models in a registry. Persistent MLflow deployments are made available at team level so that each team member can compare their experiments with those of other team members and use the models that others have trained. TensorBoard, another collaborative tool in this MLOps space, is currently being evaluated by the science teams and is available as part of the TensorFlow conda kernel to individual users within their JupyterLab sessions.