ScaDS.AI - Center for scalable data analytics and artificial intelligence

The concept of the Earth System Data Cube (Mahecha et al. 2020) has rapidly become a popular tool in Earth System Sciences in recent years, as it greatly facilitates data visualization and (interoperable) data handling, including preprocessing and statistical analyses. The original data sets are transformed in space and time to fit the common grid of the Data Cube, which has three dimensions (longitude, latitude and time) and holds a set of variables that are mapped into this spatio-temporal system. Data Cubes are typically chunked, meaning they consist of a set of smaller cubes (chunks) that together form what we call the Earth System Data Cube (ESDC). The ESDC concept allows multiple remotely sensed spatio-temporal data streams to be treated as a single one and therefore makes it possible to interact with a wide range of data.
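To make the chunking idea concrete, the following minimal sketch splits a cube of a given shape into index ranges, one per chunk. The cube shape, the chunk shape and the helper name `chunk_slices` are illustrative assumptions; they are not taken from the ESDC software.

```python
from itertools import product
from math import ceil


def chunk_slices(shape, chunk_shape):
    """Yield one tuple of slices per chunk for a cube of the given shape."""
    # Number of chunks along each dimension (the last chunk may be smaller).
    counts = [ceil(n / c) for n, c in zip(shape, chunk_shape)]
    for idx in product(*(range(k) for k in counts)):
        yield tuple(
            slice(i * c, min((i + 1) * c, n))
            for i, c, n in zip(idx, chunk_shape, shape)
        )


# Hypothetical cube: 46 time steps, 180 latitudes, 360 longitudes.
cube_shape = (46, 180, 360)
# Hypothetical chunking: full time series, 90 x 90 spatial tiles.
chunk_shape = (46, 90, 90)

chunks = list(chunk_slices(cube_shape, chunk_shape))
```

Each tuple in `chunks` addresses one smaller cube. Together the tuples cover every cell of the full cube exactly once.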

A parallel development is the growing need to apply Machine Learning methods to Earth System Sciences data: most parts of the Earth system are continuously monitored by sensors, and Machine Learning can cope with both the volume of data and its heterogeneous characteristics. Ideally, classical operations on the ESDC could be extended by Machine Learning applications in a way that preserves interoperability. However, there is a conflict between the nature of remotely sensed data, the structure of the ESDC and the requirements for meaningful Machine Learning applications that needs to be addressed:

Sampling the Earth naturally leads to an uneven distribution of data points as a result of its spherical shape. This effect is reinforced by data gaps due to, e.g., satellite trajectories or cloud cover. Hence, the data are not uniformly distributed across the chunks of the ESDC.
Remotely sensed data tend to be auto-correlated within (neighboring) chunks, as data points in close spatio-temporal vicinity naturally exhibit low variance.
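One common way to compensate for the first issue is area-aware sampling. This assumes a regular longitude-latitude grid, where the area of a grid cell is proportional to the cosine of its latitude. The sketch below draws grid cells with probability proportional to that weight. The one-degree grid, the function names and the sample size are assumptions chosen for illustration, and the sketch does not account for data gaps.

```python
import math
import random


def latitude_weights(lat_centers_deg):
    """Relative cell area on a regular lon-lat grid: proportional to cos(lat)."""
    return [math.cos(math.radians(lat)) for lat in lat_centers_deg]


def sample_cells(lat_centers_deg, n_lon, n_samples, seed=0):
    """Draw (lat_index, lon_index) pairs with probability proportional to cell area."""
    rng = random.Random(seed)
    weights = latitude_weights(lat_centers_deg)
    lat_idx = rng.choices(range(len(lat_centers_deg)), weights=weights, k=n_samples)
    # Longitude is drawn uniformly, since cell area does not depend on it.
    return [(i, rng.randrange(n_lon)) for i in lat_idx]


# Hypothetical one-degree grid: cell centers from -89.5 to 89.5 degrees.
lat_centers = [-89.5 + i for i in range(180)]
samples = sample_cells(lat_centers, n_lon=360, n_samples=1000)
```

With these weights, cells near the poles are drawn far less often than cells near the equator, so the samples are spread approximately evenly over the sphere rather than over the grid.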

Therefore, it is essential to enable Machine Learning that respects the basic principles of geo-data and goes well beyond naive applications of Machine Learning in the Earth system context. We focus on developing sophisticated and efficient sampling strategies for Data Cubes and ML tools that can operate on these large, cloud-hosted data sets.
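As a simple illustration of a sampling strategy that takes auto-correlation into account, the sketch below assigns whole chunks, rather than individual data points, to either the training or the test set. This is a generic blocked split under stated assumptions, not the strategy developed in the project. The function name, the chunk labels and the test fraction are hypothetical.

```python
import random


def split_by_chunk(chunk_ids, test_fraction=0.25, seed=0):
    """Split sample indices so that no chunk contributes to both sets."""
    rng = random.Random(seed)
    ids = sorted(set(chunk_ids))
    rng.shuffle(ids)
    # Hold out at least one whole chunk for testing.
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [i for i, c in enumerate(chunk_ids) if c not in test_ids]
    test = [i for i, c in enumerate(chunk_ids) if c in test_ids]
    return train, test


# Hypothetical data: 8 chunks with 5 samples each, labeled by chunk number.
chunk_ids = [c for c in range(8) for _ in range(5)]
train_idx, test_idx = split_by_chunk(chunk_ids)
```

Because strongly correlated neighbors inside a chunk stay together, a model is not evaluated on points that are near-duplicates of its training data. Correlation between neighboring chunks is not handled by this sketch.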

Further information:

DeepESDL (a joint project with the European Space Agency)

Earth System Data Cubes
