Title: Learning Spatiotemporal Models from Limited Noisy Data
Duration: January 2021 – December 2025
Research Area: Applied Data Science and AI – Life Science and Medicine
Data in the life sciences are typically sparse (few samples in high-dimensional spaces) and noisy (measurement uncertainty and intrinsic biological variability). In the project “Learning Spatiotemporal Models from Limited Noisy Data”, we enable the use of machine learning to infer interpretable, symbolic mathematical models of biomedical dynamics in space and time from sparse (a few hundred samples) and noisy (up to 30% noise) data. The resulting equation models can then be analyzed with standard mathematical tools to gain insight into the stability, bifurcations, and physical mechanisms of the observed biomedical process.
Learning interpretable mathematical models from observational data has received considerable research attention over the past 10 years. Previous methods, however, were sensitive to noise in the data, required large amounts of data, or had user-tunable parameters that could be set to obtain almost any desired result. Together, these three limitations have so far prevented the use of these methods in the life sciences, where the true model is unknown and data are sparse and noisy.
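To make the idea of inferring a symbolic model from data concrete, the following is a minimal sketch using sequentially thresholded least squares, a generic sparse-regression technique. It is not the method developed in this project. The logistic test equation, the candidate library of monomials, and the names `logistic_samples`, `fit_sparse_ode`, and `threshold` are all assumptions chosen for illustration.

```python
import numpy as np


def logistic_samples(n=200, t_max=10.0, noise=0.0, seed=0):
    """Sample the logistic equation du/dt = u - u^2 with u(0) = 0.1.

    Illustrative test problem (an assumption, not taken from the project).
    `noise` is the relative standard deviation of multiplicative Gaussian noise.
    """
    t = np.linspace(0.0, t_max, n)
    u = 1.0 / (1.0 + 9.0 * np.exp(-t))  # closed-form solution
    rng = np.random.default_rng(seed)
    return t, u * (1.0 + noise * rng.standard_normal(n))


def fit_sparse_ode(t, u, threshold=0.1, n_iter=10):
    """Fit du/dt = c0 + c1*u + c2*u^2 + c3*u^3 with a sparse coefficient vector.

    `threshold` is a user-tunable parameter: coefficients with a smaller
    magnitude are set to zero and the remaining ones are refitted.
    """
    # Finite-difference estimate of the time derivative; drop the two
    # boundary points, where the estimate is only first-order accurate.
    dudt = np.gradient(u, t)[1:-1]
    u_in = u[1:-1]

    # Candidate library of terms that may appear on the right-hand side.
    library = np.column_stack([np.ones_like(u_in), u_in, u_in**2, u_in**3])

    coeffs = np.linalg.lstsq(library, dudt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coeffs) < threshold
        coeffs[small] = 0.0
        active = ~small
        if not active.any():
            break
        # Refit using only the terms that survived thresholding.
        coeffs[active] = np.linalg.lstsq(library[:, active], dudt, rcond=None)[0]
    return coeffs


# Noise-free demonstration; the terms u and u^2 should be the ones retained.
t, u = logistic_samples(noise=0.0)
coeffs = fit_sparse_ode(t, u)
print(dict(zip(["1", "u", "u^2", "u^3"], coeffs)))
```

Rerunning the sketch with, for example, `logistic_samples(noise=0.05)` and different values of `threshold` gives a feel for the limitations named above: the finite-difference derivative amplifies noise, and the terms that are retained can depend on the chosen threshold.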
As a proof of concept, preliminary results from this project were used to automatically infer the molecular protein interaction network during early embryo polarization of C. elegans from a single microscopy video (see Maddu et al., Proc. R. Soc. A, 2022). The interactions inferred from the video were identical to those previously published from biochemical experiments.
Going forward, the project “Learning Spatiotemporal Models from Limited Noisy Data” will expand into classifying different models according to their high-dimensional loss landscape or their phase space. This touches upon the question of when (and in what sense) two dynamical models are equivalent or “similar”. To approach this question, we will use concepts from topological data analysis. The project will also extend towards learning mutual spatial arrangements of objects, for which we will use determinantal point processes and investigate how they can be stably identified from limited and noisy measurement data.