
Contact

Prof. Dr. Peter Stadler

Chair of Bioinformatics

Leipzig University

Peter.Stadler@bioinf.uni-leipzig.de

Machine Learning for the Recognition of Molecules, Metabolites, and Chemical Reactions

Title: Machine Learning for the Recognition of Molecules, Metabolites, and Chemical Reactions

Duration: 3 years

Research Area: Mathematical Foundations: Discrete Mathematics and Cheminformatics

When Machine Learning methods are used to generate molecular candidates in areas such as Drug Discovery, they routinely suggest unrealistic molecules. We consider this to be related to a fundamental theoretical problem: how to determine whether an arbitrary candidate molecular graph can in fact model a real molecule, or whether this is mathematically impossible. Given the complexity of the currently known chemical universe, this is not a simple statistical issue, and it requires the use of Machine Learning techniques. Here, we propose the Graph Transformation method known as the Double Pushout Approach as the basis for such a classification system. We study its applications as a novel approach to the in silico construction and exploration of chemical spaces.

Aims

We design our Machine Learning algorithms within the paradigm of Explainable Artificial Intelligence, so that they can report, in an explicit and human-understandable way, what they learn from data. We aim to develop an integrated study of the capabilities of the Double Pushout Approach combined with Explainable Machine Learning methods.

Problem

Retrosynthetic Analysis and Drug Discovery are two fields with important biochemical applications. Both have recently benefited from the inclusion of Machine Learning algorithms in their tools for in silico analysis. Most of these Machine Learning improvements have one thing in common: they use Graph Theory to build computational representations of the candidate molecules they produce, and these representations do not necessarily depict real molecules.

To obtain these candidates, the methods rely on training over sets of real molecules retrieved from databases, expecting this alone to ensure the viability of their suggestions. In doing so, they ignore one of the fundamental problems that we want to address: how to determine whether an arbitrary chemical graph can actually be synthesized, or whether it is only an infeasible abstract construct. We consider that at the core of this issue lie the combinatorial constraints that Chemistry naturally imposes on the candidate graphs.
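
As an illustration of what such a combinatorial constraint can look like, the following minimal sketch checks a simple valence bound on a candidate graph. It is not the classification method of this project: the valence table, the function name valence_violations, and the graph encoding are assumptions made for the example, and passing the check is only a necessary condition, far from a guarantee that a molecule can be synthesized.

```python
# Illustrative maximum valences for a few common elements (a simplifying
# assumption: charges, radicals, and hypervalent atoms are ignored).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def bond(u, v):
    """Undirected bond between atoms u and v."""
    return frozenset((u, v))


def valence_violations(atoms, bonds):
    """Return the atoms whose summed bond orders exceed the allowed valence.

    atoms: dict mapping atom id -> element symbol
    bonds: dict mapping bond(u, v) -> integer bond order
    """
    used = {atom: 0 for atom in atoms}
    for pair, order in bonds.items():
        for atom in pair:
            used[atom] += order
    return sorted(a for a, total in used.items() if total > MAX_VALENCE[atoms[a]])


# Formaldehyde, H2C=O: every atom respects its valence.
formaldehyde_atoms = {0: "C", 1: "O", 2: "H", 3: "H"}
formaldehyde_bonds = {bond(0, 1): 2, bond(0, 2): 1, bond(0, 3): 1}

# A graph a generative model could emit: a carbon with five single bonds.
bad_atoms = {0: "C", 1: "H", 2: "H", 3: "H", 4: "H", 5: "H"}
bad_bonds = {bond(0, i): 1 for i in range(1, 6)}

print(valence_violations(formaldehyde_atoms, formaldehyde_bonds))  # []
print(valence_violations(bad_atoms, bad_bonds))  # [0]
```

A local check of this kind rejects only the most obvious failures; graphs that satisfy every valence bound can still be chemically infeasible, which is why the question above is not settled by such rules alone.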

Practical example created during the project

To date, we have reached two main milestones in our project. One of these is the study and Characterization of the Equivalence Between Atom-to-Atom Maps [1, 2], which allows us to determine sets of mathematically “well-behaved” reactions. With this, we can then produce the “alignment of graphs” associated with such reactions. We have also implemented, in Python, a proof of concept of the Graph Alignment process, which we describe in a manuscript recently submitted for revision [3, 4]. This implementation is to be extended to C++ to study cases of greater complexity.
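
To make the idea of comparing atom-to-atom maps concrete, here is a small sketch under stated assumptions. It overlays reactant and product bonds through an atom map, labelling each bond with its order before and after the reaction, and then treats two maps as equivalent when these overlay graphs coincide up to an element-preserving relabeling of atoms. This is one common way to phrase such a comparison and is assumed here for illustration only; the precise definitions and the actual implementation are those of [1, 2]. The names its_edges and equivalent are hypothetical, and the brute-force search is only feasible for tiny examples.

```python
from itertools import permutations


def bond(u, v):
    """Undirected bond between atoms u and v."""
    return frozenset((u, v))


def its_edges(reactant_bonds, product_bonds, atom_map):
    """Overlay product bonds onto reactant atoms via the atom map.

    Returns a dict: bond on reactant atom ids -> (order before, order after).
    """
    inverse = {p: r for r, p in atom_map.items()}
    mapped = {frozenset(inverse[a] for a in pair): order
              for pair, order in product_bonds.items()}
    pairs = set(reactant_bonds) | set(mapped)
    return {p: (reactant_bonds.get(p, 0), mapped.get(p, 0)) for p in pairs}


def equivalent(elements, its_a, its_b):
    """Brute-force test for an element-preserving relabeling of the atoms
    that turns the overlay its_a into the overlay its_b."""
    atoms = sorted(elements)
    for image in permutations(atoms):
        relabel = dict(zip(atoms, image))
        if any(elements[a] != elements[relabel[a]] for a in atoms):
            continue
        moved = {frozenset(relabel[a] for a in pair): label
                 for pair, label in its_a.items()}
        if moved == its_b:
            return True
    return False


# Toy reaction H2 + Cl2 -> 2 HCl. Reactant atoms are 1..4; product atoms
# are "a" (H), "b" (Cl), "c" (H), "d" (Cl).
elements = {1: "H", 2: "H", 3: "Cl", 4: "Cl"}
reactant_bonds = {bond(1, 2): 1, bond(3, 4): 1}
product_bonds = {bond("a", "b"): 1, bond("c", "d"): 1}

# Two atom maps that differ only in which hydrogen goes where.
map_a = {1: "a", 2: "c", 3: "b", 4: "d"}
map_b = {1: "c", 2: "a", 3: "b", 4: "d"}

its_a = its_edges(reactant_bonds, product_bonds, map_a)
its_b = its_edges(reactant_bonds, product_bonds, map_b)

print(its_a == its_b)                     # False: the maps differ literally
print(equivalent(elements, its_a, its_b))  # True: same bond changes up to symmetry
```

The two maps assign the hydrogens differently, yet they describe the same pattern of broken and formed bonds, which is the kind of distinction an equivalence notion for atom maps has to capture.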

Animation. Project "Machine Learning for the Recognition of Molecules, Metabolites, and Chemical Reactions".

Technology

All computational implementations of our methods are written in C++ and Python, using the open-source libraries available for these languages. All programs produced for this project are published in GitHub repositories, which contain examples and instructions on how to run them.

Outlook

The applications of our project consist of the automatic inference of Graph Transformation Rules and the exploration of Chemical Spaces with the inferred rules; see [5] for details.
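
The following sketch shows, in a deliberately simplified form, how a graph transformation rule can be applied to a molecular graph in the spirit of the Double Pushout Approach. It assumes that a rule preserves all of its atoms (the context K contains every atom), so that a rule reduces to a left-hand bond set L and a right-hand bond set R; bonds in L but not in R are deleted and bonds in R but not in L are created. The function names matches and apply_rule are hypothetical, the matching is brute force, and none of this reproduces the interface of the software described in [5].

```python
from itertools import permutations


def bond(u, v):
    """Undirected bond between atoms u and v."""
    return frozenset((u, v))


def image_of(pair, m):
    """Image of a rule bond in the host graph under the match m."""
    return frozenset(m[a] for a in pair)


def matches(rule_atoms, left_bonds, host_atoms, host_bonds):
    """Yield injective, element-preserving maps from rule atoms to host atoms
    under which every left-hand bond exists in the host with the same order."""
    names = sorted(rule_atoms)
    for image in permutations(sorted(host_atoms), len(names)):
        m = dict(zip(names, image))
        if any(rule_atoms[a] != host_atoms[m[a]] for a in names):
            continue
        if all(host_bonds.get(image_of(p, m)) == o for p, o in left_bonds.items()):
            yield m


def apply_rule(left_bonds, right_bonds, host_bonds, m):
    """Rewrite the host bonds at match m; return the new bond dict, or None
    if a bond to be created already exists in the host."""
    result = dict(host_bonds)
    # Delete the images of bonds in L that are not kept in the context K.
    for pair, order in left_bonds.items():
        if right_bonds.get(pair) != order:
            del result[image_of(pair, m)]
    # Add the images of bonds in R that are not in K.
    for pair, order in right_bonds.items():
        if left_bonds.get(pair) != order:
            target = image_of(pair, m)
            if target in result:
                return None
            result[target] = order
    return result


# Rule: H-H + Cl-Cl -> H-Cl + H-Cl (all four atoms are preserved).
rule_atoms = {1: "H", 2: "H", 3: "Cl", 4: "Cl"}
left_bonds = {bond(1, 2): 1, bond(3, 4): 1}
right_bonds = {bond(1, 3): 1, bond(2, 4): 1}

# Host graph: one H2 molecule and one Cl2 molecule.
host_atoms = {"h1": "H", "h2": "H", "c1": "Cl", "c2": "Cl"}
host_bonds = {bond("h1", "h2"): 1, bond("c1", "c2"): 1}

found = list(matches(rule_atoms, left_bonds, host_atoms, host_bonds))
products = {frozenset(apply_rule(left_bonds, right_bonds, host_bonds, m).items())
            for m in found}

print(len(found))     # 4 matches, related by the symmetries of the rule
print(len(products))  # 2 distinct labelled results
```

Because no atom is deleted, the dangling condition of the Double Pushout construction holds trivially in this restricted setting, and injective matching takes care of the identification condition. Exploring a chemical space then amounts to applying such rules repeatedly to the molecules produced so far.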

Publications

[1] M. E. González Laffitte, N. Beier, N. Domschke, P. F. Stadler. Comparison of Atom Maps. MATCH Commun. Math. Comput. Chem. 90 (2023) 75–102. https://match.pmf.kg.ac.rs/issues/m90n1/m90n1_75-102.html
[2] GitHub repository: https://github.com/MarcosLaffitte/EEquAAM
[3] M. E. González Laffitte, P. F. Stadler. Progressive Multiple Alignment of Graphs.
[4] GitHub repository: https://github.com/MarcosLaffitte/Progralign
[5] J. L. Andersen et al. An intermediate level of abstraction for computational systems chemistry. Phil. Trans. R. Soc. A 375. https://royalsocietypublishing.org/doi/10.1098/rsta.2016.0354

Team

Lead

Prof. Dr. Peter Stadler

Team Members

Marcos Emmanuel Gonzalez Laffitte
Tomas Gatter

Partners

Nora Beier
Nico Domschke
Maria Waldl
Klaus Weinbauer
Tieu-Long Phan
Bioinformatics Group
Inst.f.Informatik, Leipzig University

Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).
