Title: Experimental and Computational High-Throughput Characterization of Protein-Protein Interactions
Project duration: 3 years
Research Area: Computational Biology
We develop a new approach to measure and predict protein-protein interactions using machine learning and artificial intelligence. These interactions form the biological foundation of nearly all mechanisms essential to life, and faulty or missing interactions often make the difference between health and disease. Recent advances in machine learning have opened up new possibilities for characterizing protein-protein interactions. This is due in part to ultra-large language models that can translate protein sequences into a biologically meaningful vector space. In combination with a new high-throughput method for acquiring protein binding affinity data, this will allow us to drastically accelerate the design of vaccines and other protein therapeutics, which are at the core of modern drug development for cancer, neurodegenerative diseases and viral infections.
We aim to develop an in-silico high-throughput predictor for protein-protein binding affinities by leveraging various protein embedding techniques that build on deep learning architectures such as convolutional neural networks and transformers. This will enable:
Protein-protein interaction (PPI) data are scarce: the methods used to obtain them are time-consuming and costly and measure only a very limited number of data points at a time, so there is little incentive to generate large datasets with conventional methods. As a result, there are no models that can accurately predict binding affinities, which in turn is part of the reason that there is no real incentive to measure thousands of random PPIs.
Our approach has already been applied to vaccine candidates for SARS-CoV-2 and the hepatitis C virus.
We use models such as ESM2, CARP, ProtTrans and others to embed protein sequences into a biologically meaningful space. Custom AI architectures then predict binding affinity within this space. In the wet lab, we combine well-established methods such as yeast surface display, fluorescence-activated cell sorting and next-generation sequencing to generate the data needed for our AI-driven approach to PPI prediction.
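The embed-then-predict idea can be illustrated with a minimal sketch. It is not the project's actual pipeline. Everything in it is an assumption made for illustration: the `embed` function is a simple amino-acid composition vector standing in for a mean-pooled language-model embedding (such as one from ESM2), a closed-form ridge regression stands in for the custom AI architectures, and the sequences and "affinity" targets are synthetic values generated from a hidden linear rule, not measurements.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Placeholder embedding (assumption): 20-dim amino-acid composition.

    A real pipeline would substitute a protein language model embedding here.
    """
    counts = np.array([sequence.count(aa) for aa in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def pair_features(seq_a, seq_b):
    """Represent a protein pair by concatenating the two embeddings."""
    return np.concatenate([embed(seq_a), embed(seq_b)])


def fit_ridge(X, y, alpha=1e-6):
    """Closed-form ridge regression with an unpenalised intercept term."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


def predict(weights, seq_a, seq_b):
    """Predict the (synthetic) affinity score for one protein pair."""
    return float(np.append(pair_features(seq_a, seq_b), 1.0) @ weights)


# Synthetic demo data: random sequences and targets from a hidden linear rule.
rng = np.random.default_rng(0)


def random_sequence(length=30):
    return "".join(rng.choice(list(AMINO_ACIDS), size=length))


pairs = [(random_sequence(), random_sequence()) for _ in range(200)]
X = np.stack([pair_features(a, b) for a, b in pairs])
hidden_w = rng.normal(size=X.shape[1])
y = X @ hidden_w  # synthetic targets, not real binding affinities

weights = fit_ridge(X, y)
print("predicted:", predict(weights, *pairs[0]), "target:", y[0])
```

Concatenating the two embeddings makes the prediction depend on the order of the pair. A symmetric combination (for example, sum and element-wise product) would be one alternative if order should not matter.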
Predicting protein-protein interactions is required for a variety of biological problems. Consequently, the methods developed here will be transferable to several application areas beyond vaccine development. In light of the recent SARS-CoV-2 pandemic, a successful project has the potential to increase the quality and speed of future vaccine development.