Title: Next Generation Computer-aided Drug Discovery for Small Molecules
Project duration: 05/21-05/25
Research Area: Life Science and Medicine
Computer-aided drug discovery is undergoing a transformation, driven by the growing availability of protein structures, expansive chemical spaces, and advances in geometric deep learning. However, several challenges remain.
We developed RosettaEvolutionaryLigand, a novel method that successfully screens ultra-large chemical spaces containing billions of molecules. During its development, we identified two key issues. First, the docking algorithm we used is precise but computationally expensive; preliminary experiments showed that combining advanced diffusion methods with Monte Carlo sampling improves on either technique used alone. Second, existing affinity prediction methods lack precision, a property that is crucial for drug discovery but often overlooked by the machine learning community.
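The idea of an evolutionary search over a chemical space too large to enumerate can be illustrated with a minimal sketch. This is not the RosettaEvolutionaryLigand implementation: the two-reagent combinatorial space, the `toy_score` function (a stand-in for an expensive docking score), and all parameter values are assumptions chosen only to keep the example self-contained.

```python
import random

# Hypothetical toy space: a "molecule" is a pair of reagent indices,
# giving 1000 x 1000 = 10^6 combinations. Real spaces are far larger.
N_REAGENTS_A = 1000
N_REAGENTS_B = 1000


def toy_score(mol):
    """Placeholder for an expensive docking score (lower is better)."""
    a, b = mol
    return abs(a - 700) + abs(b - 250)


def mutate(mol, rng):
    """Replace one of the two reagents with a randomly drawn alternative."""
    a, b = mol
    if rng.random() < 0.5:
        a = rng.randrange(N_REAGENTS_A)
    else:
        b = rng.randrange(N_REAGENTS_B)
    return (a, b)


def crossover(parent1, parent2):
    """Combine the first reagent of one parent with the second of the other."""
    return (parent1[0], parent2[1])


def evolve(pop_size=50, generations=30, n_parents=10, seed=0):
    rng = random.Random(seed)
    population = [
        (rng.randrange(N_REAGENTS_A), rng.randrange(N_REAGENTS_B))
        for _ in range(pop_size)
    ]
    scores = {}   # cache: every molecule is scored at most once
    history = []  # best score seen in each generation
    for _ in range(generations):
        for mol in population:
            if mol not in scores:
                scores[mol] = toy_score(mol)
        # Selection: keep the best-scoring distinct molecules as parents.
        parents = sorted(set(population), key=scores.__getitem__)[:n_parents]
        history.append(scores[parents[0]])
        # Reproduction: parents survive unchanged (elitism); the rest of the
        # next generation is built by crossover followed by mutation.
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2), rng))
        population = parents + children
    best = min(scores, key=scores.get)
    return best, scores[best], len(scores), history


best, best_score, n_scored, history = evolve()
print(f"best molecule {best} with score {best_score}; "
      f"scored {n_scored} of {N_REAGENTS_A * N_REAGENTS_B} molecules")
```

The point of the sketch is the budget: only a small fraction of the space is ever scored, which matters when each score is a costly docking run.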
Developing new small-molecule pharmaceuticals is challenging because the chemical space is vast. Computational methods help preselect candidates, reducing the number of experimental tests and iteration cycles needed to discover robust initial hits. However, established methods have significant shortcomings, and new machine learning methods often rely on artificial benchmarks due to limited domain knowledge.
We participated in the first round of the CACHE challenge, predicting a potential drug candidate for a protein associated with Parkinson’s disease. We discovered five promising hits from 145 submitted molecules. This demonstrated the potential of our evolutionary algorithm while also highlighting the limitations of our docking protocol and scoring function.
We use PyTorch for machine learning and the e3nn library for tensor product computations in molecular systems. RDKit prepares inputs for small molecules, while Rosetta handles large biomolecules. We develop in C++ and Python and run our workloads on high-performance computing clusters with OpenMPI and on GPUs with CUDA.
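As an illustration of what preparing a small-molecule input with RDKit can look like, the following is a minimal sketch. The project's actual preparation pipeline is not described here, so the function name `prepare_ligand`, the example SMILES string (aspirin), and the choice of MMFF optimization are assumptions.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def prepare_ligand(smiles, seed=42):
    """Hypothetical helper: SMILES string -> molecule with one 3D conformer."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError("3D embedding failed")
    AllChem.MMFFOptimizeMolecule(mol)  # quick force-field relaxation
    return mol


mol = prepare_ligand("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, used only as an example
coords = mol.GetConformer().GetPositions()     # NumPy array, one row per atom
print(mol.GetNumAtoms(), coords.shape)
```

The resulting coordinates could then be written to a file for a docking program or converted to tensors for a PyTorch model.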
We plan to finalize state-of-the-art docking protocols and develop local geometry-aware convolution kernels for predicting interactions between molecules. Future work will focus on integrating these kernels into deep neural networks for affinity prediction. Additionally, we aim to create an affinity predictor that does not rely on complex structures and to develop a foundation model for molecular feature prediction that can adapt to multi-modal settings and specific tasks.