How computers can speed up drug discovery

Using Machine Learning (ML), researchers in Finland have been able to speed up a key procedure in the discovery of new types of medicines from four months to 10 days.

Drug discovery problems often resemble finding the exact placement of a piece in a jigsaw puzzle. However, instead of the standard 500 or maybe 1,000 pieces you will have millions to choose from, and the puzzle will be in 3D rather than 2D, greatly increasing complexity. Researchers at the University of Eastern Finland work with CSC, the national research and education network (NREN) of Finland, to make the task easier with the aid of ML.

Most drugs are small organic molecules designed to target larger organic molecules in the human body. The large molecule can, for instance, be an enzyme critical to a pathogen's survival; the small molecule's task is then to block the enzyme, leading to the destruction of the pathogen. To do this, the small molecule must be able to bind to the enzyme. In other words, the large molecule must have a so-called docking site into which the small molecule fits, just as a jigsaw puzzle piece must fit its slot.

The new ML-based techniques train computers to assist in predicting which of many small molecule candidates are likely to have the best fit, thereby limiting the need for more time-consuming screens and shortening development time.

A rigorous benchmarking exercise

Using computers in drug candidate screening is not new. Conventional computational docking would try to fit every single piece among hundreds of candidates into the existing jigsaw frame. How well shapes and patterns match is translated into a “docking score”. However, this brute-force approach of trying piece after piece, one by one, has recently been challenged by the sheer size of available small-molecule collections. With modern compound libraries exceeding a billion molecules, docking every one of them would take months or years, even on a supercomputer.

This is where ML comes in. When presented with enough examples and their corresponding docking scores, the computer will be able to estimate how well new candidate molecules are likely to dock.
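The idea can be illustrated with a minimal Python sketch. It is not the researchers' actual pipeline: the two-number molecule descriptors, the `expensive_dock` function and the simple nearest-neighbour model are hypothetical stand-ins, chosen only to show the general pattern of docking a small sample, learning from it, and then docking only the most promising candidates.

```python
import random


def expensive_dock(molecule):
    """Hypothetical stand-in for a docking program (lower score = better fit)."""
    x, y = molecule
    return (x - 0.3) ** 2 + (y - 0.7) ** 2


def knn_predict(train, query, k=5):
    """Estimate a score as the mean score of the k most similar docked molecules."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )[:k]
    return sum(score for _, score in nearest) / len(nearest)


rng = random.Random(0)
# Toy "compound library": each molecule is described by two made-up numbers.
library = [(rng.random(), rng.random()) for _ in range(2000)]

# 1. Dock a small random sample to obtain examples with known docking scores.
sample = rng.sample(library, 200)
train = [(mol, expensive_dock(mol)) for mol in sample]

# 2. Rank the whole library using the cheap predictions.
ranked = sorted(library, key=lambda mol: knn_predict(train, mol))

# 3. Dock only the 10 % of molecules predicted to fit best.
shortlist = ranked[: len(library) // 10]
docked = sorted(shortlist, key=expensive_dock)

print("Docking runs used:", len(sample) + len(shortlist), "of", len(library))
print("Best molecule found:", docked[0])
```

In this toy setting only 400 of the 2,000 molecules are ever docked; the rest are filtered out by the cheap prediction step.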

Since the ML-boosted predictions are not perfect – not yet at least – some candidate molecules could be wrongly disregarded. To address this issue, the group undertook a rigorous exercise: the ML predictions were benchmarked against brute-force docking results of 1.56 billion compounds against an anti-bacterial and an anti-viral target.

The benchmarking showed that the ML-boosted method wrongly disregarded only a small number of molecules: it recovered 90 % of the top-scoring compounds.
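A recovery figure of this kind can be computed along the following lines. This is a hedged sketch rather than the group's evaluation code: the function name `top_k_recall`, the compound identifiers and the scores are invented for illustration, and it assumes the common convention that a lower (more negative) docking score means a better fit.

```python
def top_k_recall(true_scores, selected_ids, k):
    """Fraction of the k best-scoring compounds that appear in selected_ids.

    true_scores maps compound id -> brute-force docking score (lower = better).
    """
    best = sorted(true_scores, key=true_scores.get)[:k]
    selected = set(selected_ids)
    return sum(1 for compound in best if compound in selected) / k


# Invented brute-force docking scores for ten compounds.
brute_force = {
    "c1": -10.2, "c2": -9.8, "c3": -9.5, "c4": -9.1, "c5": -8.8,
    "c6": -7.0, "c7": -6.5, "c8": -6.1, "c9": -5.0, "c10": -4.2,
}
# Compounds the (hypothetical) ML-boosted screen chose to keep.
ml_selected = ["c1", "c2", "c3", "c5", "c7"]

recall = top_k_recall(brute_force, ml_selected, k=5)
print(recall)  # 0.8: four of the five best compounds were kept
```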

Dramatic reduction in computing time

Notably, the brute-force docking studies in the exercise took four months, even with cutting-edge software and supercomputing resources. The ML-boosted method needed just 10 days. According to the researchers, such a dramatic reduction in development time would seem to justify the drawback of potentially missing a few candidate molecules, especially since the size of available molecule libraries continues to grow, as does competition for computing resources.

The group has released its entire set of docking results, covering 1.56 billion compounds for two targets, into the public domain as benchmarking data. Other researchers can use the dataset to advance the field and to develop screening methods that save time and computing resources and ultimately contribute to better pharmaceutical treatments.

Learn more in the Journal of Chemical Information and Modeling

The text is inspired by the article “How to solve a jigsaw puzzle with 1.56 billion pieces” by Ina Pöhner, Postdoc in the Molecular Modeling and Drug Design Research group at University of Eastern Finland, at the CSC website.

Published: 10/2023
