Credit: VisualGeneration via photogenica.ru
As science grows more complex, successful researchers have to rely on more than experiments and calculations: AI is quickly becoming indispensable. Predicting the properties of molecules and materials for drug delivery, synthesizing compounds with predefined properties, and developing new materials – these are just a few of the tasks where AI already excels. For this article, we turned to ITMO’s Nikita Serov, an engineer at the Center for Artificial Intelligence in Chemistry, to talk about the doors that AI opens in the natural sciences.
Ancient scientists described the world through experiment and observation, trying to spot patterns in the phenomena they saw.
As our knowledge of the world expanded, humans started identifying causal relations between events using the quantitative approach. This worked well for a while, but turning to manual calculations every time soon became too laborious – and that’s when scientists came up with computational modeling. Essentially, it allows them to describe real-world systems with precise computer simulations once they have outlined the rules that the system has to follow.
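To make the idea of rule-based simulation concrete, here is a minimal sketch (not from the lecture): once we state the rule that a substance decays at a rate proportional to its amount, a computer can step that rule forward in time and reproduce the system’s behavior. All numbers below are invented for illustration.

```python
# Minimal rule-based simulation: first-order decay A -> B.
# The "rule" we give the computer is d[A]/dt = -k * [A];
# stepping it forward in small increments approximates the real system.

def simulate_decay(a0: float, k: float, dt: float, steps: int) -> float:
    """Euler-integrate the concentration of A under d[A]/dt = -k*[A]."""
    a = a0
    for _ in range(steps):
        a += -k * a * dt  # apply the rule for one small time step
    return a

# Simulate 10 time units of decay with rate constant k = 0.5.
final = simulate_decay(a0=1.0, k=0.5, dt=0.01, steps=1000)
```

The simulated value closely tracks the exact analytic answer, which is the whole point: given the rules, the computer recreates the system without any further experiments.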
However, this approach isn’t flawless either. In reality, natural phenomena are far more complex than our models, which forces us to make approximations in simulations. Let’s take comparative psychology: usually, when studying primate behavior, researchers wouldn’t try to describe the whole species, turning instead to certain populations with specific habitats. This example illustrates one core issue of modern science: it focuses on narrow, very specific topics whose results are not easily extrapolated to broader categories.
Thanks to big data, we can now address this problem. With this approach, it doesn’t matter how complex the problem is or even what we choose to study – primates, molecules, or architecture – the models remain more or less the same for any system. That’s why this method is universally applicable: having solved a specific issue in one field, researchers can reuse the same models and data in many other, even unrelated, ones.
One of the trends in modern science is to move away from single experiments and towards developing tools for working with large quantities of generalized data.
Let’s imagine that you need to develop an industrial catalyst. You know that it has to be based on an iridium ion, but you have no idea which other molecules it should be attached to – and there are hundreds of thousands of options. The classical experimental approach would mean a manual search for compatible molecules that would take a whole team of researchers years to complete. What’s more, each researcher might produce slightly different data.
Now let’s look at automated big data-based systems: they are cheaper, deliver far more reproducible data, and can evaluate over 1,000 candidate catalysts to find a solution in mere weeks. And the by-product of their work? Having tested such a large number of catalysts, researchers accumulate a database of reaction rates – a key property in catalyst development. Using this data, they can identify with high accuracy how a catalyst’s structure affects the reaction rate. This means that they can now reason about the inner workings of the system and efficiently develop catalysts for other tasks, such as new diagnostic systems.
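The structure-to-rate step can be pictured with a deliberately simplified sketch (not the actual method used): fit a line to pairs of (structural descriptor, measured rate) and use it to estimate rates for untested candidates. Both the descriptor and all numbers here are invented.

```python
# Toy illustration of learning how catalyst structure affects reaction rate:
# ordinary least-squares fit of rate vs. a single structural descriptor.
# Descriptor choice and data values are invented for illustration only.

def fit_line(xs, ys):
    """Least-squares fit y = a*x + b for one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical screening results: descriptor value vs. observed rate constant.
descriptors = [100, 110, 120, 130, 140]
rates       = [0.20, 0.31, 0.39, 0.52, 0.60]

a, b = fit_line(descriptors, rates)
predicted = a * 125 + b  # rate estimate for an untested catalyst
```

Real screening campaigns use many descriptors and far richer models, but the logic is the same: measured data in, a predictive structure–rate relationship out.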
More than that: these days, chemical labs boast robots that can replace humans in experiments or industrial production. They can work 24/7, weighing samples, titrating solutions, synthesizing small natural molecules, and even suggesting and testing simple hypotheses. For instance, one such robot conducted around 700 experiments in just eight days, producing catalysts that are 15 times more efficient than those obtained through manual reagent testing. Such systems are unique in that they don’t require specific instructions: they can improve their algorithms as they work, based on their own experience.
Another field where chemists already make use of AI is drug development. Usually, it takes 10–15 years to create a new product, but algorithms can speed this process up significantly.
Traditionally, the process goes like this: a chemist synthesizes a new molecule, which is then tested by biologists to find out whether it is effective against certain types of cancer. Then, scientists copy the original molecule many times, slightly tweaking its structure with each iteration. The goal is to find the most effective variant.
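This tweak-and-test cycle can be sketched as a simple optimization loop. Everything below is a toy stand-in: the “molecule” is a single number, and the scoring function imitates a biological assay rather than implementing any real one.

```python
# Toy sketch of the tweak-and-test cycle: start from a lead "molecule"
# (here just one numeric parameter), apply small random tweaks, and keep
# whichever variant scores best. The potency function is invented.
import random

def toy_potency(x: float) -> float:
    """Hypothetical stand-in for a biological assay: peaks at x = 3."""
    return -(x - 3.0) ** 2

def optimize(start: float, iterations: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    best, best_score = start, toy_potency(start)
    for _ in range(iterations):
        candidate = best + rng.uniform(-0.5, 0.5)  # "tweak the structure"
        score = toy_potency(candidate)             # "run the assay"
        if score > best_score:                     # keep the better variant
            best, best_score = candidate, score
    return best
```

Each loop iteration plays the role of one synthesize-and-test round; algorithms accelerate drug development precisely by making each such round cheaper and by choosing the next tweak more intelligently than random search.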
Drugs are never tested on people right away – any prototype has to pass several stages of pre-clinical trials first. Sometimes, trials are conducted on samples of human cells that are kept in special cell banks. But there are disadvantages to using these, as many cultures are more than a hundred years old, and have changed over that time to look nothing like the original, or even any human cell. Therefore, if a medicine is successful when tested on these samples, it does not mean that it will be effective when taken by humans.
Scientists are working to solve this problem. For instance, a group of researchers from ETH Zurich, led by the chemist Francesca Grisoni, used artificial intelligence to design drug candidate molecules against cancer. The AI has already proposed more than 20 new candidates based on existing compounds. These prototypes proved to be 60 times more effective than the original medicine, which is considered an outstanding result. The prospective drugs were tested both in experimental trials and via computer modeling. What is notable is that the algorithm was not taught to search for a specific type of medicine: it was simply given access to two data sets, one containing a million unknown molecules and another holding two molecules known to be effective against cancer. While the two anticancer molecules were 50% and 20% effective respectively, the AI’s task was to locate a molecule in the first data set that would be 90–100% effective.
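One common way to implement this kind of search – offered here as a simplified sketch, not as the ETH group’s actual method – is to represent molecules as fingerprints and rank the unknown library by similarity to the known actives. The fingerprints and molecule names below are entirely invented.

```python
# Hedged sketch of similarity-based virtual screening: rank an unknown
# library by Tanimoto similarity to a few known active molecules.
# Fingerprints are invented bit-sets; real work uses learned or computed
# molecular representations.

def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprint bit-sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

known_actives = [{1, 4, 7, 9}, {1, 4, 8, 9}]  # invented active fingerprints

library = {
    "mol_a": {2, 3, 5},
    "mol_b": {1, 4, 7, 8, 9},  # shares many bits with the actives
    "mol_c": {6, 10},
}

def score(fp: set) -> float:
    """Best similarity of a candidate to any known active."""
    return max(tanimoto(fp, active) for active in known_actives)

# Most promising candidates first.
ranked = sorted(library, key=lambda name: score(library[name]), reverse=True)
```

Generative approaches like the one described above go further, proposing entirely new structures rather than just ranking existing ones, but the underlying idea is the same: let known actives steer the search through a vast chemical space.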
A similar method became the basis of the Pharma.AI platform, which can identify disease targets, describe unknown molecules, and predict the results of clinical trials. It has already been used to help create a new drug – a medicine for idiopathic pulmonary fibrosis by Insilico Medicine. The prototype has cleared the first phase of clinical trials and is now undergoing further testing in humans; it is expected to enter the market soon. This is just one example of how scientists can apply digital technologies across the entire drug development cycle: from synthesizing the compound to running trials on sample cells, mice, and humans.
Based on Nikita Serov’s lecture Why Chemists Need Data Science and AI, which took place at Planetarium No. 1 library this February.