Machine Learning X Drug Discovery : The Nexus of Computational Biology

Aaron Lewis
Published in The Startup
5 min read · May 18, 2020

On average, it takes 10–20 years and 0.5 to 2.6 billion dollars to develop a drug 🤯. Worse, drugs that do make it past pre-clinical testing have a success rate in clinical trials of only 6.2%. Companies and institutions invest all this time and capital in a drug, only for it to fail in phase III trials.

This is the current outline of how drug development works.

For molecules that target aging, this process can take even longer than shown in the graph. Drugs like rapamycin have been known to have anti-aging properties for quite some time now, but they still have not reached the clinic as anti-aging therapies.

With machine learning, the drug development process can be sped up and its costs reduced. ML can be integrated into many parts of the process. Here is a list of the key ones:

  • identify novel disease targets
  • provide stronger evidence for target–disease associations
  • improve small-molecule compound design and optimization
  • develop new biomarkers for prognosis, progression and drug efficacy
  • enhance digital pathology imaging

The ones I want to focus on today are target identification, small-molecule design (also known as de novo drug design), and biomarker development.

De Novo Drug Design

De novo drug design is the process of designing drugs using only computers. It was already used in some cases before the advent of machine learning, but machine learning, and specifically reinforcement learning, has supercharged it.

Reinforcement learning is the branch of machine learning in which an agent is put into an environment and learns to maximize a reward by performing actions. This area of ML has received a lot of attention because the models figure things out from scratch. A really cool example of reinforcement learning is AlphaGo from Google DeepMind, which learned to master Go, the ancient Chinese board game.

Cool example of reinforcement learning
AlphaGo trailer
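To make the agent, environment, and reward idea concrete, here is a minimal sketch of one of the simplest reinforcement learning setups, an epsilon-greedy agent on a multi-armed bandit. It has nothing to do with how AlphaGo works internally. The function name, the reward probabilities, and the parameter values are all made up for illustration.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit.

    reward_probs[a] is the (hidden) chance that action a pays a reward of 1.
    Returns the agent's learned estimate of each action's average reward.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # agent's current guess for each action
    counts = [0] * n_actions       # how many times each action was tried

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best guess.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment responds with a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental average: move the estimate toward the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates

estimates = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, [round(e, 2) for e in estimates])
```

The agent is never told which action is best. It only sees rewards, and it gradually shifts toward the action that pays off most often.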

Insilico Medicine is a big player in the de novo drug design area. They published a landmark paper describing their drug design workflow, which uses GENTRL (generative tensorial reinforcement learning). Using GENTRL, they were able to find inhibitors of a kinase expressed in epithelial cells during fibrosis. The design process using GENTRL took 46 days, while traditional techniques can take upwards of 2–3 years.

GENTRL optimized for 3 objectives when running the model. The first was that the designed drug was novel compared with the other drugs in a database it was tested against. The second was that the drug was a kinase inhibitor. The third was that the drug inhibited the specific kinase they wanted to target, DDR1.
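One simple way to picture optimizing for three objectives at once is a single reward that adds up a score for each. The sketch below is my own illustration and not Insilico's actual reward function. It assumes molecules are represented as sets of made-up structural features, that novelty is measured as one minus the highest Tanimoto similarity to any known molecule, and that the kinase and DDR1 scores come from some other model and lie between 0 and 1.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def novelty(candidate, known_molecules):
    """1.0 when the candidate shares no features with any known molecule."""
    if not known_molecules:
        return 1.0
    return 1.0 - max(tanimoto(candidate, known) for known in known_molecules)

def reward(candidate, known_molecules, kinase_score, ddr1_score,
           weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three objectives (hypothetical weighting)."""
    w_novelty, w_kinase, w_ddr1 = weights
    return (w_novelty * novelty(candidate, known_molecules)
            + w_kinase * kinase_score
            + w_ddr1 * ddr1_score)

# Hypothetical feature sets and scores, for illustration only.
known = [{"f1", "f2", "f3"}, {"f2", "f4"}]
candidate = {"f1", "f5", "f6"}
print(reward(candidate, known, kinase_score=0.9, ddr1_score=0.7))
```

A generator trained with reinforcement learning would then be nudged toward molecules that score well on all three terms at the same time.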

Insilico trained on 6 different datasets to get a variety of training data:

  1. a large set of molecules derived from the ZINC data set (a free database of commercially available compounds)
  2. known DDR1 kinase inhibitors
  3. common kinase inhibitors (positive set)
  4. molecules that act on non-kinase targets (negative set, meaning they did not want the designed drugs to be structured like these)
  5. patent data for biologically active molecules that have been claimed by pharmaceutical companies
  6. three-dimensional structures for DDR1 inhibitors

The ability to use machine learning algorithms and compute power will streamline the drug development process. As these algorithms become more sophisticated and more data is published in the literature these systems will only become more robust.

One big unsolved challenge, though, is how best to represent the chemical structure of molecules in de novo design. Molecules carry a lot of information, like polarity and geometry, that cannot easily be expressed in a form machine learning algorithms can use. There are several ways to represent these properties, as shown below. Extended-connectivity fingerprints (ECFPs) encode the topological characteristics of a molecule, and the Coulomb matrix encodes the nuclear charges of a molecule's atoms and their coordinates.

Different types of ways to represent molecules
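Of the representations mentioned above, the Coulomb matrix is simple enough to sketch in a few lines. In its commonly used form, each diagonal entry is 0.5 · Z^2.4 for an atom with nuclear charge Z, and each off-diagonal entry is the product of two nuclear charges divided by the distance between the atoms. The function name and the H2 geometry below (about 1.4 bohr between the two hydrogens) are illustrative choices on my part.

```python
import math

def coulomb_matrix(charges, coords):
    """Coulomb matrix from nuclear charges Z and 3D coordinates R.

    Diagonal:      0.5 * Z_i ** 2.4
    Off-diagonal:  Z_i * Z_j / |R_i - R_j|
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix

# Hydrogen molecule: two nuclei with charge 1, roughly 1.4 bohr apart.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
for row in h2:
    print([round(value, 3) for value in row])
```

The result is a fixed-size grid of numbers that captures both which atoms are present and how they are arranged in space, which is the kind of input a machine learning model can work with.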

Biomarker Development

Another promising area for machine learning is biomarker development. Biomarkers are signals that indicate whether a pathology has set in and to what extent. Developing biomarkers using AI will help scientists uncover signals that accurately measure how patients respond to different pharmaceutical therapies.

Biomarkers are one of the hot topics in biotech, and there are a lot of startups working on different biomarkers for different pathologies. However, despite the abundance of biomarkers that have been developed, only a few have actually been used in clinical trials. A few reasons for this are:

  • low data quality
  • access to data and software
  • reproducibility of the biomarker
  • design of tests suitable for a clinical setting

So biomarkers need to improve in these aspects to actually be used in clinical trials. Even with all these challenges, though, the biomarkers that have been developed are highly effective.

Different Epigenetic Clocks

In the aging field, there have been a lot of interesting biomarkers for calculating the biological age of a person. The canonical biomarker is the Horvath clock, along with other epigenetic clocks that look at the methylome.
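At their core, epigenetic clocks of this kind predict age from a weighted sum of methylation levels at selected sites in the genome. Here is a heavily simplified sketch of that idea. The site names, weights, and intercept are invented for illustration, while real clocks use hundreds of CpG sites with coefficients fitted to data.

```python
def predict_age(methylation, weights, intercept):
    """Predicted age as a weighted sum of methylation levels.

    methylation maps each site to a beta value between 0 and 1.
    weights maps each site to its coefficient in the clock.
    """
    return intercept + sum(weights[site] * methylation[site] for site in weights)

# Hypothetical sites and coefficients, for illustration only.
weights = {"site_a": 40.0, "site_b": -25.0, "site_c": 15.0}
intercept = 30.0
sample = {"site_a": 0.60, "site_b": 0.20, "site_c": 0.50}

print(predict_age(sample, weights, intercept))
```

Sites with positive weights gain methylation with age in this toy model, and sites with negative weights lose it.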

Other clocks currently in development range from ones that look at blood or the proteome to non-invasive clocks that look at wrinkles on the skin to give a measure of biological age.

Further challenges for predictive biomarkers include making the results easier to interpret. AI biomarkers often act like a black box, and it is not very clear why the result is the way it is. That is something that needs to improve. Another big one is that biomarkers that have only been lightly tested need to be rigorously tested on a wide variety of datasets in order to validate their accuracy.

Hey y’all! 👋 I’m Aaron, a 16-year-old who’s super passionate about the intersection between artificial intelligence and human longevity. Feel free to connect with me on LinkedIn or check out my full portfolio.
