Examining the Horvath Clock and the Statistical Method Behind it

How to find Biological Age from DNA Methylation Data

Aaron Lewis
Published in TDS Archive · 7 min read · Mar 12, 2020

Source: Unsplash, edited by myself

Imagine you are a researcher working in a lab to create an anti-aging pill. After months of researching different anti-aging supplements that historical groups have used, you find a suitable intervention. The question now becomes: how do you prove the intervention actually inhibits aging?

In clinical trials, you can’t wait for a patient’s full lifespan to validate whether the anti-aging supplement actually worked. That would take far too long.

Instead, we need a metric that can evaluate the effectiveness of different interventions in a short time frame. This is the main reason for the demand for biomarkers of aging.

Before we go into the different biomarkers of aging, it is important to define a few concepts. The first is: what is aging?

The dictionary definition of aging is the process of growing old. Simple enough, right?

Well, researchers often split age into two distinct categories: chronological age and biological age. Chronological age is age in the commonly understood sense: how old you are, or the amount of time that has passed from your birth to the current date.

Biological age is a newer concept. It reflects how much your cells have changed over time, and it can be influenced by many different lifestyle factors. To give you an intuitive example, take a pair of twins, one who smokes and one who doesn’t. The biological age of the smoker would likely be higher than that of the non-smoker, because smoking is a lifestyle choice that accelerates aging. The chronological age of the twins, however, would be the same.

People of the same chronological age can have different biological ages due to lifestyle differences. Source: Unsplash, edited by myself

Comparing your biological age with your chronological age gives you a good idea of the rate at which you are aging. If one of your tissues is chronologically 50 but biologically 30, that would be good. If you are chronologically 50 but biologically 70, that would be bad.
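At its simplest, the comparison above is a subtraction. The minimal Python sketch below applies it to the two examples. The function name `age_gap` is my own. Research papers usually define "age acceleration" more carefully, for example as the residual from regressing biological age on chronological age, rather than as this raw difference.

```python
def age_gap(chronological: float, biological: float) -> float:
    """Biological minus chronological age, in years.

    Negative: aging more slowly than the calendar suggests.
    Positive: aging faster.
    """
    return biological - chronological


for chrono, bio in [(50, 30), (50, 70)]:
    gap = age_gap(chrono, bio)
    verdict = "slower" if gap < 0 else "faster" if gap > 0 else "on pace"
    print(f"chronological {chrono}, biological {bio}: gap {gap:+} years ({verdict})")
```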

A central goal of the research showcased in this article was to find a biomarker that reflects biological age. There were several possible candidates:

  1. telomere length (caps at the ends of chromosomes)
  2. gene expression levels
  3. protein expression levels

But the most accurate and widely used biomarker is DNA methylation. Researchers have found that, among the candidates, DNA methylation best meets the criteria for a marker of biological age.

So what are some of the criteria?

  1. strong correlation with chronological age
  2. predicts age-related phenotypes
  3. responsive to different interventions/lifestyle changes

Methylation clocks have been shown to meet these criteria. This makes DNA methylation a very good candidate and by far the most studied biomarker of aging.

DNA Methylation

DNA methylation is part of the broader field of epigenetics. Epigenetics is the study of the biological processes that regulate how the genome is expressed without changing the DNA sequence itself. It is what drives cellular differentiation, making your nerve cells different from your muscle cells even though they carry the same DNA.

CH3 (methyl group) attaches to cytosine at a CpG site. Source: Wikimedia Commons

DNA methylation changes gene expression through the addition of a methyl group (CH3) to a CpG site. A CpG site is a place in the genome where a cytosine nucleotide is followed by a guanine nucleotide. The methyl group can bind to the cytosine to create 5-methylcytosine. Regions of the genome with a high frequency of CpG sites are known as CpG islands. There are approximately 28 million CpG sites in the human genome, and about 27 thousand CpG islands.
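The CpG definition is mechanical enough to express in code. The Python sketch below finds every CpG site in a DNA string. It also computes the observed/expected CpG ratio, which is the usual way to say how "CpG-rich" a stretch of sequence is. Published CpG-island definitions combine this ratio with GC content and a minimum length, and those thresholds are left out here. The example sequence is made up.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based indices of every C immediately followed by a G (a CpG site)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from the sequence's C and G content."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    expected = c * g / len(seq)
    return len(cpg_positions(seq)) / expected


dna = "ACGTTCGACCGG"  # made-up sequence for illustration
print(cpg_positions(dna))
print(round(cpg_obs_exp(dna), 2))
```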

5-methylcytosine has an additional methyl group. Source: ResearchGate

CpG islands are commonly located at promoters, the regions of DNA where transcription begins. About 70% of human promoters have high CpG content. Methylation of CpG sites in a gene’s promoter is generally associated with reduced expression of that gene.

As we age, our methylation patterns change. Each time our cells divide, “epigenetic noise” is added and alters these patterns. The idea is to find CpG sites whose methylation changes in predictable ways with age, and then use them to estimate biological age.

Developing Predictors of Biological Age

Now, how do we take information from DNA methylation and translate it into a functional aging biomarker? This is where the work of Dr. Steve Horvath from UCLA comes into play. Horvath is credited with creating one of the most widely used methylation clocks, the “Horvath Clock”. The Horvath Clock gives a measurement of biological age that can be used across a wide range of tissues and cell types. I’ll outline the process by which methylation data is translated into an age.

Horvath assembled methylation datasets measured with Illumina’s 27k and 450k methylation arrays. The 27k array covers approximately 27,000 CpG sites and the 450k covers approximately 450,000, hence the names.

This screenshot shows what the input data looks like. Source: screenshot from my computer

At a single CpG site on a single copy of DNA, methylation is binary: the site is either methylated or it is not. However, the assay measures DNA pooled from many thousands of cells in a given tissue and reports the proportion of copies that are methylated at each site. That is why each CpG site enters the clock’s statistical model as a value between 0 and 1, where 1 means methylated in all cells and 0 means methylated in none of them.
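This 0-to-1 number is usually called a beta value. The array reports a "methylated" and an "unmethylated" signal intensity for each CpG site, and the beta value is the methylated share of the total. The Python sketch below assumes that formulation. The offset of 100 in the denominator is a convention I believe Illumina uses to stabilise sites with very low total signal, so treat that constant as an assumption. The intensities in the example are invented.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Approximate fraction of DNA copies methylated at one CpG site.

    methylated / unmethylated: signal intensities from the array (negatives clipped to 0).
    offset: assumed small constant that keeps the ratio stable when both signals are tiny.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


print(round(beta_value(9000, 900), 3))   # mostly methylated: near 1
print(round(beta_value(300, 9700), 3))   # mostly unmethylated: near 0
```

The offset also means a site with no signal at all gets a beta value of 0 rather than a division by zero.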

Skipping the data normalization steps: to select CpG sites from the roughly 21,000 candidate sites (those shared by the 27k and 450k arrays), Horvath used an elastic net regression model trained against chronological age. Elastic net is linear regression with a regularization penalty. The penalty shrinks many coefficients toward zero and sets many of them exactly to zero, which eliminates those parameters. Check out this video series to learn more.
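For reference, here is the elastic net objective as it is usually written. This is the parameterisation used by R’s glmnet package, which as far as I know is what Horvath used; the notation is mine. Here $y_i$ is the age of sample $i$ and $x_{ij}$ is the methylation value of CpG site $j$ in that sample. $\beta_j$ is that site’s regression coefficient, which is not to be confused with the methylation "beta value". $\lambda$ sets the overall strength of the penalty, and $\alpha \in [0, 1]$ mixes the lasso ($\ell_1$) and ridge ($\ell_2$) terms.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\Bigl(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Bigr)^2
+\lambda\sum_{j=1}^{p}\Bigl(\alpha\,\lvert\beta_j\rvert+\frac{1-\alpha}{2}\,\beta_j^2\Bigr)
```

The $\ell_1$ term is what sets many coefficients exactly to zero. The $\ell_2$ term keeps groups of correlated CpG sites from being dropped arbitrarily. With $p \approx 21{,}000$ candidate sites, $\lambda$ is usually chosen by cross-validation.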

The model starts with about 21,000 parameters (one coefficient per CpG site), and most of them contribute little to predicting age. The elastic net penalty removes these uninformative parameters, and Horvath’s model narrowed them down to 353 CpG sites.
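To see the shrink-to-zero effect without real methylation data, here is a minimal NumPy sketch that fits an elastic net by coordinate descent on synthetic data. It is not Horvath’s code (his tutorial is in R), and the function and variable names are my own. The data are invented for illustration: 150 fake samples and 300 fake CpG "beta values" drawn uniformly from [0, 1], with an age that depends on only 5 of them. Like glmnet, the sketch standardises each column before fitting and maps the coefficients back afterwards.

```python
import numpy as np


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; anything within gamma of zero becomes exactly 0."""
    return float(np.sign(z) * max(abs(z) - gamma, 0.0))


def elastic_net(X, y, lam, alpha=0.5, n_iter=100):
    """Fit a glmnet-style elastic net by cyclic coordinate descent.

    Minimises (1/2n)*||y - b0 - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    on standardised columns, then returns (intercept, coef) on the original scale.
    """
    n, p = X.shape
    x_mean, x_sd = X.mean(axis=0), X.std(axis=0)
    x_sd[x_sd == 0] = 1.0                     # guard against constant columns
    Xs = (X - x_mean) / x_sd                  # each column: mean 0, (1/n)*||x_j||^2 = 1
    resid = y - y.mean()                      # residual when every coefficient is 0
    coef = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # inner product of column j with the partial residual (its own term added back)
            z = Xs[:, j] @ resid / n + coef[j]
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            resid -= Xs[:, j] * (new - coef[j])   # keep resid == centred y - Xs @ coef
            coef[j] = new
    coef_orig = coef / x_sd
    intercept = y.mean() - x_mean @ coef_orig
    return intercept, coef_orig


# Synthetic data: only the first 5 of 300 "sites" carry age signal.
rng = np.random.default_rng(0)
n_samples, n_sites = 150, 300
X = rng.uniform(0.0, 1.0, size=(n_samples, n_sites))
true_coef = np.zeros(n_sites)
true_coef[:5] = [30, -25, 20, -15, 10]
age = 40 + X @ true_coef + rng.normal(0.0, 1.0, n_samples)

b0, coef = elastic_net(X, age, lam=1.0, alpha=0.9)
print(np.count_nonzero(coef), "of", n_sites, "sites kept")
```

With these settings the fit should keep only a handful of the 300 sites, essentially the 5 that carry signal. In the real clock, the same mechanism reduced roughly 21,000 sites to 353. For real work you would use a tested library such as glmnet or scikit-learn and choose `lam` by cross-validation.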

Correlation between Horvath clock age and chronological age. Source: Wikimedia Commons

The model was trained against chronological age because one of the most important criteria was a strong correlation with chronological age. After all, chronological age is a good proxy for biological age. The Horvath clock age has a Pearson correlation coefficient of 0.98 with chronological age, which was unprecedented at the time.
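For readers who want to see what the correlation number measures, here is a plain-Python sketch of the Pearson correlation coefficient. The ages in the example are invented and do not come from Horvath’s data. A high correlation alone does not guarantee small errors: a clock that always over-predicts by exactly ten years would still score 1.0. That is why clock papers also report error measures such as the median absolute difference in years.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical ages, invented for illustration only.
chronological = [5, 18, 25, 40, 52, 67, 80]
predicted = [7, 17, 28, 38, 55, 64, 83]
print(round(pearson_r(chronological, predicted), 3))
```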

Horvath trained and tested his model on 82 datasets covering many different tissues and cell types. The end result is a formula that takes the methylation values of the 353 CpG sites as input and multiplies each by a weight (coefficient). A positive weight means higher methylation at that site raises the age estimate, and a negative weight lowers it.
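Put together, a prediction is a weighted sum of the methylation values plus an intercept, followed by one extra step that this overview glosses over. In Horvath’s published tutorial code, the model is fitted to a transformed version of chronological age. The transform is logarithmic up to an "adult age" of 20 and linear afterwards, so the weighted sum has to be mapped back to years. The Python sketch below follows that transform as I understand it from the tutorial. The intercept, weights, and CpG names are hypothetical placeholders and are not the published coefficients.

```python
import math

ADULT_AGE = 20  # where the transform switches from log to linear (from the tutorial, as I understand it)


def transform_age(age: float, adult_age: float = ADULT_AGE) -> float:
    """Log-linear age transform: logarithmic up to adult_age, linear afterwards."""
    x = (age + 1) / (adult_age + 1)
    return math.log(x) if x <= 1 else x - 1


def inverse_transform_age(y: float, adult_age: float = ADULT_AGE) -> float:
    """Map the model's output back to years (inverse of transform_age)."""
    if y < 0:
        return (adult_age + 1) * math.exp(y) - 1
    return (adult_age + 1) * y + adult_age


def predict_dnam_age(betas: dict[str, float], intercept: float, weights: dict[str, float]) -> float:
    """Weighted sum of CpG beta values plus intercept, then mapped back to years."""
    score = intercept + sum(w * betas[cpg] for cpg, w in weights.items())
    return inverse_transform_age(score)


# HYPOTHETICAL intercept, weights and CpG IDs for illustration only.
# The real clock has 353 published coefficients; none of these numbers come from it.
toy_intercept = 0.5
toy_weights = {"cpg_A": 1.2, "cpg_B": -0.8, "cpg_C": 0.4}
sample = {"cpg_A": 0.70, "cpg_B": 0.30, "cpg_C": 0.55}
print(round(predict_dnam_age(sample, toy_intercept, toy_weights), 1))
```

The log part stretches out childhood, where methylation changes quickly with age. The linear part covers adulthood, where change is slower and steadier.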

While writing this article, I followed a tutorial by Dr. Horvath that takes one of the datasets and shows how the model is coded in R. It was a pretty cool tutorial, and the GitHub repository is linked here.

Next Steps

The Horvath clock was created in 2013. Since then, the epigenetic clock field has advanced considerably. Notable developments include the PhenoAge clock by Dr. Morgan Levine at Yale and the GrimAge clock by Ake Lu at UCLA.

Newer epigenetic clocks aim to be specialized and disease-specific, for example a clock that can predict levels of cellular senescence. There are still many challenges to tackle as researchers at the cutting edge continue to make breakthroughs.

I wrote this article to help me build a first-principles understanding of epigenetic clocks, which have fascinated me for a while now. I still have many questions about the epigenetic clock and biological age. The most profound is: what biological mechanism is behind the changes in methylation? It is still unclear to me how that works, although I have read that it is linked to an “epigenetic maintenance system”. It may have something to do with the Information Theory of Aging proposed by David Sinclair, whose book Lifespan I am currently reading.

Until next time.

Hey y’all! 👋 I’m Aaron, a 15-year-old who’s super passionate about the intersection between artificial intelligence and human longevity. Feel free to connect with me on LinkedIn or check out my full portfolio.
