Missing heritability found safe and well

The case of the ‘missing heritability’ has become celebrated, by some, as a supposed indicator of just how abjectly the Human Genome Project has failed to live up to its promise. We’ve known for a long time that many human traits and common disorders are quite heritable. The HGP was supposed to reveal the underlying genetic causes, paving the way for deeper understanding and new therapies. But genetics seemed to keep coming up short, finding some causal variants but leaving most of the heritability unexplained, or ‘missing’. A new study (along with a lot of supporting theory and other empirical evidence) shows that the answer lies in genetic variants that are much rarer in the population than those that had typically been studied.


It is common knowledge that many human traits run in families, as does risk of common disorders like heart disease, asthma, or mental illness. People resemble their relatives, not just physically, but physiologically and even psychologically. Twin and family studies have consistently found that most of that resemblance is due to shared genetics rather than shared environment.

If shared genes make people more similar to each other, then, conversely, genetic differences must contribute to differences between people across the population. The heritability is a statistic that estimates how much of a contribution genetic variation makes to the observed phenotypic variation. It can be estimated from twin or family studies and is expressed as a proportion of the total variance.

For height, the heritability is about 0.8, meaning 80% of the variance in height across the population is attributable to genetic variation. For I.Q., most recent estimates of the heritability put it at about 50%. [Note this does NOT mean that 80% of a person’s height and 50% of their I.Q. comes from their genes and the rest from their environment – that is a meaningless statement. Heritability does not apply to individuals nor does it apply to the absolute values of a trait – it only applies to the variance in that trait]. Intuitively, what those heritability figures mean is that if we were all clones, there would be a lot less variability in our height (as we see in identical twins), and only half as much variability in I.Q.
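One common way to get such a figure from twin data is Falconer's formula, which doubles the difference between the trait correlation in identical (MZ) twins and in fraternal (DZ) twins. The sketch below is illustrative only: the function name is hypothetical and the correlations are made-up inputs, not values from any study.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate of heritability: h2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations, chosen only for illustration.
h2 = falconer_h2(r_mz=0.90, r_dz=0.50)
print(f"estimated heritability: {h2:.2f}")
```

The result is a proportion of variance in the population, not a statement about any one person.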

Finding the genes

If a trait is heritable, then there must exist across the population some specific genetic variants that cause the observed differences in phenotype. So, what are these ‘variants’?

The sequence of the human genome – the complete set of genetic instructions for a human being – is basically a single string of DNA, comprising a sequence of three billion chemical bases or ‘letters’, A, C, G, or T, broken into 23 different chromosomes. Starting from the tip of chromosome 1, we can track the sequence at any given position and compare it across people. Most sites show very little difference – almost everyone in the population has the exact same letter at that position. But some sites – a bit over 1 in every 1,000 – have two or more versions, each at an appreciable frequency in the population. So, for example, 35% of the time it might be an “A” and 65% of the time it might be a “C”. Those sites are referred to as common variants or single-nucleotide polymorphisms (SNPs).

These common variants arose through the process of mutation in some very distant ancestor – an error in copying the DNA, leading to a change in sequence. That new variant then spread to the descendants of that individual and, over many, many generations, eventually came to be common in the population. (Of course, many other such variants disappeared from the population over the same time period). Across the whole genome, there are about 5 million such common variants.

In considering the source of the heritability of general traits and common disorders (as opposed to say, very rare genetic diseases known to be caused by specific mutations), a pretty reasonable idea is that common variants will make a major contribution to it. Given the way these traits are inherited (in a continuous, rather than discrete fashion), it also makes sense to expect that each trait will be affected by many such common variants, with combined or cumulative effects.

Genome-wide association studies (GWAS) were designed to try and identify common variants contributing to specific traits or disorders. The basic idea is epidemiological: if a particular version of a SNP increases the value of a trait or increases risk of a disorder, then it should be more common in people with a high value of the trait or with the disorder. (Just as smoking is more common in people with lung cancer than in people without). So, if you compare the frequencies of each of the versions of all the SNPs in the genome across people with different values of a trait, you can find ones that show this frequency difference. These are statistically “associated” with the trait.
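The frequency comparison for a single SNP can be sketched as a chi-square test on a 2x2 table of allele counts. This is a minimal illustration with made-up counts and a hypothetical function name; real analyses usually use regression models with covariates rather than a bare allelic test.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        row_case * row_ctrl * col_alt * col_ref
    )
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 2,000 alleles per group, 37% vs 35% carrying the "A" version.
chi2, p = allelic_chi2(case_alt=740, case_ref=1260, ctrl_alt=700, ctrl_ref=1300)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A two-percentage-point difference in a sample of this size gives only weak evidence, which is the kind of problem that pushed the field towards much larger samples.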

In practice, you can look at just a sample of the SNPs across the genome because ones that are near each other on a chromosome tend to get inherited together, so if you know which version one of them is, you can infer the other ones, with some measurable degree of certainty. So, most GWAS looked at between 500,000 and a million SNPs. That is still an awful lot of comparisons, which have to be corrected for when you are figuring out which differences are statistically significant and which are just due to some randomness in your sample.
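The simplest form of that correction is a Bonferroni adjustment, which divides the usual significance level by the number of tests. The sketch below assumes a conventional 0.05 level and one million tests; the function name is hypothetical, and the field's actual choice of threshold involves more than this arithmetic.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests


# Assumed inputs: the usual 0.05 level, spread over a million SNP tests.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"per-SNP p-value threshold: {threshold:.1e}")
```

A threshold of this order is why a frequency difference has to be very convincing before a SNP counts as genome-wide significant.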

When GWAS first started, the idea was that there might be a dozen, or a few dozen, such variants affecting any given trait. Each of them alone would thus be having a fairly sizable effect, which should be detectable as a big frequency difference, in a modest-sized sample. But early GWAS did not find any such common variants with big effects. Or even with medium-sized effects. In fact, the frequency differences were so small that they did not even reach the threshold for statistical significance in the sample sizes initially used.

This led to the recognition in the human genetics community that very large samples would be required to find any SNPs affecting most traits or common disorders. As a result, huge consortia were formed to pool resources and generate sample sizes large enough for well-powered GWAS, along with very important replication samples. (This is, incidentally, a model now being pursued in neuroimaging and other fields).

The output of GWAS

As sample sizes grew, more and more SNPs were found by GWAS to be statistically associated with many different traits and disorders. In some cases, hundreds or even thousands of associated SNPs have now been found. This has variously been hailed as a monumental success or a huge disappointment. In some ways, how you see it depends on whether you’re a glass half full or a glass ninety-five per cent empty kind of person.

One thing that should be said is that the signals detected clearly seem to be “real”. We can be fairly certain of that because the types of genes impacted by the associated SNPs differ depending on the trait or condition involved, in ways that make biological sense. So, SNPs associated with height affect a lot of skeletal and growth factor gene pathways, those associated with intelligence or schizophrenia affect neurodevelopmental genes, those associated with autoimmune disorders affect immune genes, and so on.

The downside is that so many genes have been associated that the signal does not really zero in on very specific biochemical pathways within those large categories. Indeed, a recent model suggests that genetic variants in pretty much ALL the genes expressed in the relevant tissue may make some contribution to the collective effects of common variants for a given trait.

And that brings up the second disappointment. Even if the effects of any individual associated SNP were too small to be of consequence (or even to offer much purchase for experimental study), there was hope that their collective effects could still be sizeable. But it turned out even these were far from enough to explain the heritability of these phenotypes.

Measuring the combined effects of common variants

There are a number of different ways that one can estimate the combined contribution of common variants to a given phenotype. The first depends on “polygenic scores”. These use a statistical regression method to add up the effects of all the associated variants and determine how much of the variance in the phenotype they collectively explain. For any single variant, its effect size is simply indexed by the frequency difference between cases with a disorder and controls (or between ends of the spectrum of a trait). If you see a really big difference in frequency, you can infer a really big effect (as with smoking and lung cancer).

The effect sizes of single SNPs are tiny – almost negligible, in fact – but as you combine them they start to add up. You can take all of the SNPs that pass the threshold for genome-wide statistical significance and make a combined polygenic score. However, these typically explain only a very small fraction of the heritability of a trait – often in the range of 2-3%.

But you can dig a little deeper, into those SNPs that came close to significance but didn’t quite reach the mark, with the assumption that many of them are also really associated. As you go down the list, the heritability explained increases, until it reaches a point where you’re just adding noise and it starts to get a little worse. The trouble is that the variance explained still remains way off the total heritability, for pretty much all traits and disorders – below 10% in most cases.
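At its core a polygenic score is a weighted sum: each person's count of a given allele (0, 1 or 2) is multiplied by that SNP's estimated effect, and only SNPs passing a chosen p-value cut-off are included. The sketch below uses invented effect sizes and p-values and a hypothetical function name, purely to show how loosening the cut-off changes the score.

```python
def polygenic_score(genotypes, effects, p_values, p_threshold):
    """Sum allele counts (0, 1, 2) weighted by per-SNP effect sizes,
    using only SNPs whose association p-value passes the threshold."""
    return sum(
        count * beta
        for count, beta, p in zip(genotypes, effects, p_values)
        if p <= p_threshold
    )


# Hypothetical data for one person across four SNPs.
genotypes = [2, 1, 0, 1]
effects = [0.02, -0.01, 0.03, 0.015]
p_values = [1e-9, 3e-8, 1e-4, 0.02]

strict = polygenic_score(genotypes, effects, p_values, p_threshold=5e-8)
relaxed = polygenic_score(genotypes, effects, p_values, p_threshold=0.05)
print(f"strict: {strict:.3f}, relaxed: {relaxed:.3f}")
```

Relaxing the cut-off lets in more true signal but also more noise, which is why the variance explained eventually stops improving.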

There are two possible explanations for that: one is that that is really the limit of the combined effects of common variants on the traits in question. The other is that the method designed to detect their effects in the output of GWAS cannot fully distinguish the signal from the noise, and remaining effects may well exist.

That second possibility is supported by findings from another type of analysis (known as GCTA-GREML or related approaches), which also uses output from GWAS, but in a very different way. The idea is an extension of twin and family studies, which look at how closely phenotypic similarity tracks genetic similarity. The same can be done across samples from the wider population, as we are all distantly related to each other, to varying degrees. The data used for GWAS allow one to measure the degree of genetic relatedness between people in the sample.

If you look across many, many pairs of people, even a small increase in relatedness can be associated with a small increase in phenotypic similarity. In a sample of a thousand people, you can perform roughly half a million pairwise comparisons, yielding enough statistical power to confidently detect even these small effects. It is then possible to extrapolate that signal and infer the total heritability tagged by all the common variants in the genome – the ones that can be shared between people who are only distantly related.
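Two pieces of that reasoning can be sketched in code: the number of distinct pairs, which grows as n(n-1)/2, and a measure of genetic relatedness for one pair. The relatedness function below follows the standardised-genotype form commonly used in GCTA-style analyses, but treat it as an assumption-laden toy: the function names, genotypes and allele frequencies are all invented for illustration.

```python
import math


def n_pairs(n_people: int) -> int:
    """Number of distinct pairs that can be formed from n_people."""
    return math.comb(n_people, 2)


def genomic_relatedness(geno_a, geno_b, freqs):
    """Average product of standardised allele counts for two people.
    geno_a, geno_b: allele counts (0, 1, 2) per SNP; freqs: allele frequencies."""
    terms = [
        (a - 2 * p) * (b - 2 * p) / (2 * p * (1 - p))
        for a, b, p in zip(geno_a, geno_b, freqs)
    ]
    return sum(terms) / len(terms)


print(n_pairs(1000))
# Hypothetical genotypes at two SNPs, both with allele frequency 0.5.
print(genomic_relatedness([2, 0], [2, 0], [0.5, 0.5]))
print(genomic_relatedness([2, 0], [0, 2], [0.5, 0.5]))
```

Pairs who share more alleles than expected by chance get positive values and pairs who share fewer get negative ones; the method then asks how well those values track phenotypic similarity.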

Those numbers still fall far short of the total heritability, however, ranging anywhere from 15-50%, depending on the trait. That means a lot of the heritability is still ‘missing’. So where’s it at?

Possible causes of the missing heritability

1. It was never there. This line of thought argues that the heritability reported in twin and family studies was over-estimated due to things like identical twins being treated more similarly than fraternal twins, or other cryptic environmental confounds. The apparent failure of GWAS to identify causal variants or explain much of the heritability is taken by many as evidence that the traits are not so heritable after all.
2. MOAR common variants! Maybe we’re not sampling all the SNPs effectively, or we haven’t gone to big enough sample sizes, or the statistical models have some limitations that prevent us from seeing all of the effects of common variants.
3. Rare variants. As GWAS only sample common variants in the genome, perhaps the remaining heritability is explained by rarer variants.
4. Structural variants. Changes to single bases of the DNA sequence are not the only type of mutation. Deletions or duplications of whole chunks of chromosomes also arise and can make large contributions to phenotypes, but are not always easy to detect by GWAS.
5. De novo mutations. Some variants affecting traits and disorders arise de novo, in the sperm or egg that fused to create a new person. These can make a big contribution to the heritability in traditional twin studies (because they are always shared between monozygotic twins, but never between dizygotic) but don’t show up at all in population-based studies.
6. Epistatic interactions. The regression models underlying GWAS-based methods of explaining variance assume that the effects of genetic variants combine additively. If there is considerable non-additivity in how those effects combine, the overall genetic effect could be larger than the additive models suggest.

There is some evidence that all of these factors, apart from the first, are at play. As we will see, the traits and disorders under investigation really are as heritable as twin and family studies indicated, with much of the heritability being contributed by rare variants.

Rare variants

The arguments for an important role for rare variants have always had strong theoretical support from an evolutionary perspective. When a new variant arises through de novo mutation, what happens to it depends on what effect it has and how it is acted on by natural selection.

Most mutations have no effect, partly because protein-coding genes make up only about 3% of the genome, but also because many changes even in the business end of the genome are well tolerated. Of the ones that do have an effect, most of them can be characterised as “bad”. Natural selection has been acting on the human genome for many millions of years and has produced a finely tuned genetic program for making functional human beings. Messing with that program at random is just statistically much more likely to mess it up than to improve it.

The question is: how bad is bad? If it’s just a small effect, then natural selection is not going to care that much about it. The individual carrying the new variant may still thrive and reproduce and the variant may be passed to their offspring, and to their offspring, and so on, and eventually may become common in the population (or not, with the outcome being largely down to chance). But if it’s a big effect, then the individual carrying it may struggle to survive or at least to breed, and may have fewer offspring than others in the population. Such a variant may survive in the population for a few generations but will never become common.
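The interplay of chance and selection described here can be illustrated with a toy Wright-Fisher-style simulation. This is a deliberately simplified sketch (a haploid population of fixed size, with a single fitness cost `s` for carriers); the function name and all parameter values are assumptions made for illustration, not estimates for any real variant.

```python
import random


def wright_fisher(n_copies, start_count, s, generations, rng):
    """Track copies of a variant in a haploid population of fixed size.
    s is the fitness cost to carriers (0 = neutral, 1 = no offspring)."""
    count = start_count
    history = [count]
    for _ in range(generations):
        p = count / n_copies
        # Expected frequency after selection, before random sampling.
        denom = 1.0 - s * p
        p_sel = p * (1.0 - s) / denom if denom > 0 else 0.0
        # Each copy in the next generation carries the variant with probability p_sel.
        count = sum(1 for _ in range(n_copies) if rng.random() < p_sel)
        history.append(count)
        if count == 0 or count == n_copies:
            break  # variant lost or fixed
    return history


rng = random.Random(0)
neutral = wright_fisher(1000, 1, s=0.0, generations=50, rng=rng)
lethal = wright_fisher(1000, 1, s=1.0, generations=50, rng=rng)
print("neutral variant copies per generation:", neutral)
print("severe variant copies per generation:", lethal)
```

Running the neutral case with different seeds gives different fates, which is the "largely down to chance" part; a variant with `s=1` is always gone after one generation.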

Indeed, really severe mutations may be selected against immediately – people who carry them may never have offspring. Such severely deleterious variants are nevertheless observed in individuals in the population as they repeatedly arise through de novo mutation.

So, if you see a variant in an individual and it is common in the population, it is very likely not having a big effect, if any. If it had, it wouldn’t be common. Conversely, rare variants can have larger phenotypic effects, and the ones with the scope to have the biggest effects are the ones that have arisen de novo in an individual. (Nicely reviewed here).

Assaying the full spectrum of genetic variation

That is where the new study comes in. It is authored by Pierrick Wainschtein, along with Jian Yang, Peter Visscher and colleagues. These researchers obtained whole-genome sequence (WGS) data on 21,620 unrelated individuals of European ancestry. This gave them access to not only the common genetic variation typically assayed in GWAS, but all of the genetic variation, including the rare variants.

When they used just the common variants from this set to estimate the heritability of height and body-mass index (BMI) in this sample, they got results that replicated prior findings: 0.49 for height and 0.27 for BMI – substantial, but well off the twin estimates. But when they put information from all of the genetic variants into their GCTA-GREML model, they got much higher values: 0.79 for height and 0.40 for BMI. These are within the range of the heritability estimated from twin studies (0.7-0.8 for height and 0.4-0.6 for BMI).
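To put those figures side by side, the short calculation below works out what share of the whole-genome estimate was already captured by common variants alone. The numbers are the ones quoted above; the variable names are just for illustration.

```python
# Heritability estimates quoted in the text (common variants only vs all variants).
estimates = {
    "height": {"common": 0.49, "wgs": 0.79},
    "BMI": {"common": 0.27, "wgs": 0.40},
}

shares = {
    trait: h2["common"] / h2["wgs"] for trait, h2 in estimates.items()
}

for trait, share in shares.items():
    print(f"{trait}: common variants capture {share:.0%} of the WGS estimate")
```

On these numbers, roughly a third of the heritability detected with whole-genome sequence was not tagged by common variants.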

Now, there are some potential caveats and technical points to consider, as discussed here for example, by Alexander Young. The issue of possible cryptic population stratification is a particularly vexing one for the field in general these days, with the realisation that the typical method used to correct for it (which relies on principal components analysis) is not sufficient to fully eliminate its possible contribution. (See here and here, for example). 

Despite those caveats, I think the main result will stand: incorporating rare variants from whole genome sequence will allow us to detect practically all of the missing heritability.

The other main finding was that the variants contributing this extra portion of the heritability tended to be very rare and in low “linkage disequilibrium” with neighbouring variants. That is a signature of more recent variants, and suggests that variants affecting these traits have been under negative selection. (This has a technical consequence discussed below).

However, that does not necessarily imply that higher or lower values of the trait have been selected for. We don’t know whether the variants affecting the traits tend to increase or decrease height or BMI, just that they have some effect on it. In fact, we don’t even know that they are being selected against because of their effect on that trait. Most genetic variants are “pleiotropic” – they affect lots of things. So, the negative selection on rare variants affecting height and BMI could be due to their effects on other traits or on general fitness.

Interestingly, a couple of papers (here and here) looking at heritability of various traits in populations of yeast have also just come out and come to the same conclusion – much of the heritability is explained by rare variants.


The first major implication is that these traits (and presumably others) are every bit as heritable as twin and family studies have indicated. That is not really much of a surprise when it comes to these physical traits – it fits with common observations that identical twins are really extremely similar in height and tend to be very similar in BMI. But it does suggest that the estimates of heritability for other, less overt traits – such as intelligence, or personality constructs, or risk for psychiatric disease – are also reasonably reliable. (Noting that heritability is not a fixed, universal, biological constant but one that applies to specific populations at specific times).

In addition, the fact that much of the heritability lies in rare variants has important implications for the predictive value of polygenic scores. If these scores are derived from profiles of common variants alone, as they typically are (for example in direct-to-consumer genomics), then they will only capture a fraction of the genetic variance, and therefore will be much less predictive than they could be.

One hope might be that if we fully sequence a decent-sized reference sample of people from any given population (in the tens of thousands), and we therefore see all the rare variants that they have and how they are linked to various local patterns of common variants, then we might be able to infer or impute what rare variants other people have merely by analysing their own patterns of common variants (which is MUCH cheaper than actually fully sequencing everyone).

Wainschtein and colleagues tried that approach in their sample and found that it does indeed allow them to access a bit more of the genetic variance, but not all of it. To get all of it they had to actually fully sequence the genomes of all the people. The reason goes back to that observation that the rare variants that were making the extra contribution were not tightly linked to specific patterns of common variants. So there was no way to impute their presence from a given pattern of common variants. Essentially, the ones making the biggest contribution were so rare and so recent that many weren’t even in the reference sample. 

Epistasis – when things don’t just add up

The typical model used in the generation of polygenic scores assumes that all of the individual genetic effects combine linearly. That is, the effect of any given variant on a trait as measured across the whole population (say, +0.1cm of height) is the actual effect in every individual who carries it, regardless of the rest of their genetic make-up. Now, that could be true, but we know that non-linear (non-additive, or “epistatic”) genetic interactions are actually the norm in biology, not some kind of weird exception.

That said, if a given trait is affected by thousands of genetic variants, then all those non-linear interactions may actually average out, and you will be left with a sum of effects that does actually look additive, statistically speaking. Really, you can model it either way and the data will fit pretty well.

Conversely, however, the fewer variants involved, and the larger their individual effects, the greater the opportunity for pairwise or higher-order epistatic interactions to have a big effect and cause a deviation from additivity in individuals.
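A toy two-variant example shows how a population-average effect can differ from the effect in any given individual. The interaction rule here is invented for illustration (variant A only matters when variant B is present), and the population is assumed to be an equal mix of the four possible combinations.

```python
from itertools import product


def phenotype(a: int, b: int) -> float:
    # Hypothetical epistatic rule: variant A adds 1 unit only when B is present.
    return float(a * b)


def mean(values):
    return sum(values) / len(values)


# Assumed population: an equal mix of the four (a, b) combinations.
population = list(product([0, 1], repeat=2))

carriers = [phenotype(a, b) for a, b in population if a == 1]
non_carriers = [phenotype(a, b) for a, b in population if a == 0]

# What an additive model would report as "the" effect of A.
average_effect = mean(carriers) - mean(non_carriers)

# What A actually does, depending on the person's genotype at B.
individual_effects = {b: phenotype(1, b) - phenotype(0, b) for b in (0, 1)}

print("population-average effect of A:", average_effect)
print("effect of A by B genotype:", individual_effects)
```

The averaged figure describes the population well enough, yet it is wrong for every individual in it: the true effect is either nothing or twice the average.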

This brings us to a crucial point – the distinction between explaining variance in a trait across the population and predicting individual values in that trait. These are not the same thing at all. You can in fact have quite good explanation of the variance – even complete access to all the heritability, as in the Wainschtein study – and still not be able to predict the genetic effects in individuals very precisely, especially when epistatic interactions are at play.

Unique and unpredictable

In general then, our individual genetic heritage comprises a large background of ancient, common variants, which make an important collective contribution to many traits, and a much more unique profile of rare variants, which have larger individual effects and which are also likely to show more non-linear genetic interactions.

Even if we capture most of the genetic variance affecting a trait across the population, it will remain tricky to make exact predictions of individual phenotypes from genetic information, because those unique profiles will never have been seen before in reference samples.

And finally, it should be noted that even if we do have access to all the rare variants and even if we do know all the epistatic effects, most traits are not completely heritable. Because much of the rest of the variation may be due to randomness in development, there will always be a strong limit – in principle, not just in practice – to how predictive polygenic scores can ever be. Which personally makes me feel – admittedly rather perversely – a bit better. 

