Thursday, May 12, 2016

The genetics of educational attainment

A recently announced paper reports the results of an enormous genome-wide association study for educational attainment. The authors found 74 regions of the genome where there are common variants that show statistically significant association with this trait. Here are my thoughts on what this study found, what it didn’t find and what those positive and negative results might mean.

First, it was a huge effort by a lot of people who should be congratulated for working together to carry out this analysis on such a huge scale. It is an interesting question and a worthwhile effort, in my view. The trait they measure, time spent in education, is an important one and has been shown to be moderately heritable. One large study estimated the heritability at ~40%, meaning that around that much of the variance in this trait, in the sample studied, was found to be attributable to genetic differences between people. (For reasons I can’t figure out, the current study cites that paper, but gives a figure of “at least 20%” for the heritability.) There is also strong evidence that “Educational attainment is moderately correlated with other heritable characteristics, including cognitive function and personality traits related to persistence and self-discipline.” Understanding the genetics of these traits is highly interesting, if for no other reason than that it can help us understand some of the major differences in human experience.
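For readers who like to see it written down, the heritability figure above can be expressed as a simple ratio. This is only a sketch: the symbols are my own labels (they are not taken from the paper), and the decomposition assumes genetic and environmental contributions simply add, with no covariance or interaction between them.

```latex
% h^2 : heritability (share of trait variance attributable to genetic differences)
% V_G : variance due to genetic differences in the sample
% V_E : variance due to everything else (assumed additive and independent of V_G)
% V_P : total observed variance in the trait
h^2 = \frac{V_G}{V_P}, \qquad V_P = V_G + V_E, \qquad h^2 \approx 0.4
```

Note that this is a statement about variance in a particular sample, not about how much of any one person's education is "due to" their genes.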

The authors of the current study have clearly found what look like some real associations between common genetic variants and educational attainment. First, they replicate quite well across their different samples. Second, the variants are in a highly non-random set of genes – they are enriched for genes expressed in the brain during fetal development and for genes that encode proteins involved in neurodevelopmental processes like neuronal differentiation, cell migration and axonal guidance – all the processes that are involved in putting the brain together! So, we can conclude that differences in how the brain develops can have some effect on intelligence or other traits (like drive) that contribute to variation in educational attainment. Of course, that doesn't sound surprising really when you say it like that – no more than the finding that common variants in skeletal growth genes influence height. But it didn't have to turn out that way.

What is more interesting to me is what they did not find. The 74 variants they find have tiny individual effects on the trait (even by GWAS standards) and collectively explain only 3% of the genetic variance in the trait. Their study was certainly well enough powered to detect common variants with even very small effect sizes – the fact that they did not find any more of them is therefore strong evidence that they do not exist.
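To give a feel for what "well enough powered" means, here is a rough sketch of how the chance of detecting a single variant depends on sample size and on the share of trait variance that variant explains. Everything in it is an assumption for illustration: the sample size of 300,000 is a round number rather than a figure from the paper, the function name `gwas_power` is hypothetical, and the calculation uses a simple normal approximation to a single-variant test at the conventional genome-wide threshold of 5e-8.

```python
from math import sqrt
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def gwas_power(sample_size, variance_explained, alpha=GENOME_WIDE_ALPHA):
    """Approximate power to detect one variant that explains the given
    fraction of trait variance (two-sided test, normal approximation)."""
    std_normal = NormalDist()
    # Critical z-score for a two-sided test at the chosen threshold.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z-score of the association test for this variant.
    expected_z = sqrt(sample_size * variance_explained / (1 - variance_explained))
    # Probability that the observed |z| exceeds the critical value.
    upper_tail = 1 - std_normal.cdf(z_crit - expected_z)
    lower_tail = std_normal.cdf(-z_crit - expected_z)
    return upper_tail + lower_tail


if __name__ == "__main__":
    n = 300_000  # hypothetical round sample size, for illustration only
    for q in (0.0002, 0.0001, 0.00005, 0.00001):
        print(f"variance explained {q:.5%}: power ~ {gwas_power(n, q):.3f}")
```

Under these assumptions, detection power drops off steeply once a variant's share of the variance falls below a level set by the sample size, so the question becomes how small the effects of any undetected variants would have to be.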

So, instead of focusing on the variants they did find, one might instead ask what is contributing the other 97% of the genetic variance in this trait? 

There are a few possible explanations:

1. There may exist many, many more common DNA variants that contribute to variation in this trait across the population, but if this is true, each of these must have an even smaller effect than the SNPs they have already found (approaching negligible, in fact). It would take enormous samples to find more of them and, given the diminishing returns in terms of effect sizes, they would likely explain only a very small additional percentage of the variance, even if many of them are found.  
(Note that methods like Genome-wide Complex Trait Analysis (GCTA) have been used to try to estimate how much of the variance in educational attainment is tagged by common variants across the whole population, even though we may not yet be able to identify them. Davies et al. estimated this at 21%).  

For what it’s worth, I am generally skeptical that this method can produce precise estimates, given the tiny signals it relies on. I am even more skeptical of the interpretation of such results, which is based on the assumption that, because common variants can be used to index genetic relatedness across a sample, any association between this index and phenotypic relatedness must be caused by the actions of those common variants, and that it can indicate how many common variants must be involved. In fact this method tells us nothing about the number or allelic frequency of the causal variants involved.

2. Their statistical methodology may have missed common variants that have effects only in specific combinations (rather than their individual effects simply being summed). This is a general potential problem with GWAS methodology. However, with a sample size like the one they have, even variants that do have such epistatic interactions would still be likely to show some non-zero individual effect on average. See here for more on "what GWAS signals mean". 

3. The most likely explanation to me is that the genetic variants that make by far the biggest contribution to this trait are not common across the population, but rare. It would, in fact, be astonishing if rare mutations did not make an important contribution to the traits underlying educational achievement, especially intelligence. Generally speaking, rare mutations have bigger effects than common ones, we all carry many rare mutations, and intelligence is exactly the kind of trait that may be affected by many of them, either individually or in aggregate.

This has profound implications for how we think about the genetics of intelligence. What it means is that maybe there are no genes "for intelligence" - that is, genetic differences that explain most of the variance in intelligence across the whole population. Instead, our intelligence may be affected much more by the unique profile of rare mutations that we each carry. 

These could be mutations in genes like those found in this GWAS study – ones that directly control processes of neural development or other aspects of how the brain functions. But there also could be a more general effect of overall mutational load, which might reduce the robustness of the processes of neural development. Under this model, intelligence may be not so much a specific trait, reflecting some particular brain processes, but rather a general fitness indicator. We may think of intelligence like we think of “performance” of a car or an aircraft – as relying not just on specific components but also, maybe even more so, on how they are all put together. 

Note also that nonlinear epistatic interactions are highly likely between the rare variants we each carry – see here for more on that and why it places serious limits on how much we will ever be able to predict intelligence.
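As a toy illustration of the earlier point that variants acting only in combination will usually still show some average individual effect, consider two variants, A and B, that affect the trait only when both are present. This is a deliberately minimal sketch with made-up numbers: the function name `marginal_effect_of_a` is hypothetical, carrier status is treated as simply present or absent, and the two variants are assumed to be inherited independently.

```python
def marginal_effect_of_a(freq_b, interaction_effect):
    """Average trait difference between carriers and non-carriers of A,
    when the trait is shifted only in people who carry both A and B."""
    mean_trait_by_a = {}
    for a in (0, 1):
        # Average over B carrier status, weighted by how common B is.
        mean_trait_by_a[a] = sum(
            (freq_b if b else 1 - freq_b) * interaction_effect * a * b
            for b in (0, 1)
        )
    return mean_trait_by_a[1] - mean_trait_by_a[0]


if __name__ == "__main__":
    # Purely epistatic model: A has no effect at all in the absence of B.
    for freq_b in (0.5, 0.1, 0.01):
        effect = marginal_effect_of_a(freq_b, interaction_effect=1.0)
        print(f"B carrier frequency {freq_b}: average effect of A = {effect}")
```

In this toy model the average effect of A scales with how common its partner B is. A common partner leaves a clear individual signal, while a rare partner leaves almost none, which is one way of seeing why interactions among rare variants would be so hard to detect or to use for prediction.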

So, overall, I think one of the strongest conclusions from this study is the one they do not draw – that most of the genetic variation in this trait (the unaccounted for 97%!) is probably NOT due to common variation but most likely to the profile of rare mutations that we each carry.

Finally, given how easily and widely this kind of stuff is misinterpreted, and how readily people ascribe viewpoints to people discussing it that they do not actually hold, it may be wise to issue a few disclaimers:

- saying a trait is partly heritable is not the same as implying it is entirely genetically determined
- there are clearly also important sociocultural factors affecting educational attainment
- there may also be important interactions between genetics and sociocultural factors
- showing some genetic influences on a complex trait is not the same as "reducing it" to the actions of a few genes (we are interested here in how variation in genes leads to variation in the trait; not how a trait like human intelligence comes about in the first place)
- the findings described here will be of no practical use in screening people or predicting their academic success


  1. Of course, they are finding the things they are looking for: individual genes. We have yet to develop methodologies for cooperative effects that are non-additive. But we are not very efficient at understanding systems made of large numbers of interacting entities, even if the individual interacting entities are simple themselves. This is why we need computers; then we can do what we do best: find patterns in the results.

  2. This is a great article (not misrepresenting the results to support your prior beliefs). You say that you believe rare mutations explain most of the heritability of intelligence rather than common genes. What studies support this idea (other than the current study only explaining ~3% of the variance)?

    1. Thanks for your comment and your question. Here are some of the reasons why I think rare variants MUST be involved in intelligence, (or at least why that should be a default hypothesis):

      1. Rare variants in hundreds of known genes cause intellectual disability
      2. It stands to reason that less severe variants in the same genes (or others) may cause smaller decrements in intelligence – after all, this is exactly what the common variants model proposes
      3. Rare mutations are generally known to have larger phenotypic effects than common variants
      4. We all carry hundreds of rare mutations affecting proteins
      5. It is both statistically unlikely and biologically implausible that none of these should affect intelligence, given it is such a high-level property of the nervous system and thus dependent on so many lower-level components
      6. There is thus a strong a priori expectation that rare mutations should contribute to this trait in individuals and to variance in the trait across the population (just different ones in different people)
      7. GWAS are not just exploratory but also a test of the over-arching hypothesis that common variation explains phenotypic variance
      8. Negative results from well-powered studies should not therefore lead to the default interpretation that the trait must be explained by many more common variants of even smaller effect!
      9. Given the a priori expectation of involvement of rare variants, the hypothesis that they explain the missing heritability is a more parsimonious default position

    2. In Rietveld et al. 2013 it was estimated that common variants explain ~20% of phenotypic variability (section 2b in the supplement). So that would cap every other genetic effect to a maximum of ~20% (as the total heritability is estimated at ~40%). Rare variants do play a role in educational attainment (unpublished) but it is a smaller or comparable role when compared to common variants.

  3. Great summary of the work (as usual)!
    However, I think you may have missed a fourth explanation which appears to be overlooked by most authors in this field: feedback amplifying small genetic differences and inflating heritability estimates. This (admittedly old) paper explains the phenomenon quite nicely:

    Would be interesting to know if their model has received any criticism?

    1. Thanks, I will read that with interest. In general, there is certainly likely to be a strong cultural amplification of initial genetic differences, rather than a leveling out of them. This happens as people choose their own environments or get selected for different treatment based on their talents and aptitudes. That's very obvious for athletic or musical talent (the most talented get the most training) but is also almost inevitably involved in education.

  4. If GWAS is not the way forward for finding the genetic background of major cognitive traits, then what is? Theory indicates that it should mostly be common variants with additive effects, and if anything it provided a clear way forward.

    What else can be done?

    1. Well, if the traits involved are caused by profiles of rare mutations, then finding these may prove very difficult. It would require very large sample sizes of whole-exome or whole-genome sequencing. But if the mutations involved are very rare and the profiles essentially unique, it may never be possible to find all the variants explaining the heritability. If individual variants are not identifiable, it might still be possible to estimate or model the genetic load in aggregate and see how that correlates with educational outcomes or IQ. Of course, you'd have to ask - why would you want to do that? Just to understand the genetic architecture of the trait (a perfectly worthwhile goal) or for some practical purpose (as in predicting people's intelligence or pre-screening for it somehow, which seems both much more difficult and ethically dubious)?