The genetics of educational attainment
A recently announced paper reports the results of an enormous genome-wide association study for educational attainment. The authors found 74 regions of the genome where there are common variants that show statistically significant association with this trait. Here are my thoughts on what this study found, what it didn’t find and what those positive and negative results might mean.
First, it was a huge effort by a lot of people, who should be congratulated for working together to carry out this analysis on such a huge scale. It is an interesting question and a worthwhile effort, in my view. The trait they measure, time spent in education, is an important one and has been shown to be moderately heritable. One large study estimated the heritability at ~40%, meaning that, in the sample studied, around 40% of the variance in this trait was attributable to genetic differences between people. (For reasons I can’t figure out, the current study cites that paper but gives a figure of “at least 20%” for the heritability.) There is also strong evidence that “Educational attainment is moderately correlated with other heritable characteristics, including cognitive function and personality traits related to persistence and self-discipline.” Understanding the genetics of these traits is highly interesting, if for no other reason than that it can help us understand some of the major differences in human experience.
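To make the "~40% of the variance" statement concrete, here is a minimal sketch that simulates a trait as the sum of a genetic part and an environmental part and then measures the share of variance coming from the genetic part. Everything in it is an illustrative assumption: the function name `simulate_heritability`, the sample size, and the choice of independent, normally distributed components are mine, not the study's.

```python
import random
import statistics

def simulate_heritability(n_people=20_000, h2=0.4, seed=1):
    """Simulate trait = genetic value + environmental value and return
    the share of trait variance that comes from the genetic part."""
    rng = random.Random(seed)
    # Assumption: the two components are independent, with variances
    # h2 and (1 - h2), so the total trait variance is roughly 1.
    genetic = [rng.gauss(0.0, h2 ** 0.5) for _ in range(n_people)]
    environment = [rng.gauss(0.0, (1.0 - h2) ** 0.5) for _ in range(n_people)]
    trait = [g + e for g, e in zip(genetic, environment)]
    return statistics.pvariance(genetic) / statistics.pvariance(trait)

estimate = simulate_heritability()
print(f"share of variance attributable to genetic differences: {estimate:.2f}")
```

Note that the ratio describes variation across the simulated sample, not how much of any one person's trait is "due to genes".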
The authors of the current study have clearly found what look like some real associations between common genetic variants and educational attainment. First, they replicate quite well across their different samples. Second, the variants are in a highly non-random set of genes – they are enriched for genes expressed in the brain during fetal development and for genes that encode proteins involved in neurodevelopmental processes like neuronal differentiation, cell migration and axonal guidance – all the processes that are involved in putting the brain together! So, we can conclude that differences in how the brain develops can have some effect on intelligence or other traits (like drive) that contribute to variation in educational attainment. Of course, that doesn't sound surprising really when you say it like that – no more than the finding that common variants in skeletal growth genes influence height. But it didn't have to turn out that way.
What is more interesting to me is what they did not find. The 74 variants they find have tiny individual effects on the trait (even by GWAS standards) and collectively explain only 3% of the genetic variance in the trait. Their study was certainly well enough powered to detect common variants with even very small effect sizes – the fact that they did not find any more of them is therefore strong evidence that they do not exist.
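For a rough feel of what "well enough powered" means, the sketch below uses a standard normal approximation to estimate the chance that a single variant reaches genome-wide significance, given the sample size and the fraction of trait variance the variant explains. The sample size `ASSUMED_N` and the three effect sizes are hypothetical values chosen for illustration, not figures from the study, and `gwas_power` is a name I made up.

```python
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def gwas_power(n_samples, variance_explained, alpha=GENOME_WIDE_ALPHA):
    """Approximate power of a two-sided test to detect one variant that
    explains `variance_explained` (a fraction) of the trait variance."""
    std_normal = NormalDist()
    # Critical z-value for a two-sided test at level alpha.
    z_crit = -std_normal.inv_cdf(alpha / 2.0)
    # Expected z-statistic: square root of the non-centrality parameter.
    expected_z = (n_samples * variance_explained
                  / (1.0 - variance_explained)) ** 0.5
    upper_tail = 1.0 - std_normal.cdf(z_crit - expected_z)
    lower_tail = std_normal.cdf(-z_crit - expected_z)
    return upper_tail + lower_tail

ASSUMED_N = 300_000  # hypothetical sample size, not taken from the study
for share in (2e-4, 1e-4, 5e-5):  # illustrative per-variant shares
    power = gwas_power(ASSUMED_N, share)
    print(f"variance explained {share:.5f}: power {power:.3f}")
```

Under these assumptions, power falls off steeply as the per-variant effect shrinks, so what a null result rules out depends heavily on the effect size in question.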
So, instead of focusing on the variants they did find, one might instead ask what is contributing the other 97% of the genetic variance in this trait?
There are a few possible explanations:
1. There may exist many, many more common DNA variants that contribute to variation in this trait across the population, but if this is true, each of these must have an even smaller effect than the SNPs they have already found (approaching negligible, in fact). It would take enormous samples to find more of them and, given the diminishing returns in terms of effect sizes, they would likely explain only a very small additional percentage of the variance, even if many of them are found.
(Note that methods like Genome-wide Complex Trait Analysis have been used to try to estimate how much of the variance in educational attainment is tagged by common variants across the whole population, even though we may not yet be able to identify them. Davies et al. estimated this at 21%.
For what it’s worth, I am generally skeptical that this method can produce precise estimates, given the tiny signals it relies on. I am even more skeptical of the interpretation of such results. It rests on the assumption that, because common variants can be used to index genetic relatedness across a sample, any association between this index and phenotypic relatedness must be caused by the actions of those common variants, and that it can indicate how many common variants must be involved. In fact this method tells us nothing about the number or allelic frequency of the causal variants involved.)
2. Their statistical methodology may have missed common variants that have effects only in specific combinations (rather than their individual effects simply being summed). This is a general potential problem with GWAS methodology. However, with a sample size like the one they have, even variants that do have such epistatic interactions would still be likely to show some non-zero individual effect on average. See here for more on "what GWAS signals mean".
3. The most likely explanation to me is that the genetic variants that make by far the biggest contribution to this trait are not common across the population, but rare. It would, in fact, be astonishing if rare mutations did not make an important contribution to the traits underlying educational achievement, especially intelligence. Generally speaking, rare mutations have bigger effects than common ones, we all carry many rare mutations, and intelligence is exactly the kind of trait that may be affected by many of them, either individually or in aggregate.
This has profound implications for how we think about the genetics of intelligence. What it means is that maybe there are no genes "for intelligence" - that is, genetic differences that explain most of the variance in intelligence across the whole population. Instead, our intelligence may be affected much more by the unique profile of rare mutations that we each carry.
These could be mutations in genes like those found in this GWAS – ones that directly control processes of neural development or other aspects of how the brain functions. But there also could be a more general effect of overall mutational load, which might reduce the robustness of the processes of neural development. Under this model, intelligence may be not so much a specific trait, reflecting some particular brain processes, but rather a general fitness indicator. We may think of intelligence like we think of “performance” of a car or an aircraft – as relying not just on specific components but also, maybe even more so, on how they are all put together.
Note also that nonlinear epistatic interactions are highly likely between the rare variants we each carry – see here for more on that and why it places serious limits on how much we will ever be able to predict intelligence.
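On epistasis more generally, the earlier point that interacting variants would still tend to show some non-zero individual effect on average can be illustrated with a small simulation. In this sketch the trait depends only on the product of the allele counts at two variants, yet a regression on one variant alone still picks up a signal. The model, the allele frequencies and the function name `marginal_slope` are all assumptions made for illustration; other forms of interaction could behave differently.

```python
import random

def marginal_slope(n_people=50_000, freq_a=0.3, freq_b=0.3,
                   interaction=1.0, seed=7):
    """Regress a purely interactive trait on variant A alone and
    return the fitted slope (A's average individual effect)."""
    rng = random.Random(seed)

    def genotype(freq):
        # Count of minor alleles (0, 1 or 2) from two independent draws.
        return int(rng.random() < freq) + int(rng.random() < freq)

    counts_a, trait = [], []
    for _ in range(n_people):
        a, b = genotype(freq_a), genotype(freq_b)
        # The trait depends only on the product a * b, plus noise:
        # neither variant has any effect unless the other is present.
        trait.append(interaction * a * b + rng.gauss(0.0, 1.0))
        counts_a.append(a)

    mean_a = sum(counts_a) / n_people
    mean_t = sum(trait) / n_people
    covariance = sum((a - mean_a) * (t - mean_t)
                     for a, t in zip(counts_a, trait))
    variance_a = sum((a - mean_a) ** 2 for a in counts_a)
    return covariance / variance_a

slope = marginal_slope()
print(f"average individual effect of variant A: {slope:.2f}")
```

In this model the expected slope is the interaction strength times the mean allele count at the second variant (1.0 × 2 × 0.3 = 0.6 with the defaults), so it only vanishes if the second variant is absent from the sample.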
So, overall, I think one of the strongest conclusions from this study is the one they do not draw – that most of the genetic variation in this trait (the unaccounted-for 97%!) is probably NOT due to common variation but most likely due to the profile of rare mutations that we each carry.
Finally, given how easily and widely this kind of stuff is misinterpreted, and how readily people discussing it are ascribed viewpoints they do not actually hold, it may be wise to issue a few disclaimers:
- saying a trait is partly heritable is not the same as implying it is entirely genetically determined
- there are clearly also important sociocultural factors affecting educational attainment
- there may also be important interactions between genetics and sociocultural factors
- showing some genetic influences on a complex trait is not the same as "reducing it" to the actions of a few genes (we are interested here in how variation in genes leads to variation in the trait; not how a trait like human intelligence comes about in the first place)
- the findings described here will be of no practical use in screening people or predicting their academic success