Life after GWAS – where to next, for psychiatric genetics?

GWAS (genome-wide association studies) for psychiatric illnesses may be about to become a victim of their own success. The idea behind these studies is that common genetic variation – ancient mutations that segregate in the population – may partly underlie the high heritability of common psychiatric and neurological disorders, such as schizophrenia, autism, epilepsy, ADHD, depression, and so on. The accumulating evidence from over ten years of GWAS strongly supports that idea, with many hundreds of such risk variants now having been identified. The problem is it’s not at all clear what to do with that information.

GWAS are a method for carrying out a kind of genetic epidemiology, based on a simple premise: if a particular genetic variant at some position in the genome (say an “A” base, as opposed to a “T”, at position 236,456 on chromosome 9) is associated with an increased risk of some condition, then the frequency of the “A” version should be higher in people with the condition than in people without. (Just as the frequency of smoking is higher in people with lung cancer than in those without.)
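To make that premise concrete, here is a minimal sketch of the comparison at a single position, assuming a simple allele-count test. The counts are invented for illustration, and the function name is hypothetical; real studies use dedicated software and adjust for things like ancestry.

```python
import math


def allelic_association(case_a, case_t, control_a, control_t):
    """Compare allele counts between cases and controls at one position.

    Returns the "A" allele frequency in cases and in controls, the allelic
    odds ratio, the 1-df Pearson chi-square statistic and its p-value.
    """
    freq_cases = case_a / (case_a + case_t)
    freq_controls = control_a / (control_a + control_t)
    odds_ratio = (case_a * control_t) / (case_t * control_a)

    total = case_a + case_t + control_a + control_t
    chi_square = (
        total * (case_a * control_t - case_t * control_a) ** 2
        / ((case_a + case_t) * (control_a + control_t)
           * (case_a + control_a) * (case_t + control_t))
    )
    # With one degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi_square / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return freq_cases, freq_controls, odds_ratio, chi_square, p_value


# Invented counts: 10,000 cases and 10,000 controls, two alleles per person.
freq_cases, freq_controls, odds_ratio, chi_square, p_value = allelic_association(
    case_a=6200, case_t=13800, control_a=6000, control_t=14000
)
print(f"A frequency: cases {freq_cases:.3f}, controls {freq_controls:.3f}")
print(f"odds ratio {odds_ratio:.3f}, chi-square {chi_square:.2f}, p = {p_value:.3g}")
```

With these made-up numbers the “A” version is about 1.05 times more common (in odds terms) among cases, yet the p-value is only around 0.03. A difference of that size needs a far larger sample before it stands out from chance across the whole genome.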

If you examine the frequency of those kinds of variants across the whole genome, in a large enough sample of people, you can exhaustively search for any that confer risk to the condition, above a certain effect size. The larger the sample, the smaller the effect on risk that can be statistically detected. Initial GWAS for conditions like schizophrenia came up empty. With samples in the low thousands, all that could be concluded from such studies was that there were no common variants in the genome that contributed even a moderate increase in risk, individually. (This isn’t surprising, given the expectation that risk variants should be selected against.)

But as sample sizes have grown into many tens of thousands, common variants have begun to be identified that are reliably statistically associated with risk of the condition (i.e., that are slightly more frequent among cases than among controls). There are now well over a hundred specific variants that reach the stringent threshold for genome-wide statistical significance for schizophrenia (here and here) and for depression. Each of these is associated with only a very small statistical increase in risk for the condition (usually on the order of 1.05-fold). This is almost negligible, but not quite.
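The link between sample size and detectable effect size can be sketched with a rough power calculation. This is only an approximation under stated assumptions: a two-sided comparison of allele frequencies using a normal approximation, the conventional genome-wide threshold of p < 5e-8, and an illustrative allele frequency of 30% in controls. The function name and all inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def approximate_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate chance that a variant with the given odds ratio reaches
    significance at level alpha in a case-control allele-frequency test."""
    # Allele frequency in cases implied by the odds ratio (assumed model).
    odds_cases = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds_cases / (1 + odds_cases)

    # Each person contributes two alleles.
    standard_error = sqrt(
        case_freq * (1 - case_freq) / (2 * n_cases)
        + control_freq * (1 - control_freq) / (2 * n_controls)
    )
    expected_z = (case_freq - control_freq) / standard_error
    critical_z = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    # The negligible chance of significance in the wrong direction is ignored.
    return STANDARD_NORMAL.cdf(expected_z - critical_z)


for n in (2_000, 50_000, 100_000):
    power = approximate_power(n, n, control_freq=0.30, odds_ratio=1.05)
    print(f"{n:>7} cases and {n:>7} controls: power = {power:.3g}")
```

Under these assumptions a 1.05-fold effect is essentially undetectable with a couple of thousand cases, and only becomes reliably detectable once samples reach the order of a hundred thousand.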

In addition, below that significance threshold, there are thousands more that seem likely to convey some of the genetic risk for the condition, even if we can’t yet say so with statistical confidence. Indeed, it is possible to estimate the collective effect on risk that all such common variants in the genome might convey and this is typically sizable. By calculating “polygenic scores”, which take all of these putative associations into account, one can determine where any given individual lies on an idealised continuum of risk. For schizophrenia, these scores can explain about 5% of the variance in disease status across the population in new test samples.
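As a rough illustration of how such a score is put together, here is a minimal sketch assuming the simplest additive form: count the copies of each risk variant a person carries and weight each count by the log of that variant's odds ratio. The variant names, effect sizes and genotypes are all made up, and real pipelines involve further steps (such as accounting for correlation between nearby variants) that are not shown.

```python
import math
from statistics import mean, pstdev


def polygenic_score(risk_allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1 or 2), each weighted by log odds ratio.

    Variants missing from a person's genotype are treated as zero copies,
    which is a simplification made for this sketch.
    """
    return sum(
        math.log(odds_ratio) * risk_allele_counts.get(variant, 0)
        for variant, odds_ratio in odds_ratios.items()
    )


def standardise(score, reference_scores):
    """Express a score in standard deviations from the reference mean."""
    return (score - mean(reference_scores)) / pstdev(reference_scores)


# Hypothetical variants and effect sizes, for illustration only.
odds_ratios = {"variant_1": 1.05, "variant_2": 1.04, "variant_3": 0.97}

# Hypothetical genotypes: number of copies of each variant per person.
people = {
    "person_a": {"variant_1": 2, "variant_2": 1, "variant_3": 0},
    "person_b": {"variant_1": 0, "variant_2": 1, "variant_3": 2},
    "person_c": {"variant_1": 1, "variant_2": 0, "variant_3": 1},
}

scores = {name: polygenic_score(counts, odds_ratios) for name, counts in people.items()}
for name, score in scores.items():
    z = standardise(score, list(scores.values()))
    print(f"{name}: raw score {score:+.4f}, standardised {z:+.2f}")
```

Standardising against a reference sample is what places an individual on the continuum of risk: the raw number means little on its own.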

Polygenic scores

Polygenic scores may not currently explain much of the variance in these conditions but may nonetheless be useful for lots of things. One is for examining the shared genetics of different traits or conditions. For example, polygenic scores for intelligence are negatively correlated with risk of a number of psychiatric conditions, including schizophrenia and ADHD. That’s an interesting finding that fits with the idea that intelligence may be partly an indicator of general neurodevelopmental robustness.

Polygenic scores may also be useful in experiments aimed at identifying possible environmental or experiential factors contributing to mental illness, by allowing researchers to control for underlying genetic risk, which could otherwise obscure or confound other effects.

Where they will probably not be very useful is in identifying people at higher than average risk of disease. Well, let me rephrase that – they may be useful at doing that, statistically speaking, but that information may not be actionable, not for psychiatric conditions at least.

This is for several reasons: first, the scores will only ever capture a small portion of the genetic risk of the condition. This is because: (i) uncaptured rare mutations also make an important contribution to risk for many psychiatric disorders; and (ii) the specific combinations of risk variants present in any individual may be more important than just the additive burden. These effects will never be captured by signals that have been averaged across the whole population.

Second, genetics accounts for only a portion of the overall variance in risk – the heritability of psychiatric disorders ranges from 20% to 70%, meaning a sizable fraction of the variance in risk is non-genetic in origin. There is a tendency to think that if it’s not genetic, it must be environmental, but this is a mistake – much of this non-genetic variation may be due to the inherent randomness of the cellular processes of brain development.

For these reasons, the predictive precision of polygenic scores will always be low for individuals – this is a limit in principle, not just in practice. A statistical prediction might be useful enough to be acted on by insurance companies, or even in pre-implantation genetic screening if people are inclined to use it, but won’t carry much actionable information for doctors treating individual patients. In any case, for most such conditions, there are few preventive measures that can be taken, beyond generally looking after oneself – moderating alcohol use, staying away from hard drugs, avoiding smoking, reducing stress, better exercise, diet, and sleep habits. People don’t get prescribed antidepressants or antipsychotics prophylactically, for good reasons.  

So, for psychiatric conditions, polygenic scores will likely be useful for some kinds of research, but less so for clinical purposes.
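A back-of-the-envelope calculation shows why a score can be statistically informative and still say little about an individual. The sketch below assumes a liability-threshold model, which is my simplification rather than anything established above: everyone has an underlying normally distributed liability, and the disorder appears when liability crosses a threshold. It treats the roughly 5% of variance mentioned earlier as variance in liability and assumes a prevalence of 1%; both choices are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def risk_given_score(score_z, variance_explained=0.05, prevalence=0.01):
    """Absolute risk for someone whose polygenic score is score_z standard
    deviations from the mean, under an assumed liability-threshold model."""
    # Liability level above which the disorder is assumed to appear.
    threshold = STANDARD_NORMAL.inv_cdf(1 - prevalence)
    # Part of liability predicted by the score, and the spread of the rest.
    expected_liability = sqrt(variance_explained) * score_z
    residual_sd = sqrt(1 - variance_explained)
    return 1 - STANDARD_NORMAL.cdf((threshold - expected_liability) / residual_sd)


for score_z in (-2.0, 0.0, 2.0):
    print(f"score {score_z:+.0f} SD: risk = {risk_given_score(score_z):.2%}")
```

Under these assumptions, someone two standard deviations above the mean has a few times the average risk, but their absolute risk is still only a few percent. The overwhelming majority of people with high scores would never develop the condition.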

What about the biology?

One thing that GWAS for psychiatric disorders have not been that useful for is elucidating the underlying biology – or at least not in the way it was hoped. One of the driving motivations of GWAS was the idea that associated variants would implicate specific genes or biochemical pathways in the pathogenesis of psychiatric disease, possibly even providing direct molecular targets for new therapeutics. This has turned out not to be the case.

It’s not that GWAS haven’t identified anything – just the opposite – they’ve identified too much. If we take schizophrenia as an example, GWAS have highlighted hundreds of variants that implicate nearby genes, encoding proteins with very diverse functions.

Among that list of genes, there is an enrichment for ones with functions in various neurodevelopmental processes or functions at neuronal synapses, for genes expressed in the brain, especially the fetal brain, for genes with greater expression in broad brain areas like the cerebral cortex, or for genes with greater expression in very broad classes of cell types, like glutamatergic or GABAergic neurons.

Collectively, these data suggest that something about some neurons in some parts of the developing brain goes awry in schizophrenia, somehow.

Now, it’s important to note that that’s not nothing. At the very least, it provides strong evidence that the GWAS signals are real – they didn’t land on genes for liver enzymes or connective tissue proteins or eye colour pigments. But these findings don’t tell us much more, biologically, than we knew before – that schizophrenia is a disorder of disturbed neural development.

Even worse, a new model of “omnigenic” inheritance suggests that every gene that is expressed in the relevant tissue for a given disease (the brain in this case) will harbor some genetic variants that are statistically associated with disease risk. Larger and larger GWAS will no doubt uncover more and more of these statistical associations, though the effect sizes will get smaller and smaller. Beyond a certain stage, it seems reasonable to ask: “What’s the point?”

It seems clear now that GWAS will not, by themselves, implicate very specific pathways or directly yield new mechanistic insights into the nature of the associated conditions. This is especially true for psychiatric conditions, because, while they may have genetic origins, they do not have proximal genetic mechanisms. Unlike a condition like cancer, which really reflects an altered state of gene expression at a cellular level, psychiatric conditions like schizophrenia reflect altered states of distributed brain circuits and systems. Genetic variation may cause such a state to emerge, but there may be nothing in the molecular functions of the affected genes that specifically relates to that state in any acute or on-going fashion.

For cancer, if discovering risk genes is point A, and the resultant phenotype is point B, you can go from A to B in one step. For psychiatric disorders, you need to go from A to Z, where the phenotype, Z, is not directly resultant, but very indirectly emergent, through dynamic interactions between hundreds of different cell types across distributed circuits over long periods of time, as development and maturation play out.

It will therefore take experimentally tractable model systems to work out the chain of steps from A to Z. Characterising the properties of the genes themselves (by cross-referencing with other large-scale “omics” datasets, for example) just won’t do it. And the problem with the variants implicated by GWAS is that they simply don’t provide an experimental handle for those kinds of follow-up experiments.

From analysis to experiment

In the first instance, the genetic variant assayed in GWAS is just a marker – the actual functional variant will tend to be co-inherited with that marker but it may take some effort to find it. And then one has to figure out which gene it is affecting and how. Sometimes it’s the nearest gene, but other times the affected gene may be some distance away. And most often the effect will be just a small change in expression level, possibly only in some cell type or other. It is possible that you could find some cellular process where that small change in expression makes an obvious difference in an experimental assay. But you could spend a long time looking for what that process might be, for any given gene. 

Moreover, even if you did find some cellular process that was affected by a modest expression change in Gene X, you’d still be many, many steps away from understanding how a change in that process could ultimately contribute (a tiny amount) to an increase in risk for a condition like schizophrenia. And there seems no way to bridge that gap. The effect on risk of any given variant alone is simply too small to be elucidated in this way.

For example, one of the variants associated with risk of schizophrenia tags a gene encoding the C4 complement protein. This protein is involved in the immune system, but also, as it turns out, in pruning surplus synapses in the developing nervous system. An impressive paper tracked down the causal functional variants in the gene, which turn out to relate to how many copies of the protein-coding sequence are present. The differences in copy number are correlated with overall expression level of the protein in the brain.  

But what effect does this have? The authors turned to mice as a model system to investigate this – kind of. They showed that complete removal of both copies of the gene affects the pruning of synapses in a specific part of the developing mouse brain. Now, that’s very nice and all (really beautiful work in fact), but it’s actually just telling us what the normal function of the encoded protein is (by completely removing it), not what the effect might be of a modest increase in expression, which was what the functional risk variants were associated with.

The problem is that the analyses in the mouse are completely incommensurate with the effect sizes in humans – there’s just no useful way to relate them. It’s not like we can really infer that excessive synaptic pruning underlies schizophrenia – not when this is just an arbitrarily chosen one of hundreds of genes harboring variants that collectively contribute only a modest proportion of the overall risk of the condition. If there were massive convergence onto this biochemical pathway or cellular process, that would be one thing, but there is not.

We could follow the same approach for hundreds of other common risk variants and be none the wiser as to the biology of schizophrenia. We’d end up with lots of inferences and no way to test them.

But if we can’t make much headway investigating these common variants individually, maybe we can figure out what their collective effects are. Well, maybe, but to move beyond correlative analyses in human subjects we would need to recapitulate these high-risk polygenic profiles in some experimental system. That won’t be possible in animals. Perhaps the only way to do it is to generate induced pluripotent stem (iPS) cells from people with high versus low polygenic risk scores.

These iPS cells can then be turned into neurons, or even “minibrains” (cerebral organoids) in a dish and the effects of the polygenic burden can be assayed. This could be a powerful method, in theory. In practice, there are a few problems with its application to psychiatric disorders.

First, what phenotype do you look at? For a condition like microcephaly, minibrains are perfect as they recapitulate early stages of brain development really well, where processes of neuronal proliferation can be directly modelled. For psychiatric conditions, the defect is more likely in subtle aspects of connectivity, including activity-dependent refinement over months or years. You might get at early stages of synapse formation in a minibrain, which could prove very interesting, but the emergence of the relevant brain-wide pathophysiological states will prove much harder to model in a dish. If you want to understand the full chain of events leading to brain dysfunction, you’re going to need a real brain, in a behaving animal.

Second, the actual variants contributing to a high-risk polygenic profile will differ between any two people. They will be an overlapping but unique subset of all the risk variants in the population. Who knows if the effects will converge onto any particular cellular process? They might, but they might just as well not. It seems likely that insults to many different primary processes might lead to convergence on a particular pathophysiological state (as is the case with epilepsy, for example).

Third, we all carry hundreds of rare mutations, in addition to our polygenic burden of common variants. These are especially important for neuropsychiatric conditions. Differences in relevant phenotypes between cells from any two individuals (especially if one has a disorder and the other does not) could just as well – indeed, more likely – be due to those unknown rare variants than to the polygenic profile.

Polygenic burden degrades general robustness

The best way to think of polygenic burden may be not as affecting any particular process so much as reducing the robustness of all processes. The genome has evolved to buffer many insults – environmental variation, molecular noise, and genetic variation. This robustness allows genetic variation to accumulate in the population if the individual mutations are not too severe. However, these accumulating variants collectively degrade the robustness of the system, by compromising the evolved interactions of all the components. The higher a person’s polygenic burden of risk variants, the lower their ability to buffer additional insults.

Under this model, the polygenic profile may not cause disease by itself, even at the highest end of the distribution of common risk variant burden. Instead, it may make it harder to buffer the effects of rare mutations, acting as a strong genetic modifier to determine whether a disorder results or not.

There is a variety of evidence to support this view. First, polygenic risk for various psychiatric disorders is highly overlapping, consistent with a general vulnerability, not related to the specific symptoms of any diagnostic category. Second, as mentioned above, the polygenic signal for intelligence is negatively correlated with risk of schizophrenia, ADHD, and several other psychiatric conditions (and vice versa). This is likely not because being intelligent is protective, per se, but rather because intelligence is an index of neurodevelopmental robustness. Finally, recent studies have shown that the background of common variants acts as a modifier of the severity of the effects of rare mutations associated with very specific Mendelian conditions. 

This view is equivalent to the well-known genetic background effects commonly seen for many kinds of phenotypes (especially behavioral ones) when mutations are crossed into different strains or lines of mice or flies. These effects can be quite sizable – sometimes causing very different outcomes in different strains. However, they are extremely hard to study, and it isn’t obvious that you would really learn much from working out their underlying mechanisms in detail.

Follow that gene!

Indeed, the lesson from model organisms is that if you want to understand biology, you should follow the big effects. In humans, those effects are due to rare mutations. In my next post I will explore how rare mutations can provide experimental entry points to elucidate the biological pathways leading from genetic risk to emergent psychopathology. In the meantime, this review from a few years ago outlines a framework for “following the genes”.

