Life after GWAS – where to next, for psychiatric genetics?
GWAS (genome-wide association studies) for
psychiatric illnesses may be about to become a victim of their own success. The
idea behind these studies is that common genetic variation – ancient mutations
that segregate in the population – may partly underlie the high heritability of
common psychiatric and neurological disorders, such as schizophrenia, autism,
epilepsy, ADHD, depression, and so on. The accumulating evidence from over ten
years of GWAS strongly supports that idea, with many hundreds of such risk
variants now having been identified. The problem is it’s not at all clear what
to do with that information.
GWAS are a method to carry out a kind of
genetic epidemiology, based on a simple premise – if a particular genetic
variant at some position in the genome (say an “A” base, as opposed to a “T”, at position 236,456 on chromosome 9) is associated with an increased risk
of some condition, then the frequency of the “A” version should be higher in
people with the condition than in people without. (Just as the frequency of
smoking is higher in people with lung cancer than in those without.)
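To make that premise concrete, here is a minimal sketch of the comparison for a single variant. The allele counts are invented for illustration, and the function name is hypothetical; real GWAS typically use regression models with covariates rather than a bare 2x2 test, so treat this only as the simplest version of the idea.

```python
import math

def allele_association(case_a, case_t, control_a, control_t):
    """Compare "A" versus "T" allele counts between cases and controls.

    Returns (freq_in_cases, freq_in_controls, odds_ratio, chi2, p_value).
    """
    n_cases = case_a + case_t
    n_controls = control_a + control_t
    n_a = case_a + control_a
    n_t = case_t + control_t
    total = n_cases + n_controls

    freq_cases = case_a / n_cases
    freq_controls = control_a / n_controls
    odds_ratio = (case_a * control_t) / (case_t * control_a)

    # Pearson chi-square statistic for the 2x2 table (1 degree of freedom)
    chi2 = 0.0
    cells = [
        (case_a, n_cases, n_a), (case_t, n_cases, n_t),
        (control_a, n_controls, n_a), (control_t, n_controls, n_t),
    ]
    for observed, row_total, col_total in cells:
        expected = row_total * col_total / total
        chi2 += (observed - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return freq_cases, freq_controls, odds_ratio, chi2, p_value

# Made-up counts: 10,000 cases and 10,000 controls, two alleles each
result = allele_association(case_a=6200, case_t=13800,
                            control_a=6000, control_t=14000)
print(result)
```

With these invented numbers the “A” allele is only slightly more frequent in cases (31% versus 30%), which is the kind of small difference the rest of this post is concerned with.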
If you examine the frequency of those kinds
of variants across the whole genome, in a large enough sample of people, you
can exhaustively search for any that confer risk to the condition, above a
certain effect size. The larger the sample, the smaller the effect on risk that
can be statistically detected. Initial GWAS for conditions like schizophrenia
came up empty. With samples in the low thousands, all that could be concluded
from such studies was that there were no common variants in the genome that
contributed even a moderate increase in risk, individually. (This isn’t
surprising, given the expectation that risk variants should be subject to
negative selection.)
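The trade-off between sample size and detectable effect size can be sketched with a standard two-proportion power approximation. Everything numerical here is an assumption for illustration: a risk-allele frequency of 30% in controls, 80% power, equal numbers of cases and controls, and the conventional genome-wide significance level of 5e-8. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def cases_needed(odds_ratio, control_freq=0.30, alpha=5e-8, power=0.80):
    """Approximate number of cases (with as many controls) needed to detect
    a per-allele odds ratio, using a two-proportion z-test on allele counts."""
    # Allele frequency in cases implied by the odds ratio
    case_odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = case_odds / (1 + case_odds)

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)

    mean_freq = (case_freq + control_freq) / 2
    null_sd = math.sqrt(2 * mean_freq * (1 - mean_freq))
    alt_sd = math.sqrt(case_freq * (1 - case_freq)
                       + control_freq * (1 - control_freq))

    alleles_per_group = ((z_alpha * null_sd + z_power * alt_sd)
                         / (case_freq - control_freq)) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per person

for odds_ratio in (1.5, 1.2, 1.05):
    print(odds_ratio, cases_needed(odds_ratio))
```

Under these assumptions, an odds ratio of 1.5 needs on the order of a thousand cases, while an odds ratio of 1.05 needs many tens of thousands, which matches the pattern described above.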
But as sample sizes have grown, into many
tens of thousands, common variants have begun to be identified that are reliably
statistically associated with risk of the condition (i.e., that are slightly
more frequent among cases than among controls). There are now well over a
hundred specific variants that reach the stringent threshold for genome-wide
statistical significance for schizophrenia (here and here) and for depression. Each of these is
associated with only a very small statistical increase in risk for the
condition (usually on the order of 1.05-fold). This is almost negligible, but not quite.
In addition, below that significance
threshold, there are thousands more that seem likely to convey some of the
genetic risk for the condition, even if we can’t yet say so with statistical
confidence. Indeed, it is possible to estimate the collective effect on risk that all such common variants in the
genome might convey, and this is typically sizable. By calculating “polygenic scores”, which take all of these putative associations into account, one can
determine where any given individual lies on an idealised continuum of risk. For
schizophrenia, these scores can explain about 5% of the variance in disease
status across the population in new test samples.
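In its simplest form, a polygenic score is just a weighted sum: for each variant, the number of risk alleles a person carries (0, 1 or 2) multiplied by that variant's estimated effect. The sketch below assumes per-allele odds ratios as the effect estimates and uses made-up variant names and values. Scores used in practice draw their weights from GWAS summary statistics and usually adjust for correlation between nearby variants, which this ignores.

```python
import math

def polygenic_score(genotype, weights):
    """Sum of risk-allele counts weighted by per-allele log odds ratios.

    genotype: dict mapping variant ID -> risk-allele count (0, 1 or 2)
    weights:  dict mapping variant ID -> log odds ratio for that variant
    Variants missing from the genotype are treated as zero risk alleles.
    """
    return sum(weight * genotype.get(variant, 0)
               for variant, weight in weights.items())

# Hypothetical variants and effect sizes, for illustration only
weights = {
    "variant_1": math.log(1.05),
    "variant_2": math.log(1.03),
    "variant_3": math.log(1.08),
}
person = {"variant_1": 2, "variant_2": 0, "variant_3": 1}

score = polygenic_score(person, weights)
print(score)
```

A real score sums over thousands of variants, most of them below the genome-wide significance threshold, and is usually reported relative to the distribution of scores in a reference population.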
Polygenic scores
Polygenic scores may not currently explain
much of the variance in these conditions but may nonetheless be useful for lots
of things. One is for examining the shared genetics of different traits or conditions.
For example, polygenic scores for intelligence are negatively correlated with
risk of a number of psychiatric conditions, including schizophrenia and ADHD. That’s
an interesting finding, that fits with the idea that intelligence may be partly
an indicator of general neurodevelopmental robustness.
Polygenic scores may also be useful in
experiments aimed at identifying possible environmental or experiential factors
contributing to mental illness, by allowing researchers to control for underlying genetic risk, which could otherwise obscure or confound other
effects.
Where they will probably not be very useful is in identifying
people at higher than average risk of disease. Well, let me rephrase that –
they may be useful for doing that, statistically speaking, but that information
may not be actionable, at least not for
psychiatric conditions.
This is for several reasons: first, the
scores will only ever capture a small portion of the genetic risk of the
condition. This is because: (i) un-captured rare mutations also make an
important contribution to risk for many psychiatric disorders; and (ii) the specific combinations of risk variants present in
any individual may be more important than just the additive burden. These
effects will never be captured by signals that have been averaged across the
whole population.
Second, genetics accounts for only a portion of
the overall variance in risk – the heritability of psychiatric disorders ranges
from 20% to 70%, meaning a sizable fraction of the variance in risk is non-genetic
in origin. There is a tendency to think that if it’s not genetic, it must be
environmental, but this is a mistake – much of this non-genetic variation may
be due to the inherent randomness of the cellular processes of brain
development.
For these reasons, the predictive precision
of polygenic scores will always be low for individuals – this is a limit in
principle, not just in practice. A statistical prediction might be useful
enough to be acted on by insurance companies, or even in pre-implantation genetic screening if people are inclined to use it, but won’t carry much
actionable information for doctors treating individual patients. In any case,
for most such conditions, there are few preventive measures that can be taken,
beyond generally looking after oneself – moderating alcohol use, staying away
from hard drugs, avoiding smoking, reducing stress, and improving exercise, diet, and
sleep habits. People don’t get prescribed antidepressants or antipsychotics
prophylactically, for good reasons.
So, for psychiatric conditions, polygenic
scores will likely be useful for some kinds of research, but less so for
clinical purposes.
What about the biology?
One thing that GWAS for psychiatric
disorders have not been that useful
for is elucidating the underlying biology – or at least not in the way it was
hoped. One of the driving motivations of GWAS was the idea that associated
variants would implicate specific genes or biochemical pathways in the
pathogenesis of psychiatric disease, possibly even providing direct molecular
targets for new therapeutics. This has turned out not to be the case.
It’s not that GWAS haven’t identified
anything – just the opposite – they’ve identified too much. If we take
schizophrenia as an example, GWAS have highlighted hundreds of variants that
implicate nearby genes, encoding proteins with very diverse functions.
Among that list of genes, there is an
enrichment for ones with functions in various neurodevelopmental processes or
functions at neuronal synapses, for genes expressed in the brain, especially
the fetal brain, for genes with greater expression in broad brain areas like
the cerebral cortex, or for genes with greater expression in very broad classes
of cell types, like glutamatergic or GABAergic neurons.
Collectively, these data suggest that
something about some neurons in some parts of the developing brain goes awry in
schizophrenia, somehow.
Now, it’s important to note that that’s not
nothing. At the very least, it provides strong evidence that the GWAS signals
are real – they didn’t land on genes for liver enzymes or connective tissue
proteins or eye colour pigments. But these findings don’t tell us much more,
biologically, than we knew before – that schizophrenia is a disorder of
disturbed neural development.
Even worse, a new model of “omnigenic”
inheritance suggests that every gene
that is expressed in the relevant tissue for a given disease (the brain in this
case) will harbor some genetic variants that are statistically associated with
disease risk. Larger and larger GWAS will no doubt uncover more and more of
these statistical associations, though the effect sizes will get smaller and
smaller. Beyond a certain stage, it seems reasonable to ask: “What’s the
point?”
It seems clear now that GWAS will not, by
themselves, implicate very specific pathways or directly yield new mechanistic
insights into the nature of the associated conditions. This is especially true
for psychiatric conditions, because, while they may have genetic origins, they
do not have proximal genetic mechanisms.
Unlike a condition like cancer, which really reflects an altered state of gene
expression at a cellular level, psychiatric conditions like schizophrenia do
not. They reflect altered states of distributed brain circuits and systems. Genetic
variation may cause such a state to emerge, but there may be nothing in the
molecular functions of the affected genes that specifically relates to that
state in any acute or on-going fashion.
For cancer, if the risk genes are A
and the resultant phenotype is B, you can go from A to B in one step.
For psychiatric disorders, you need to go from A to Z, where the phenotype, Z,
is not directly resultant, but very indirectly emergent, through dynamic interactions between hundreds of
different cell types across distributed circuits over long periods of time, as
development and maturation play out.
It will therefore take experimentally
tractable model systems to work out the chain of steps from A to Z. Characterising
the properties of the genes themselves (by cross-referencing with other
large-scale “omics” datasets, for example) just won’t do it. And the problem
with the variants implicated by GWAS is that they simply don’t provide an
experimental handle for those kinds of follow-up experiments.
From analysis to experiment
In the first instance, the genetic variant
assayed in GWAS is just a marker – the actual functional variant will tend to
be co-inherited with that marker but it may take some effort to find it. And
then one has to figure out which gene it is affecting and how. Sometimes it’s
the nearest gene, but other times the affected gene may be some distance away.
And most often the effect will be just a small change in expression level,
possibly only in some cell type or other. It is possible that you could find
some cellular process where that small change in expression makes an obvious
difference in an experimental assay. But you could spend a long time looking
for what that process might be, for any given gene.
Moreover, even if you did find some
cellular process that was affected by a modest expression change in Gene X,
you’d still be many, many steps away from understanding how a change in that
process could ultimately contribute (a tiny amount) to an increase in risk for
a condition like schizophrenia. And there seems no way to bridge that gap. The
effect on risk of any given variant alone is simply too small to be elucidated in
this way.
For example, one of the variants associated
with risk of schizophrenia tags a gene encoding the C4 complement protein. This
protein is involved in the immune system, but also, as it turns out, in pruning
surplus synapses in the developing nervous system. An impressive paper tracked down the causal functional variants in the gene, which
turn out to relate to how many copies of the protein-coding sequence are
present. The differences in copy number are correlated with overall expression
level of the protein in the brain.
But what effect does this have? The authors
turned to mice as a model system to investigate this – kind of. They showed
that complete removal of both copies of the gene affects the pruning of
synapses in a specific part of the developing mouse brain. Now, that’s very
nice and all (really beautiful work in fact), but it’s actually just telling us
what the normal function of the encoded protein is (by completely removing it),
not what the effect might be of a modest increase in expression, which was what
the functional risk variants were associated with.
The problem is that the analyses in the
mouse are completely incommensurate with the effect sizes in humans – there’s
just no useful way to relate them. It’s not like we can really infer that excessive
synaptic pruning underlies schizophrenia – not when this is just an arbitrarily
chosen one of hundreds of genes harboring variants that collectively contribute
only a modest proportion of the overall risk of the condition. If there were
massive convergence onto this biochemical pathway or cellular process that
would be one thing, but there is not.
We could follow the same approach for
hundreds of other common risk variants and be none the wiser as to the biology
of schizophrenia. We’d end up with lots of inferences and no way to test them.
But if we can’t make much headway
investigating these common variants individually, maybe we can figure out what
their collective effects are. Well, maybe, but to move beyond correlative
analyses in human subjects we would need to recapitulate these high-risk
polygenic profiles in some experimental system. That won’t be possible in
animals. Perhaps the only way to do it is to generate induced pluripotent stem (iPS)
cells from people with high versus low polygenic risk scores.
These iPS cells can then be turned into
neurons, or even “minibrains” (cerebral organoids) in a dish and the effects of
the polygenic burden can be assayed. This could be a powerful method, in
theory. In practice, there are a few problems with its application to
psychiatric disorders.
First, what phenotype do you look at? For a
condition like microcephaly, minibrains are perfect as they recapitulate early stages of brain development really well, where processes of neuronal proliferation
can be directly modelled. For psychiatric conditions, the defect is more likely
in subtle aspects of connectivity, including activity-dependent refinement over
months or years. You might get at early stages of synapse formation in a
minibrain, which could prove very interesting, but the emergence of the
relevant brain-wide pathophysiological states will prove much harder to model
in a dish. If you want to understand the full chain of events leading to brain
dysfunction, you’re going to need a real brain, in a behaving animal.
Second, the actual variants contributing to
a high-risk polygenic profile will differ between any two people. They will be
an overlapping but unique subset of all the risk variants in the population.
Who knows if the effects will converge onto any particular cellular process?
They might, but they might just as well not. It seems likely that insults to
many different primary processes might lead to convergence on a particular
pathophysiological state (as is the case with epilepsy, for example).
Third, we all carry hundreds of rare
mutations, in addition to our polygenic burden of common variants. These are
especially important for neuropsychiatric conditions. Differences in relevant
phenotypes between cells from any two individuals (especially if one has a
disorder and the other does not) could just as well – indeed, more likely – be due
to those unknown rare variants than to the polygenic profile.
Polygenic burden degrades general robustness
The best way to think of polygenic burden
may be not as affecting any particular process so much as reducing the
robustness of all processes. The genome has evolved to buffer many insults –
environmental variation, molecular noise, and genetic variation. This robustness
allows genetic variation to accumulate in the population if the individual
mutations are not too severe. However, these accumulating variants collectively
degrade the robustness of the system, by compromising the evolved interactions
of all the components. The higher a person’s polygenic burden of risk variants,
the lower their ability to buffer additional insults.
Under this model, the polygenic profile may
not cause disease by itself, even at the highest end of the distribution of
common risk variant burden. Instead, it may make it harder to buffer the
effects of rare mutations, acting as a strong genetic modifier to determine
whether a disorder results or not.
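One simple way to picture this modifier idea is a toy liability-threshold calculation, in which a disorder results only when total liability crosses a threshold. This is an illustrative simplification, not a model taken from the studies discussed here: it assumes purely additive contributions, standard normal noise, and arbitrary values for the threshold, the polygenic burden, and the rare mutation's effect. The function name is hypothetical.

```python
from statistics import NormalDist

def disorder_probability(polygenic_burden, rare_mutation_effect, threshold=3.0):
    """Probability that liability exceeds the threshold, where
    liability = polygenic burden + rare mutation effect + N(0, 1) noise."""
    mean_liability = polygenic_burden + rare_mutation_effect
    return 1.0 - NormalDist(mu=mean_liability, sigma=1.0).cdf(threshold)

# Arbitrary illustrative values: low/high burden, with/without a rare mutation
for burden in (-0.5, 0.5):
    for rare_effect in (0.0, 2.5):
        probability = disorder_probability(burden, rare_effect)
        print(f"burden={burden:+.1f} rare={rare_effect:.1f} p={probability:.4f}")
```

With these made-up values, a high polygenic burden on its own rarely pushes liability over the threshold, but in someone carrying the rare mutation it makes the difference between a minority and about half of carriers being affected.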
There is a variety of evidence to support
this view. First, polygenic risk for various psychiatric disorders is highly
overlapping, consistent with a general vulnerability, not related to the specific symptoms of any diagnostic
category.
Second, as mentioned above, the polygenic signal for intelligence is negatively correlated with risk of schizophrenia, ADHD, and several other psychiatric
conditions (and vice versa). This is likely not because being intelligent is
protective, per se, but rather because intelligence is an index of neurodevelopmental robustness. Finally, recent studies have shown that the
background of common variants acts as a modifier of the severity of the effects
of rare mutations associated with very specific Mendelian conditions.
This view is equivalent to the well-known
genetic background effects commonly seen for many kinds of phenotypes (especially
behavioral ones) when mutations are crossed into different strains or lines of
mice or flies. These effects can be quite sizable – sometimes causing very
different outcomes in different strains. However, they are extremely hard to
study, and it isn’t obvious that you would really learn much from working out their
underlying mechanisms in detail.
Follow that gene!
Indeed, the lesson from model organisms is
that if you want to understand biology, you should follow the big effects. In
humans, those effects are due to rare mutations. In my next post I will explore
how rare mutations can provide experimental entry points to elucidate the
biological pathways leading from genetic risk to emergent psychopathology. In
the meantime, this review from a few years ago outlines a framework for
“following the genes”.