Exciting findings in schizophrenia genetics – but what do they mean?
A paper published today represents a true
landmark in psychiatric genetics. It reports results of a genome-wide
association study (GWAS) of schizophrenia, involving 36,989
cases and 113,075 controls. Assembling this sample required collaboration on a
massive scale, with over 300 authors involved. This huge sample gives
unprecedented statistical power to detect genetic variants that predispose to
disease, even if their individual effects on risk are tiny. The study reports
108 regions of the genome where genetic differences affect risk of disease.
This achievement is rightly being widely celebrated and reported, but what do
these results really mean?
GWAS look at sites in the genome where the particular
base in the DNA sequence is variable – it might sometimes be an “A”, other
times a “T”, for example. There are millions of such sites in the human genome
(which comprises over 3 billion bases of sequence). Each such site represents a
mutation that happened some time in the distant past, which has since been
inherited and spread throughout the population, while not supplanting the
previous version completely. This leaves some people with one version and some
with another – these different versions are thus called “common variants”.
[More correctly, since we each have two copies of each chromosome, each of us
carries two copies of each variable site, so the combined genotype could be AA,
AT or TT, in the example above].
The idea of a GWAS is to look across the entire genome
at over a million such variants for ones at higher frequency in disease cases
than in controls. That difference in frequency might be very minor (say, the
“A” version might be seen at a frequency of 30% in cases but 27% in controls),
but with such a huge sample size, that kind of variation can be statistically
significant. In epidemiological terms, the variant that is more common in cases
is termed a “risk factor” – if you have it, you are statistically more likely
to be in the case group than in the control group. (Just as smoking is more
common in people with lung cancer than in people without, although in that case
the difference in frequency is massive).
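To make the sample-size point concrete, here is a minimal sketch of a two-proportion z-test on allele frequencies. It uses the illustrative 30% vs 27% frequencies from above together with the study's case and control counts. The function name and the choice of a simple z-test are my own assumptions for illustration; the actual study used more sophisticated methods. The genome-wide significance cut-off of 5e-8 is the conventional one in the field and is not taken from the text above.

```python
import math

def allele_frequency_test(freq_cases, freq_controls, n_cases, n_controls):
    """Two-proportion z-test on allele frequencies (illustrative only).

    Returns the z statistic, the allelic odds ratio and a two-sided p-value.
    """
    # Each person carries two copies of the site, so alleles = 2 * people.
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls

    # Pooled allele frequency under the null hypothesis of no difference.
    pooled = (freq_cases * alleles_cases + freq_controls * alleles_controls) / (
        alleles_cases + alleles_controls
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / alleles_cases + 1 / alleles_controls)
    )
    z = (freq_cases - freq_controls) / std_err

    # Odds of carrying the "A" version in cases relative to controls.
    odds_ratio = (freq_cases / (1 - freq_cases)) / (
        freq_controls / (1 - freq_controls)
    )

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, odds_ratio, p_value


GENOME_WIDE_THRESHOLD = 5e-8  # conventional cut-off, an assumption here

# Hypothetical frequencies from the text, with the study's sample sizes.
z_big, or_big, p_big = allele_frequency_test(0.30, 0.27, 36_989, 113_075)

# The same frequencies in a hypothetical study of 1,000 cases and 1,000 controls.
z_small, or_small, p_small = allele_frequency_test(0.30, 0.27, 1_000, 1_000)

print(f"large sample: OR={or_big:.2f}, z={z_big:.1f}, p={p_big:.1e}")
print(f"small sample: OR={or_small:.2f}, z={z_small:.1f}, p={p_small:.3f}")
```

The odds ratio is the same in both runs (about 1.16), because it depends only on the frequencies. What changes with sample size is the strength of the statistical evidence: the large sample clears the genome-wide threshold easily, while the small one does not.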
For any individual common variant, the increased
statistical risk is tiny – most increase risk by less than 1.1-fold. But the
idea is that the combined risk associated with a large number of such variants
could be quite large – large enough to push people into disease. Since the
variants are common, each of us will carry many of them, but some people will
carry more than others. This will generate a distribution of “risk variant
burden” across the population. If there are 108 sites, each in two copies, then
the range of that distribution could theoretically be from 0 to 216 risk
variants. The actual distribution is far narrower, however, with the vast
majority of the population carrying somewhere between 90 and 130 risk variants
(assuming the relative frequencies of the two variants are around 50:50, on
average).
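The width of that distribution can be checked with a quick calculation. This is a sketch under simplifying assumptions that are mine rather than the study's: every risk variant has a frequency of exactly 50%, and all 216 copies are inherited independently, so the burden follows a binomial distribution.

```python
import math

N_SITES = 108          # loci reported by the study
COPIES = 2 * N_SITES   # two copies of each site per person
P_RISK = 0.5           # assumed frequency of the risk version at every site

def burden_probability(k, n=COPIES, p=P_RISK):
    """Binomial probability of carrying exactly k risk variants."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

mean_burden = COPIES * P_RISK
sd_burden = math.sqrt(COPIES * P_RISK * (1 - P_RISK))

# Share of the population carrying between 90 and 130 risk variants inclusive.
share_90_to_130 = sum(burden_probability(k) for k in range(90, 131))

print(f"mean = {mean_burden:.0f}, SD = {sd_burden:.2f}")
print(f"share with 90-130 risk variants = {share_90_to_130:.3f}")
```

Under these assumptions the mean is 108 and the standard deviation is a little over 7, so almost nobody sits anywhere near the theoretical extremes of 0 or 216.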
One way to conceptualise the combined effects of many variants is the “liability-threshold” model,
which suggests that though there is a smooth distribution of genetic burden (or
liability) across the population, only those above a certain threshold become
ill (say the top 1% in the case of schizophrenia). This is known as a polygenic
model of risk because it assumes the causal action of a large number of genes
in any individual.
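As a small worked example of the liability-threshold idea, the sketch below assumes liability follows a standard normal distribution (mean 0, standard deviation 1) and that the top 1% of the population is affected, as in the example above. The variable names are hypothetical and the scale is arbitrary.

```python
from statistics import NormalDist

# Liability is assumed to be standard normal; the units are arbitrary.
liability = NormalDist(mu=0.0, sigma=1.0)
PREVALENCE = 0.01  # "say the top 1%", as in the text

# Threshold above which a person is affected: roughly 2.33 standard deviations.
threshold = liability.inv_cdf(1 - PREVALENCE)

# Average liability of affected people (mean of the normal tail above the threshold).
mean_liability_affected = liability.pdf(threshold) / PREVALENCE

def is_affected(liability_value):
    """Under the model, illness occurs only above the threshold."""
    return liability_value > threshold

print(f"threshold = {threshold:.2f} SD above the mean")
print(f"mean liability of affected people = {mean_liability_affected:.2f} SD")
```

The point of the model is that a smooth, continuous distribution of burden is converted into an all-or-nothing outcome by the threshold.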
An alternative model views common disorders such as
schizophrenia as arising mainly due to very rare mutations of large effect, but
in different genes in different individuals (and with the possibility of
modifying effects of other variants in the genetic background). This scenario
is known as genetic heterogeneity. Many such rare, high-risk mutations are known, but those identified so far collectively account for less than 10%
of cases of schizophrenia (and 15-30% of cases of autism).
So, with that as background, let’s consider what the
GWAS signals mean, individually and collectively. First, GWAS signals are a bit
like #Greenfieldisms: they point to a locus and they point to an increased
statistical risk of disease – that is all. This is because the common variant
that is interrogated is being used as a tag of wider genetic variation at that
locus (a locus is just a small region of the genome). Chromosomes tend to be
inherited in large chunks without too much mixing (or recombination) between
the two copies present in each parent. That means that one common variant at
one position will tend to be co-inherited with other common variants nearby.
The signal derived from GWAS is associated with one of those (or sometimes
several), but tags a lot of additional variation.
Generally, the presumption is that one of the common
variants is having a causal effect and the others are merely passengers.
However, there are also lots of rare mutations that come along for the ride.
These are mutations that arose much more recently and that are therefore
present in far fewer individuals. Though GWAS can’t see them directly, any such
mutation necessarily arises on the background of a particular set of common
variants (called a haplotype). Most people with that haplotype will not carry
the rare mutation, but it may be possible that several such mutations in the
population (if they are of large effect and thus found mainly in cases) can
give an aggregate signal that boosts the frequency of the common haplotype in
cases, resulting in a GWAS signal (driven by a “synthetic association”). Several
examples of such cases are now found in the literature, for other conditions (e.g., 1, 2, 3, 4),
though it is not clear if synthetic associations drive any of the signals in
the most recent schizophrenia study.
It is striking, however, that many of the loci
implicated by GWAS signals are known to sometimes carry rare mutations that
dramatically increase risk of disease. Some of the 108 loci implicated contain
only one gene, but some encompass many, while others have no gene in the region
or even nearby. Cases where the implicated gene is clear include genes like TCF4, CACNA1C, CACNB2, CNTN4, NLGN4X and
multiple others, where rare mutations are known to cause specific genetic
syndromes. Moreover, there is substantial enrichment in the GWAS loci for genes
in which rare mutations have been discovered in cases with schizophrenia, autism
or intellectual disability (including CACNA1I,
GRIN2A, LRP1, RIMS1 and many others).
These findings strongly reinforce the validity of the
GWAS results and also suggest that many of the loci identified sometimes carry
rare, high-risk mutations that should be very informative for follow-up
mechanistic studies. Whether the GWAS signals themselves are driven by such
rare mutations in the samples under study is an open question. (Another paper
just out suggests that signals from the GRM3 locus, which encodes a
metabotropic glutamate receptor, may be driven by a rare variant that increases
risk of mental illness generally by about 2.7-fold). But there are also many
examples of loci where both rare and common variation are known to play a role
in disease risk and the GWAS signal could well be driven purely by common
variants with direct functional effects.
However, such effects need not be tiny in individuals,
even if their overall signal of increased risk across the population is very
small. We know of many examples of common variants that strongly modify the
effects of rare mutations, at the same locus or at one encoding an interacting
protein. In such cases, the common variant may increase risk of expression of a
disorder due to a rare mutation, but essentially have no effect in most of the
population who do not carry such a rare mutation. This situation is exemplified
by Hirschsprung’s disease, a condition affecting innervation of the gut. It can
be caused by rare mutations in any of 18 known genes. However, such mutations
do not always cause disease and the range of severity is also very wide. Common
variants at several of those same risk loci have been found to be much more
frequent in people with rare mutations who develop disease than in those with
the same mutations who remain healthy. When averaged across the population, as
in a GWAS, such effects would yield only a tiny average increase in risk,
but this may reflect a large effect in a small subset of people and no effect
in the majority.
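The arithmetic behind that averaging can be sketched with made-up numbers. Everything below is hypothetical and chosen only for illustration: 0.5% of people carry a rare mutation, the common variant doubles their risk of disease from 10% to 20%, and everyone else has a baseline risk of 0.8% that the common variant does not change. The common variant is also assumed to be inherited independently of the rare mutation.

```python
# All values are hypothetical, chosen only to illustrate the averaging effect.
RARE_CARRIER_FRACTION = 0.005      # people carrying a rare high-risk mutation
RISK_CARRIER_WITH_COMMON = 0.20    # disease risk: rare mutation + common variant
RISK_CARRIER_WITHOUT_COMMON = 0.10 # disease risk: rare mutation only
RISK_NONCARRIER = 0.008            # baseline risk, unaffected by the common variant

def population_risk(risk_in_rare_carriers):
    """Average disease risk across rare-mutation carriers and everyone else."""
    return (
        RARE_CARRIER_FRACTION * risk_in_rare_carriers
        + (1 - RARE_CARRIER_FRACTION) * RISK_NONCARRIER
    )

# Effect of the common variant within the small subset of rare-mutation carriers.
effect_in_carriers = RISK_CARRIER_WITH_COMMON / RISK_CARRIER_WITHOUT_COMMON

# Effect of the same variant when averaged over the whole population.
risk_with_common = population_risk(RISK_CARRIER_WITH_COMMON)
risk_without_common = population_risk(RISK_CARRIER_WITHOUT_COMMON)
averaged_effect = risk_with_common / risk_without_common

print(f"relative risk in rare-mutation carriers: {effect_in_carriers:.2f}")
print(f"relative risk averaged over population:  {averaged_effect:.2f}")
```

With these numbers, a two-fold effect in the small subset of carriers shows up as a population-wide relative risk of only about 1.06, which is in the range typical of GWAS signals.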
This brings us to a larger point – what do the GWAS
signals tell us collectively? More specifically, should they be taken as
evidence in support of a polygenic model of disease risk, where it is the
collective burden of common risk variants that causes the majority of disease
cases?
One way to test that is to model the variance of the
“liability” to the disease, which is actually an unmeasurable parameter, but
which is assumed to be normally distributed in the population. With that and a
number of other assumptions in place, one can then ask how much of the variance
in this trait is accounted for by the loci identified by the GWAS. The authors
state that a combined risk profile score “now
explains about 7% of variation on the liability scale to schizophrenia across
the samples”. That is an improvement over previous studies (the first 13
loci accounted for about 3%), but certainly not as much as might have been
expected under a purely polygenic model. Of course, it could just be that only a fraction of the contributing common variants have been found and that larger studies would identify more.
However, the GWAS data are also fully consistent with
a more complex model of genetic heterogeneity, which involves common variants
interacting with rare variants to determine individual risk. Population
averages of their effects remain just that – statistical measures that cannot
be applied to individuals. Even combining all the common variants to generate a
risk profile score does not generate a predictive measure of risk for
individuals. (One reason is that non-additive genetic interactions, which are
likely to be highly important in individuals, are averaged out in population-level
signals).
So, the current study points the finger at a large set
of new genes, but does not really discriminate between models of genetic
architecture. The overlap between the GWAS signals and the genes known to carry
rare, high-risk mutations certainly suggests that the GWAS has been successful
in identifying important risk loci - a tremendous advance for which the authors should be congratulated (as well as for their willingness to collaborate on this level). This is, however, just a first step in
understanding the biology of the disease. The underlying genetic heterogeneity presents
a tremendous challenge but also an opportunity, as individual high-risk
mutations can be followed up in functional studies to elucidate some of the
mechanisms through which a change in some piece of DNA can ultimately produce
the particular psychological symptoms of this often-devastating disease.