A paper published today represents a true landmark in psychiatric genetics. It reports results of a genome-wide association study (GWAS) of schizophrenia, involving 36,989 cases and 113,075 controls. Assembling this sample required collaboration on a massive scale, with over 300 authors involved. This huge sample gives unprecedented statistical power to detect genetic variants that predispose to disease, even if their individual effects on risk are tiny. The study reports 108 regions of the genome where genetic differences affect risk of disease. This achievement is rightly being widely celebrated and reported, but what do these results really mean?
GWAS look at sites in the genome where the particular base in the DNA sequence is variable – it might sometimes be an “A”, other times a “T”, for example. There are millions of such sites in the human genome (which comprises over 3 billion bases of sequence). Each such site represents a mutation that happened some time in the distant past, which has since been inherited and spread throughout the population, while not supplanting the previous version completely. This leaves some people with one version and some with another – these different versions are thus called “common variants”. [More correctly, since we each have two copies of each chromosome, each of us carries two copies of each variable site, so the combined genotype could be AA, AT or TT, in the example above].
The idea of a GWAS is to look across the entire genome at over a million such variants for ones at higher frequency in disease cases than in controls. That difference in frequency might be very minor (say, the “A” version might be seen at a frequency of 30% in cases but 27% in controls), but with such a huge sample size, that kind of variation can be statistically significant. In epidemiological terms, the variant that is more common in cases is termed a “risk factor” – if you have it, you are statistically more likely to be in the case group than in the control group. (Just as smoking is more common in people with lung cancer than in people without, although in that case the difference in frequency is massive).
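To get a feel for how sample size turns a small frequency difference into a significant one, here is a minimal sketch in Python. It treats each allele as an independent draw and applies a simple two-proportion z-test. That is an assumption made for illustration; real GWAS analyses typically use regression models with covariates. The function name `allele_frequency_test` and the 500-versus-500 comparison sample are hypothetical; the 30% and 27% frequencies and the large sample sizes come from the text above.

```python
import math


def allele_frequency_test(freq_cases, freq_controls, n_cases, n_controls):
    """Two-proportion z-test on allele counts (illustrative assumption:
    every person contributes two independent alleles)."""
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    # Pooled allele frequency under the null hypothesis of no difference.
    pooled = (freq_cases * alleles_cases + freq_controls * alleles_controls) / (
        alleles_cases + alleles_controls
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / alleles_cases + 1 / alleles_controls)
    )
    z = (freq_cases - freq_controls) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    odds_ratio = (freq_cases / (1 - freq_cases)) / (
        freq_controls / (1 - freq_controls)
    )
    return z, p_value, odds_ratio


# Sample sizes reported for the schizophrenia study.
z_large, p_large, or_large = allele_frequency_test(0.30, 0.27, 36989, 113075)

# Hypothetical small study with the same allele frequencies.
z_small, p_small, or_small = allele_frequency_test(0.30, 0.27, 500, 500)

print(f"large sample: z = {z_large:.2f}, p = {p_large:.3g}, OR = {or_large:.3f}")
print(f"small sample: z = {z_small:.2f}, p = {p_small:.3g}, OR = {or_small:.3f}")
```

Under these assumptions the odds ratio is identical in both runs; only the p-value changes. With the full sample the difference clears the conventional genome-wide significance threshold of 5e-8 by a wide margin, while the same frequencies in the small sample do not even reach p < 0.05.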
For any individual common variant, the increased statistical risk is tiny – most increase risk by less than 1.1-fold. But the idea is that the combined risk associated with a large number of such variants could be quite large – large enough to push people into disease. Since the variants are common, each of us will carry many of them, but some people will carry more than others. This will generate a distribution of “risk variant burden” across the population. If there are 108 sites, each in two copies, then the range of that distribution could theoretically be from 0 to 216 risk variants. The actual distribution is far narrower however, with the vast majority of the population carrying somewhere between 90 and 130 risk variants (assuming the relative frequencies of the two variants are around 50:50, on average).
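The narrowness of that distribution can be checked with a short calculation. The sketch below assumes, as in the text, that each risk version has a frequency of 50%, and additionally assumes that the 216 allele slots are independent of one another, so that the burden follows a binomial distribution. Both are simplifications for illustration; the variable names are hypothetical.

```python
import math

N_ALLELES = 2 * 108   # 108 sites, two copies each
RISK_FREQ = 0.5       # assumed frequency of the risk version at every site


def binom_pmf(k, n, p):
    """Probability of exactly k risk alleles out of n independent slots."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


mean_burden = N_ALLELES * RISK_FREQ
sd_burden = math.sqrt(N_ALLELES * RISK_FREQ * (1 - RISK_FREQ))

# Share of the population carrying between 90 and 130 risk variants inclusive.
share_90_130 = sum(binom_pmf(k, N_ALLELES, RISK_FREQ) for k in range(90, 131))

print(f"mean burden = {mean_burden:.0f}, sd = {sd_burden:.2f}")
print(f"share carrying 90-130 risk variants = {share_90_130:.4f}")
```

With these assumptions the mean burden is 108 and the standard deviation is a little over 7, so nearly everyone falls within a few dozen variants of the mean rather than anywhere near 0 or 216.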
One way to conceptualise the combined effects of many variants is the “liability-threshold” model, which suggests that though there is a smooth distribution of genetic burden (or liability) across the population, only those above a certain threshold become ill (say the top 1% in the case of schizophrenia). This is known as a polygenic model of risk because it assumes the causal action of a large number of genes in any individual.
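As a rough illustration of the liability-threshold idea, the sketch below places liability on a standard normal scale (mean 0, standard deviation 1, a conventional but arbitrary choice) and finds the cut-off above which the top 1% of the population sits. The 1% figure comes from the text; the variable names are hypothetical.

```python
from statistics import NormalDist

# Liability is unobservable; a standard normal scale is assumed here.
liability = NormalDist(mu=0.0, sigma=1.0)
prevalence = 0.01  # top 1% of the liability distribution become ill

# Threshold in standard-deviation units above the population mean.
threshold = liability.inv_cdf(1 - prevalence)

# Sanity check: the fraction above the threshold should equal the prevalence.
fraction_affected = 1 - liability.cdf(threshold)

print(f"threshold = {threshold:.3f} SD above the mean")
print(f"fraction above threshold = {fraction_affected:.4f}")
```

On this scale the threshold falls roughly 2.3 standard deviations above the mean, which is why only the extreme tail of the burden distribution is expected to be affected under this model.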
An alternative model views common disorders such as schizophrenia as arising mainly due to very rare mutations of large effect, but in different genes in different individuals (and with the possibility of modifying effects of other variants in the genetic background). This scenario is known as genetic heterogeneity. Many such rare, high-risk mutations are known, but the ones identified so far collectively account for less than 10% of cases of schizophrenia (and 15-30% of cases of autism).
So, with that as background, let’s consider what the GWAS signals mean, individually and collectively. First, GWAS signals are a bit like #Greenfieldisms: they point to a locus and they point to an increased statistical risk of disease – that is all. This is because the common variant that is interrogated is being used as a tag of wider genetic variation at that locus (a locus is just a small region of the genome). Chromosomes tend to be inherited in large chunks without too much mixing (or recombination) between the two copies present in each parent. That means that one common variant at one position will tend to be co-inherited with other common variants nearby. The signal derived from GWAS is associated with one of those (or sometimes several), but tags a lot of additional variation.
Generally, the presumption is that one of the common variants is having a causal effect and the others are merely passengers. However, there are also lots of rare mutations that come along for the ride. These are mutations that arose much more recently and that are therefore present in far fewer individuals. Though GWAS can’t see them directly, any such mutation necessarily arises on the background of a particular set of common variants (called a haplotype). Most people with that haplotype will not carry the rare mutation, but it may be possible that several such mutations in the population (if they are of large effect and thus found mainly in cases) can give an aggregate signal that boosts the frequency of the common haplotype in cases, resulting in a GWAS signal (driven by a “synthetic association”). Several examples of such cases are now found in the literature, for other conditions (e.g., 1, 2, 3, 4), though it is not clear if synthetic associations drive any of the signals in the most recent schizophrenia study.
It is striking, however, that many of the loci implicated by GWAS signals are known to sometimes carry rare mutations that dramatically increase risk of disease. Some of the 108 loci implicated contain only one gene, but some encompass many, while others have no gene in the region or even nearby. Loci where the implicated gene is clear include TCF4, CACNA1C, CACNB2, CNTN4, NLGN4X and multiple others, where rare mutations are known to cause specific genetic syndromes. Moreover, there is substantial enrichment in the GWAS loci for genes in which rare mutations have been discovered in cases with schizophrenia, autism or intellectual disability (including CACNA1I, GRIN2A, LRP1, RIMS1 and many others).
These findings strongly reinforce the validity of the GWAS results and also suggest that many of the loci identified sometimes carry rare, high-risk mutations that should be very informative for follow-up mechanistic studies. Whether the GWAS signals themselves are driven by such rare mutations in the samples under study is an open question. (Another paper just out suggests that signals from the GRM3 locus, which encodes a metabotropic glutamate receptor, may be driven by a rare variant that increases risk of mental illness generally by about 2.7-fold). But there are also many examples of loci where both rare and common variation is known to play a role in disease risk and the GWAS signal could well be driven purely by common variants with direct functional effects.
However, such effects need not be tiny in individuals, even if their overall signal of increased risk across the population is very small. We know of many examples of common variants that strongly modify the effects of rare mutations, at the same locus or at one encoding an interacting protein. In such cases, the common variant may increase risk of expression of a disorder due to a rare mutation, but essentially have no effect in most of the population who do not carry such a rare mutation. This situation is exemplified by Hirschsprung’s disease, a condition affecting innervation of the gut. It can be caused by rare mutations in any of 18 known genes. However, such mutations do not always cause disease and the range of severity is also very wide. Common variants at several of those same risk loci have been found to be much more frequent in people with rare mutations who develop disease than in those with the same mutations who remain healthy. When averaged across the population, as in a GWAS, such effects would yield only a tiny average increase in risk, but this may reflect a large effect in a small subset of people and no effect in the majority.
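A toy calculation shows how a large effect in a small subset can shrink to a tiny average. Every number below is hypothetical and chosen only for illustration: a rare mutation carried by 1 in 1,000 people, a common modifier variant that raises disease risk from 10% to 40% in those carriers, and a flat 1% risk in everyone else regardless of the modifier. The sketch also assumes the common variant and the rare mutation are inherited independently.

```python
# All values are hypothetical, for illustration only.
RARE_CARRIER_FRACTION = 0.001      # share of people with a rare mutation
RISK_RARE_WITH_MODIFIER = 0.40     # rare mutation plus common modifier
RISK_RARE_WITHOUT_MODIFIER = 0.10  # rare mutation alone
BASELINE_RISK = 0.01               # no rare mutation; modifier has no effect


def population_risk(risk_in_rare_carriers):
    """Average risk across rare-mutation carriers and everyone else."""
    return (
        RARE_CARRIER_FRACTION * risk_in_rare_carriers
        + (1 - RARE_CARRIER_FRACTION) * BASELINE_RISK
    )


# Effect of the common modifier within rare-mutation carriers only.
rr_in_carriers = RISK_RARE_WITH_MODIFIER / RISK_RARE_WITHOUT_MODIFIER

# Effect of the same modifier when averaged over the whole population.
rr_population = population_risk(RISK_RARE_WITH_MODIFIER) / population_risk(
    RISK_RARE_WITHOUT_MODIFIER
)

print(f"relative risk within rare-mutation carriers: {rr_in_carriers:.2f}")
print(f"relative risk averaged over the population:  {rr_population:.3f}")
```

In this toy setup a fourfold effect in carriers of the rare mutation shows up as a population-wide relative risk only slightly above 1, which is the kind of signal size a GWAS typically reports.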
This brings us to a larger point – what do the GWAS signals tell us collectively? More specifically, should they be taken as evidence in support of a polygenic model of disease risk, where it is the collective burden of common risk variants that causes the majority of disease cases?
One way to test that is to model the variance of the “liability” to the disease, which is actually an unmeasurable parameter, but which is assumed to be normally distributed in the population. With that and a number of other assumptions in place, one can then ask how much of the variance in this trait is accounted for by the loci identified by the GWAS. The authors state that a combined risk profile score “now explains about 7% of variation on the liability scale to schizophrenia across the samples”. That is an improvement over previous studies (the first 13 loci accounted for about 3%), but certainly not as much as might have been expected under a purely polygenic model. Of course, it could just be that only a fraction of the contributing common variants have been found and that larger studies would identify more.
However, the GWAS data are also fully consistent with a more complex model of genetic heterogeneity, which involves common variants interacting with rare variants to determine individual risk. Population averages of their effects remain just that – statistical measures that cannot be applied to individuals. Even combining all the common variants to generate a risk profile score does not generate a predictive measure of risk for individuals. (One reason is that non-additive genetic interactions, which are likely to be highly important in individuals, are averaged out in population-level signals.)
So, the current study points the finger at a large set of new genes, but does not really discriminate between models of genetic architecture. The overlap between the GWAS signals and the genes known to carry rare, high-risk mutations certainly suggests that the GWAS has been successful in identifying important risk loci, a tremendous advance for which the authors should be congratulated (as well as for their willingness to collaborate on this level). This is, however, just a first step in understanding the biology of the disease. The underlying genetic heterogeneity presents a tremendous challenge but also an opportunity, as individual high-risk mutations can be followed up in functional studies to elucidate some of the mechanisms through which a change in some piece of DNA can ultimately produce the particular psychological symptoms of this often-devastating disease.