Tuesday, July 22, 2014

Exciting findings in schizophrenia genetics – but what do they mean?

A paper published today represents a true landmark in psychiatric genetics. It reports results of a genome-wide association study (GWAS) of schizophrenia, involving 36,989 cases and 113,075 controls. Assembling this sample required collaboration on a massive scale, with over 300 authors involved. This huge sample gives unprecedented statistical power to detect genetic variants that predispose to disease, even if their individual effects on risk are tiny. The study reports 108 regions of the genome where genetic differences affect risk of disease. This achievement is rightly being widely celebrated and reported, but what do these results really mean?

GWAS look at sites in the genome where the particular base in the DNA sequence is variable – it might sometimes be an “A”, other times a “T”, for example. There are millions of such sites in the human genome (which comprises over 3 billion bases of sequence). Each such site represents a mutation that happened some time in the distant past, which has since been inherited and spread throughout the population, while not supplanting the previous version completely. This leaves some people with one version and some with another – these different versions are thus called “common variants”. [More correctly, since we each have two copies of each chromosome, each of us carries two copies of each variable site, so the combined genotype could be AA, AT or TT, in the example above].

The idea of a GWAS is to look across the entire genome at over a million such variants for ones at higher frequency in disease cases than in controls. That difference in frequency might be very minor (say, the “A” version might be seen at a frequency of 30% in cases but 27% in controls), but with such a huge sample size, that kind of variation can be statistically significant. In epidemiological terms, the variant that is more common in cases is termed a “risk factor” – if you have it, you are statistically more likely to be in the case group than in the control group. (Just as smoking is more common in people with lung cancer than in people without, although in that case the difference in frequency is massive).
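To get a feel for why sample size matters so much here, the 30% versus 27% example can be run through a simple two-proportion z-test on allele counts. This is only a rough sketch: the published analysis uses more sophisticated methods, the function name is made up for illustration, and the 500-versus-500 comparison is a hypothetical small study added for contrast. The case and control numbers are those quoted above.

```python
from math import erfc, sqrt


def allele_frequency_test(freq_cases, freq_controls, n_cases, n_controls):
    """Two-proportion z-test on allele counts (illustrative helper, not the study's method)."""
    # Each person carries two copies of every site, so alleles = 2 x people.
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    # Pooled allele frequency under the null hypothesis of no difference.
    pooled = (freq_cases * alleles_cases + freq_controls * alleles_controls) / (
        alleles_cases + alleles_controls
    )
    std_err = sqrt(pooled * (1 - pooled) * (1 / alleles_cases + 1 / alleles_controls))
    z = (freq_cases - freq_controls) / std_err
    # Two-sided p-value; erfc avoids underflow to zero for very large z.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Sample sizes from the schizophrenia GWAS described above.
z_big, p_big = allele_frequency_test(0.30, 0.27, 36_989, 113_075)
# Hypothetical small study with the same allele frequencies.
z_small, p_small = allele_frequency_test(0.30, 0.27, 500, 500)
# Odds ratio implied by the example frequencies.
odds_ratio = (0.30 / 0.70) / (0.27 / 0.73)

print(f"large sample: z = {z_big:.1f}, p = {p_big:.1e}")
print(f"small sample: z = {z_small:.2f}, p = {p_small:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```

With the full sample, the difference clears the conventional genome-wide significance threshold (p < 5 × 10^-8) by a wide margin. The same three-point difference in the small study would not even reach p < 0.05.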

For any individual common variant, the increased statistical risk is tiny – most increase risk by less than 1.1-fold. But the idea is that the combined risk associated with a large number of such variants could be quite large – large enough to push people into disease. Since the variants are common, each of us will carry many of them, but some people will carry more than others. This will generate a distribution of “risk variant burden” across the population. If there are 108 sites, each in two copies, then the range of that distribution could theoretically be from 0 to 216 risk variants. The actual distribution is far narrower however, with the vast majority of the population carrying somewhere between 90 and 130 risk variants (assuming the relative frequencies of the two variants are around 50:50, on average).
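The "90 to 130" range can be checked with a simple binomial model. This sketch assumes, as the text does, that every risk variant has a frequency of 50%, and additionally that the 216 copies are inherited independently of one another (real variants differ in frequency, so treat this as a back-of-the-envelope calculation). The function name is hypothetical.

```python
from math import comb, sqrt


def burden_probability(k, n_copies=216, risk_freq=0.5):
    """Binomial probability of carrying exactly k risk variants out of n_copies."""
    return comb(n_copies, k) * risk_freq**k * (1 - risk_freq) ** (n_copies - k)


n_copies = 2 * 108  # 108 sites, two copies each
mean_burden = n_copies * 0.5
sd_burden = sqrt(n_copies * 0.5 * 0.5)

# Share of the population expected to carry between 90 and 130 risk variants.
share_90_to_130 = sum(burden_probability(k) for k in range(90, 131))

print(f"mean = {mean_burden:.0f}, sd = {sd_burden:.1f}")
print(f"share carrying 90-130 risk variants = {share_90_to_130:.3f}")
```

Under these assumptions the distribution is centred on 108 with a standard deviation of about 7, so the extremes of 0 or 216 are never seen in practice.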

One way to conceptualise the combined effects of many variants is the “liability-threshold” model, which suggests that though there is a smooth distribution of genetic burden (or liability) across the population, only those above a certain threshold become ill (say the top 1% in the case of schizophrenia). This is known as a polygenic model of risk because it assumes the causal action of a large number of genes in any individual.
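The liability-threshold idea is easy to put into numbers. The sketch below assumes liability follows a standard normal distribution and that the top 1% are affected, as in the example above; the shift values are hypothetical, chosen only to show how a small nudge in mean liability changes the fraction of people above the threshold.

```python
from statistics import NormalDist

prevalence = 0.01  # top 1% of the liability distribution are affected
liability = NormalDist(mu=0.0, sigma=1.0)

# Threshold, in standard deviations above the population mean.
threshold = liability.inv_cdf(1 - prevalence)


def fraction_affected(shift):
    """Fraction above the threshold for a group whose mean liability is shifted."""
    return 1 - NormalDist(mu=shift, sigma=1.0).cdf(threshold)


print(f"threshold = {threshold:.2f} SD above the mean")
for shift in (0.0, 0.05, 0.5, 1.0):  # hypothetical shifts in mean liability
    risk = fraction_affected(shift)
    print(f"shift {shift:.2f} SD: risk = {risk:.4f}, relative risk = {risk / prevalence:.2f}")
```

Because the threshold sits far out in the tail, even a modest shift in mean liability produces a disproportionate change in the fraction of people who cross it.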

An alternative model views common disorders such as schizophrenia as arising mainly due to very rare mutations of large effect, but in different genes in different individuals (and with the possibility of modifying effects of other variants in the genetic background). This scenario is known as genetic heterogeneity. Many such rare, high-risk mutations are known but the ones we currently know about collectively account for less than 10% of cases of schizophrenia, e.g., here (and 15-30% of cases of autism).

So, with that as background, let’s consider what the GWAS signals mean, individually and collectively. First, GWAS signals are a bit like #Greenfieldisms: they point to a locus and they point to an increased statistical risk of disease – that is all. This is because the common variant that is interrogated is being used as a tag of wider genetic variation at that locus (a locus is just a small region of the genome). Chromosomes tend to be inherited in large chunks without too much mixing (or recombination) between the two copies present in each parent. That means that one common variant at one position will tend to be co-inherited with other common variants nearby. The signal derived from GWAS is associated with one of those (or sometimes several), but tags a lot of additional variation.

Generally, the presumption is that one of the common variants is having a causal effect and the others are merely passengers. However, there are also lots of rare mutations that come along for the ride. These are mutations that arose much more recently and that are therefore present in far fewer individuals. Though GWAS can’t see them directly, any such mutation necessarily arises on the background of a particular set of common variants (called a haplotype). Most people with that haplotype will not carry the rare mutation, but it may be possible that several such mutations in the population (if they are of large effect and thus found mainly in cases) can give an aggregate signal that boosts the frequency of the common haplotype in cases, resulting in a GWAS signal (driven by a “synthetic association”). Several examples of such cases are now found in the literature, for other conditions (e.g., 1, 2, 3, 4), though it is not clear if synthetic associations drive any of the signals in the most recent schizophrenia study.

It is striking, however, that many of the loci implicated by GWAS signals are known to sometimes carry rare mutations that dramatically increase risk of disease. Some of the 108 loci implicated contain only one gene, but some encompass many, while others have no gene in the region or even nearby. Cases where the implicated gene is clear include genes like TCF4, CACNA1C, CACNB2, CNTN4, NLGN4X and multiple others, where rare mutations are known to cause specific genetic syndromes. Moreover, there is substantial enrichment in the GWAS loci for genes in which rare mutations have been discovered in cases with schizophrenia, autism or intellectual disability (including CACNA1I, GRIN2A, LRP1, RIMS1 and many others).

These findings strongly reinforce the validity of the GWAS results and also suggest that many of the loci identified sometimes carry rare, high-risk mutations that should be very informative for follow-up mechanistic studies. Whether the GWAS signals themselves are driven by such rare mutations in the samples under study is an open question. (Another paper just out suggests that signals from the GRM3 locus, which encodes a metabotropic glutamate receptor, may be driven by a rare variant that increases risk of mental illness generally by about 2.7-fold). But there are also many examples of loci where both rare and common variation is known to play a role in disease risk and the GWAS signal could well be driven purely by common variants with direct functional effects.

However, such effects need not be tiny in individuals, even if their overall signal of increased risk across the population is very small. We know of many examples of common variants that strongly modify the effects of rare mutations, at the same locus or at one encoding an interacting protein. In such cases, the common variant may increase risk of expression of a disorder due to a rare mutation, but essentially have no effect in most of the population who do not carry such a rare mutation. This situation is exemplified by Hirschsprung’s disease, a condition affecting innervation of the gut. It can be caused by rare mutations in any of 18 known genes. However, such mutations do not always cause disease and the range of severity is also very wide. Common variants at several of those same risk loci have been found to be much more frequent in people with rare mutations who develop disease than in those with the same mutations who remain healthy. When averaged across the population, as in a GWAS, such effects would yield only a tiny average increase in risk, but this may reflect a large effect in a small subset of people and no effect in the majority.

This brings us to a larger point – what do the GWAS signals tell us collectively? More specifically, should they be taken as evidence in support of a polygenic model of disease risk, where it is the collective burden of common risk variants that causes the majority of disease cases?

One way to test that is to model the variance of the “liability” to the disease, which is actually an unmeasurable parameter, but which is assumed to be normally distributed in the population. With that and a number of other assumptions in place, one can then ask how much of the variance in this trait is accounted for by the loci identified by the GWAS? The authors state that a combined risk profile score “now explains about 7% of variation on the liability scale to schizophrenia across the samples”. That is an improvement over previous studies (the first 13 loci accounted for about 3%), but certainly not as much as might have been expected under a purely polygenic model. Of course, it could just be that only a fraction of the contributing common variants have been found and that larger studies would identify more.

However, the GWAS data are also fully consistent with a more complex model of genetic heterogeneity, which involves common variants interacting with rare variants to determine individual risk. Population averages of their effects remain just that – statistical measures that cannot be applied to individuals. Even combining all the common variants to generate a risk profile score does not generate a predictive measure of risk for individuals. (One reason for that is that non-additive genetic interactions that are likely highly important in individuals are averaged out by population-level signals).

So, the current study points the finger at a large set of new genes, but does not really discriminate between models of genetic architecture. The overlap between the GWAS signals and the genes known to carry rare, high-risk mutations certainly suggests that the GWAS has been successful in identifying important risk loci – a tremendous advance for which the authors should be congratulated (as well as for their willingness to collaborate on this level). This is, however, just a first step in understanding the biology of the disease. The underlying genetic heterogeneity presents a tremendous challenge but also an opportunity, as individual high-risk mutations can be followed up in functional studies to elucidate some of the mechanisms through which a change in some piece of DNA can ultimately produce the particular psychological symptoms of this often-devastating disease.

Tuesday, July 8, 2014

"Common disorders" are really collections of rare genetic conditions

Disorders such as autism, schizophrenia and epilepsy each affect about 1% of the population and are therefore defined as “common disorders”. But are they really? I mean, they are clearly really that common, but are they really “disorders”? Are they natural categories that reflect some shared underlying etiology or are they simply groupings based on sets of shared symptoms? Genetics is providing an answer to that question and demonstrating that so-called “common disorders” are really collections of rare disorders with similar symptoms. This represents a complete paradigm shift in psychiatry, the full ramifications of which have yet to be appreciated.

We have known for decades of examples of rare genetic syndromes that can include symptoms of autism spectrum disorder (such as Fragile X syndrome or Rett syndrome) or of schizophrenia (such as velo-cardio-facial syndrome, now called 22q11 deletion syndrome), while epilepsy is a known symptom of many genomic disorders. But such examples were typically thought of as exceptional and distinct from the much larger group of idiopathic cases of ASD, SZ or epilepsy. (Idiopathic simply means of currently unknown cause). Such conditions were often dismissed as not “real autism” or “real schizophrenia”, despite the fact that clinicians could not make any such assessment based on symptoms alone.

Instead, it was widely held to be a proven fact that the genetics of ASD and SZ generally followed a very different mode – rather than being caused by single mutations, as with the syndromes mentioned above, the idea was that the idiopathic cases were caused by combinations of tens or hundreds (or even thousands) of minor genetic differences, each with only a tiny effect on its own, but collectively sufficient to result in disease if enough of them were inherited.

Modern genomic technologies are revealing that this supposed dichotomy between rare and common disorders is artificial – merely a reflection of our current state of knowledge (or, more correctly, our current state of ignorance). Over the past five years, researchers have discovered many more rare genetic conditions that manifest with psychiatric symptoms, and which collectively can account for an ever-growing percentage of patients presenting with ASD or SZ. These include deletions or duplications of whole chunks of chromosomes, often affecting many genes, as well as mutations that affect only one gene.

[The DNA sequence of each gene codes for production of a specific protein. Genes are strung along chromosomes, like the information encoding successive songs on a cassette tape. (I may be showing my age with this reference!) Localised damage to the tape at one specific point can affect just one song, but cutting out a whole section could remove or disrupt multiple songs at the same time. Similarly, changing one letter of the DNA sequence can alter the code for a single protein, while deleting a whole section of chromosome can remove multiple genes and thereby affect production of multiple proteins at once].

Some regions of the genome are particularly prone to errors in DNA replication that result in deletions or duplications. While still rare, these recur at a high enough frequency that many cases with effectively the same genetic lesion can be identified. This has enabled researchers to recognise and characterise a growing number of genomic disorders that carry a high risk of psychiatric or neurological symptoms. In addition to previously known conditions such as 22q11 deletion syndrome, Williams, Angelman and Prader-Willi syndromes, new conditions have been defined involving deletions or duplications at 1q21.1, 3q29, 7q36.2, 15q11.2, 16p11.2, 22q13 and many others, with more being recognised all the time.

All of these mutations have variable effects, sometimes presenting as ASD, sometimes as SZ or epilepsy – often, but not always, with developmental delay or intellectual disability. Because their clinical manifestations are so variable, there was no way to detect or recognise these patients prior to genetic screening (except for conditions with other characteristic symptoms, such as distinct facial morphology). But once a genetic diagnosis can be made, it becomes possible to group patients with the same mutation together and determine whether there are any patterns to their symptoms, their course of illness, how they respond to medications, and other clinical parameters. This is useful information for clinicians and also for patients and their families – indeed, international support groups have been formed for many of these rare genomic conditions.   

New conditions caused by mutations in specific genes are also being defined. Rett syndrome is a classic example – a form of autism and intellectual disability in girls that is caused by mutations in a gene called MECP2. New genomic sequencing technologies are now revealing many more such conditions, although the pace of discovery here has been slower, for two reasons.

First, if you sequence the entire genome of any individual you will find many serious mutations – severely affecting production or function of a couple hundred proteins (out of ~20,000 in total). Recognising which one of those is causing disease in a particular patient is impossible, unless you have some prior information. That information can come from seeing the same gene mutated in multiple patients with a particular condition. That brings up the second problem – the number of genes in which mutations can cause ASD or SZ or epilepsy is very large, probably on the order of a thousand. So the likelihood that any two patients will have a mutation in the same gene is very low. This means we will need to sequence very large samples of patients to start to see the signal of meaningful repeat hits amongst the background noise of repeats that arise by chance, simply because we all carry many mutations.

Those efforts are underway and are beginning to pay off, with new conditions being defined at an ever-increasing rate. One recent example involves mutations in the gene CHD8. Mutations that disrupt this gene have been observed in multiple patients with diagnoses of developmental delay or ASD (15 independent mutations in 3,730 cases), but never in a sample of 8,792 clinically unaffected controls. You can see how rare these mutations are – accounting for only 4 of every 1000 cases – but the fact that you don’t see such mutations in controls provides strong evidence that they are in fact the cause of disease in those patients. (See here for a much more nuanced discussion of causality in genetic disorders – the phenotypic effects of any single mutation will always be modified, sometimes strongly, by additional genetic variants in the background). 
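The strength of that evidence can be gauged with a quick calculation. The sketch below asks: if the 15 mutation carriers were scattered at random among everyone sequenced, how likely is it that all 15 would land in the case group? This is the one-sided Fisher's exact probability for the observed table. It assumes each carrier is an independent observation and that cases and controls were screened in the same way; the variable names are made up for illustration.

```python
from math import comb

n_cases = 3_730
n_controls = 8_792
n_carriers = 15  # all observed in cases, none in controls

# Carrier rate among cases, expressed per 1000.
carriers_per_1000_cases = 1000 * n_carriers / n_cases

# Hypergeometric probability that all carriers fall in the case group by chance:
# ways to pick 15 carriers from cases / ways to pick 15 from everyone.
p_all_in_cases = comb(n_cases, n_carriers) / comb(n_cases + n_controls, n_carriers)

print(f"carriers per 1000 cases = {carriers_per_1000_cases:.1f}")
print(f"chance all {n_carriers} carriers are cases = {p_all_in_cases:.1e}")
```

Even though the mutations are individually very rare, seeing all of them in cases and none in more than twice as many controls is extremely unlikely to be a fluke.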

By finding multiple patients with mutations in the same gene, clinicians were able to define a new syndrome that was previously unrecognisable. In this case, patients with CHD8 mutations display macrocephaly (increased head size), distinct faces and gastrointestinal problems (the CHD8 protein has independent functions in both the brain and the nervous system innervating the gut). The genetic information is thus directly and immediately relevant to the clinical management and treatment of these cases.

Now, one might say that such mutations are so rare that they don’t really tell us anything about the generality of conditions like ASD or SZ. But the point is, there is no reason to think such a thing exists. As more and more mutations causing high risk of psychiatric conditions are discovered, the percentage of cases remaining idiopathic decreases. Those diagnostic categories are not founded on knowledge but on ignorance of underlying cause, by definition.

Known, high-risk mutations can now be identified in >10% of cases of SZ, 25-30% of cases of ASD, and over 60% of cases of severe intellectual disability. Those numbers represent a vast increase from even a few years ago and are sure to increase rapidly in the very near future. Even if the genetic effects in many cases are more complicated (involving more than one mutation at a time, with contributions from common variants), the major message remains the same: these conditions are incredibly genetically heterogeneous. It is probably far more appropriate to think of “autistic symptoms” or “schizophrenic symptoms” as a common consequence of many distinct genetic conditions, than to think of “autism” or “schizophrenia” as monolithic disorders.

That has hugely important implications not just for clinical practice but also for research. If you take a hundred patients with ASD, you might have 70-80 distinct genetic causes. That’s something to consider in the context of, say, neuroimaging studies that look for commonalities across groups of ASD or SZ patients. Any time I see a study reporting some difference in brain structure “in autism” or “in schizophrenia”, I replace that phrase with “in intellectual disability” and see if it still makes any sense. (It doesn’t, given the well-accepted heterogeneity of ID). Of course, there may be some commonalities in the final outcome in these patients, given they end up with similar symptoms, but research purporting to look at causes should bear the genetic heterogeneity in mind.

Genetics is increasingly providing the means to distinguish the underlying causes in different patients and hopefully develop a far more personalised approach to care. Fortunately, new technologies of genome editing are making it much easier to recapitulate disease-causing mutations in animals so that pathogenic mechanisms can be elucidated. Just in the past couple weeks, very exciting results have been published that help localise the primary effects of particular mutations (in the genes SYNGAP1 and NLGN3) to specific cell types in specific regions of the developing brain in mouse models. 

The recognition that these common diagnostic categories are really collections of very rare conditions will necessitate a shift in approaches aimed at developing new treatments. The economics of drug development for rare conditions are obviously very different from the search for the new blockbuster. The next big challenge is to elucidate the biological mechanisms leading to disease across many different mutations to determine if there are any shared pathways or common pathophysiological endpoints that might be targeted in large groups of patients or if individualised treatments can be (or need to be) developed for very small and specific sets of patients, as is happening in other areas of medicine.