If genomics is the answer, what's the question? A commentary on PsychENCODE
There was much excitement in the press and in the psychiatric research community recently as a flurry of papers was published, presenting the work of the PsychENCODE project. This project, involving the work of many labs, aimed to deploy the powerful tools of genomics to dissect the landscape of gene regulation in the human brain, with the ultimate goal of revealing the molecular underpinnings of psychiatric disease.
Genome-wide association studies (GWAS) have revealed hundreds of common genetic variants that are statistically associated with increased risk of psychiatric disorders, such as schizophrenia, ADHD, bipolar disorder, and, to a lesser extent, autism. What they have not revealed is how such variants increase risk of disease. The PsychENCODE project aimed to generate a set of data that would allow researchers to answer that question.
There are a number of challenges in going from identification of an associated risk variant to elucidation of its biological effects. First, the common variants that are actually assayed are just markers – they tag a bunch of other genetic variants that tend to be co-inherited with them in little segments of the chromosome. Identifying the actual causal variant in that segment is not so straightforward.
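To make the tagging problem concrete, here is a minimal sketch of how strongly two variants on the same chromosomal segment can be correlated. It computes the standard linkage disequilibrium statistic r² from a set of haplotypes. The function name `ld_r_squared` and the toy haplotype counts are assumptions made up for illustration; they do not come from any of the studies discussed.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic sites.

    haplotypes: list of (marker_allele, other_allele) tuples, alleles coded 0/1,
    one tuple per sampled chromosome. Purely illustrative input format.
    """
    n = len(haplotypes)
    p_marker = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at the marker
    p_other = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at the other site
    p_both = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_both - p_marker * p_other                       # disequilibrium coefficient D
    denom = p_marker * (1 - p_marker) * p_other * (1 - p_other)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


# Toy data (hypothetical): the two alleles are always co-inherited.
perfectly_linked = [(1, 1)] * 30 + [(0, 0)] * 70
# Toy data (hypothetical): the two alleles are inherited independently.
unlinked = [(1, 1)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25 + [(0, 0)] * 25

print(ld_r_squared(perfectly_linked), ld_r_squared(unlinked))
```

When r² is close to 1, an association signal at the marker is statistically almost indistinguishable from a signal at its neighbour, which is why association alone cannot say which variant in the segment is causal.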
Second, and relatedly, many of those variants do not occur in the coding region of a gene – they do not change the amino acid sequence of a protein. Instead, they alter the DNA sequence of a regulatory region – a piece of DNA that acts as a binding site for proteins that regulate the expression of a nearby gene. Or at least it was always assumed that a nearby gene would be the most likely target of regulation. It turns out that many regulatory regions regulate genes that are some distance away on the chromosome, via three-dimensional interactions between proteins bound to different, often distant sites.
Third, even if one can find the causal variant and the gene it regulates, analysing the effects of the variation on gene expression is not so easy – especially in the brain. What you’d like to do is directly measure expression of the gene – levels of the mRNA transcript (or relative levels of multiple different transcripts) or of the protein itself, in the relevant cell types – in people with the risk variant versus those without. As brain tissue is inaccessible in life, people have tended to use blood cells as a proxy, but this has obvious and serious limitations, in that gene expression patterns in blood differ substantially from those in the brain.
Finally – and this is the big one – even if we catalogue all the changes in gene expression associated with all the associated risk variants, understanding how these changes collectively contribute to the emergence of a psychiatric disorder, with some specific profile of attendant cognitive, perceptual and behavioural symptoms in any individual, remains an enormous challenge.
The PsychENCODE project generates and analyses a huge amount of data that goes some way to addressing the first three challenges. The results presented are certainly an important advance compared to our previous knowledge, which was largely inferred from studies in animals. The papers look in detail at gene expression in embryonic and fetal human development, in cerebral organoids in culture, in parts of the adult human cortex (post mortem), and, for comparison, in the macaque brain (as well as other topics). They cross-correlate levels of particular transcripts and three-dimensional chromosomal contacts with underlying genetic variants to give a much better picture of which variants are functionally important and how various kinds of genes are regulated in the brain.
These are all useful and informative data about the landscape of variation in gene expression in the human brain. But they leave almost untouched the final challenge – understanding how genetic risk ultimately causes pathophysiology and psychopathology. It is striking, in fact, that this question is completely glossed over in the set of papers and most of the accompanying commentary.
Examining the hypothesis
The PsychENCODE papers frame psychiatric disorders as problems of altered gene expression. That is the implicit rationale for the entire approach – that if we could figure out the profiles of altered gene expression, we would better understand the nature of the conditions. But it is not clear at all what the actual hypothesis is that they aim to test, nor are the underlying premises of the genomic approach made explicit or critically examined. If genomics is the answer, what is the question?
The lack of a firm conceptual footing is clearest in the studies looking at patterns of gene expression in post mortem brain samples from patients with psychiatric illness.
The idea seems to be that we can find in the brains of adult sufferers of conditions like schizophrenia some recognisable and consistent pattern of gene expression that will reveal the biology underlying the condition or its symptoms.
There are multiple technical and conceptual problems with this approach:
1. Clinical heterogeneity: Schizophrenia is not a thing. It is a collection of things. Like most psychiatric categories, it is a diagnosis of exclusion – if you show some of a set of symptoms, and if doctors can’t find any specific “organic” cause for them, then you may get labelled with “schizophrenia”, but really it is a placeholder. It basically says “we don’t know what’s wrong with you, but you look like these other folks over here and so we’re going to give you all this label, for now, until we figure out some better way to discriminate between you all”.
In addition, two patients may both get a diagnosis of schizophrenia yet not share a single symptom. Bipolar disorder, autism, and other psychiatric diagnoses are likewise not natural kinds, showing tremendous clinical heterogeneity in symptom profiles and course of illness. When we are looking at the post mortem brain of a patient, are we expecting to see a profile of gene expression that relates to having the condition (the trait) or to the particular symptoms they were having, either chronically or acutely at the time of their death (the state)?
2. Genetic heterogeneity: All of these categories encompass tremendous genetic heterogeneity. This is most obvious in autism, where rare mutations with large individual effects can be discovered for many patients. But it is true for schizophrenia and bipolar disorder too, with probably a greater role for a load of weaker rare mutations. In all cases, the polygenic background of common variation plays an important part in determining the eventual phenotype. But the particular profile of common risk variants will also be heterogeneous between different patients.
This underlying heterogeneity is completely overlooked by the PsychENCODE papers. If there are so many, and such diverse, underlying genetic origins, why should we expect convergent profiles of gene expression across patients?
3. Regional heterogeneity: The Gandal et al. study looks at gene expression in the prefrontal cortex and the temporal cortex. Why? Is this the locus of these conditions? Are other parts of the brain not also affected? Could altered gene expression in, say, dopaminergic neurons in the midbrain not be involved in schizophrenia, for example?
4. Cellular heterogeneity: These studies take what I call the “melon-baller” approach – they scoop out a little chunk of the brain and analyse gene expression within it. But each chunk is made up of hundreds of different cell types, and each cell type will have its own characteristic profile of gene expression. Any difference in average gene expression levels between bits of prefrontal cortex in patients and controls will most likely represent a change in cellular composition.
Is this what we expect underlies the symptoms of these conditions? More or fewer excitatory or inhibitory neurons or astrocytes or microglia or oligodendrocytes? This might be a useful method to detect neurodegeneration or inflammation or gliosis, but there is not really any evidence or reason to suspect that these kinds of processes are key causal factors. There may well be some inflammation or other changes as consequences of these conditions and the lifetime of altered behaviour that accompanies them, but that is quite another matter.
5. Developmental origins: These psychiatric conditions are neurodevelopmental disorders. Many lines of converging evidence support the idea that the symptoms have their origins in altered processes of neural development, and the genes that have been implicated to date are enriched for those involved in these processes. That means that the gene expression profiles of interest (if there are any) would be the ones during development, not in adults. Many genes show very different profiles of expression in development versus adults – some switch off entirely, others become much more ubiquitous than at early stages. Patterns in adults are thus not a good guide to possible differences in development.
Looking in post mortem samples may therefore be many decades too late to detect altered gene expression of interest.
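The cellular heterogeneity point (item 4 above) can be illustrated with a little arithmetic. In the sketch below, the expression of a gene within each cell type is identical in patients and controls, yet the bulk average from a "melon-baller" sample still differs because the mix of cell types differs. All cell types, proportions and expression values are hypothetical numbers chosen for illustration, not measurements from any study.

```python
def bulk_expression(proportions, expression_by_cell_type):
    """Average expression of one gene in a bulk tissue sample.

    proportions: cell type -> fraction of cells in the sample (must sum to 1).
    expression_by_cell_type: cell type -> expression level of the gene per cell.
    """
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type proportions must sum to 1")
    return sum(
        fraction * expression_by_cell_type[cell_type]
        for cell_type, fraction in proportions.items()
    )


# Hypothetical per-cell expression of one gene, identical in both groups.
EXPRESSION = {"excitatory_neuron": 10.0, "inhibitory_neuron": 2.0, "astrocyte": 50.0}

# Hypothetical cell-type mixes: the patient sample has slightly more astrocytes.
CONTROL_MIX = {"excitatory_neuron": 0.60, "inhibitory_neuron": 0.20, "astrocyte": 0.20}
PATIENT_MIX = {"excitatory_neuron": 0.55, "inhibitory_neuron": 0.20, "astrocyte": 0.25}

control_bulk = bulk_expression(CONTROL_MIX, EXPRESSION)
patient_bulk = bulk_expression(PATIENT_MIX, EXPRESSION)
print(f"control bulk: {control_bulk:.1f}, patient bulk: {patient_bulk:.1f}")
```

A five-point shift in astrocyte fraction is enough to move the bulk value, even though no cell has changed how it regulates the gene. A difference in bulk expression therefore cannot, on its own, distinguish altered regulation from altered composition.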
If all you've got is a hammer...
So, these studies are founded on only the vaguest and most general premise: that there should exist some kind of convergent pattern of altered gene expression across cell types in certain regions of the brain in adult patients who all have been given the same clinical diagnosis, and that such a change in gene expression in the adult brain underlies and in some way can explain the symptoms.
But consideration of the issues described above shows there is really no logical justification for this expectation. In fact, it ignores everything we know about these conditions. Yes, they have genetic origins. That does not mean they have proximal molecular underpinnings. They have neural underpinnings.
We can’t explain hallucinations or delusions in terms of patterns of gene expression. We can’t even explain the neural pathophysiological states underlying such symptoms in terms of patterns of gene expression. It’s just the wrong level. These states emerge from the trajectories and dynamics of the neural system, over decades of development and experience. Genetic insults at the start may increase the probability of one trajectory or another but they do not underlie or explain the emergent states in any kind of direct or informative way.
Though they are not presented in this way, the results from the PsychENCODE papers strongly support this conclusion. They present many statistically significant findings – mostly of a single type: “enrichment” of genes from one list (e.g., GWAS hits) among genes on another list (e.g., those with expression affected in prefrontal cortex, or those associated with some particular cell type more than others, or those showing transcript splice form alterations, or those showing chromatin configuration differences, etc.).
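For readers unfamiliar with this kind of analysis, one common way to assess the overlap between two gene lists is a hypergeometric (one-sided Fisher) test. The sketch below is a generic version of that idea and is not a description of the methods used in the PsychENCODE papers. The function name `enrichment_p_value` and all of the gene counts are assumptions invented for illustration.

```python
from math import comb


def enrichment_p_value(n_background, n_list_a, n_list_b, n_overlap):
    """Probability of seeing at least n_overlap shared genes by chance.

    Assumes list B is a random draw of n_list_b genes from n_background genes,
    of which n_list_a belong to list A (hypergeometric upper tail).
    """
    max_overlap = min(n_list_a, n_list_b)
    total = comb(n_background, n_list_b)
    tail = sum(
        comb(n_list_a, k) * comb(n_background - n_list_a, n_list_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, 1,000 genes near GWAS hits,
# 2,000 genes with altered expression, 130 genes on both lists.
background, gwas_genes, altered_genes, shared = 20_000, 1_000, 2_000, 130

expected_by_chance = gwas_genes * altered_genes / background
fold_enrichment = shared / expected_by_chance
p_value = enrichment_p_value(background, gwas_genes, altered_genes, shared)
print(f"expected {expected_by_chance:.0f}, observed {shared}, "
      f"fold {fold_enrichment:.2f}, p = {p_value:.2g}")
```

With lists this large, an overlap only 1.3 times the chance expectation already yields a small p-value. Statistical significance in such a test says that the overlap is unlikely to be random, not that it is large or biologically informative.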
There is such a wealth of data here that these kinds of “Omomics” analyses can generate many positive findings, in terms of statistical significance. But what do they mean? Even by the most charitable reading, it is hard to see how any of the results presented have advanced our understanding of the biology of these conditions. While we certainly should not have expected these first analyses to reveal everything right away, we might reasonably have expected them to reveal something.
Simply put, nothing really definitive comes out. In fact, the strongest result from these studies is a general one, and it is “negative”: there are no convergent patterns of gene expression in adult brain that characterise these various psychiatric conditions. That could have been predicted from the discussion presented above, but at least we can now say it has strong empirical support.
Now, defenders of this approach might counter by saying that the first tranche of papers simply present this huge dataset that can now be analysed by many others and that can generate new hypotheses for further study. And there is no doubt that they will generate many more papers.
But it is not clear to me that they actually do generate new hypotheses – not ones that are experimentally testable at any rate. The problem relates to concentrating on the common variant, polygenic component of risk. This involves so many variants, each with such a minuscule effect, that it is almost impossible to follow up on in experimental systems (as discussed here).
I guess if all you have is a hammer, everything looks like a nail. But ultimately, these psychiatric disorders are not a problem genomics alone can solve. At some stage we have to hand the problem over to neuroscientists. This means giving them something they can work with experimentally – not just lists of genes compared to other lists of genes.
My own bet is on rare mutations with large effects, where it may be much more feasible to identify strong biological effects and follow the trajectory of events that leads from altered development or function of some particular cell types and circuits in the developing brain to the ultimate emergence of particular pathophysiological states. (Not that that approach doesn’t have its own challenges!)