Thursday, December 3, 2015

On literature pollution and cottage-industry science

A few days ago there was a minor Twitterstorm over a particular paper that claimed to have found an imaging biomarker that was predictive of some aspect of outcome in adults with autism. The details actually don’t matter that much and I don’t intend to pick on that study in particular, or even link to it, as it’s no worse than many that get published. What it prompted, though, was more interesting – a debate on research practices in the field of cognitive neuroscience and neuroimaging, particularly relating to the size of studies required to address some research questions and the scale of research operation they might entail.

What kicked off the debate was a question of how likely the result they found was to be “real”; i.e., to represent a robust finding that would replicate across future studies and generalise to other samples of autistic patients. I made a fairly uncompromising prediction that it would not replicate, which was based on three features of the study: a small sample (n=31, in this case, but split into two); an exploratory design (i.e., not aimed at or constrained by any specific hypothesis, so that group differences in pretty much any imaging parameter would do); and the lack of a replication sample (to test directly, with exactly the same methodology, whether the findings from the study were robust, prior to bothering anyone else with them).

The reason for my cynicism is twofold: first, the study was statistically under-powered, and any nominally significant finding from such a study is both more likely to be a false positive and likely to have an inflated effect size. Second, and more damningly, there have been literally hundreds of similar studies published using neuroimaging measures to try to identify signatures that would distinguish between groups of people or predict the outcome of illness. For psychiatric conditions like autism or schizophrenia I don’t know of any such “findings” that have held up. We still have no diagnostic or prognostic imaging markers, or any other biomarkers for that matter, that have either yielded robust insights into underlying pathogenic mechanisms or been applicable in the clinic.

There is thus strong empirical evidence that the small sample, exploratory, no replication design is a sure-fire way of generating findings that are, essentially, noise.
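To make that concrete, here is a toy simulation of my own (not the study in question): two groups of 15 subjects drawn from the same distribution, so there is no real difference at all, with 100 imaging parameters tested per study at an uncorrected p < 0.05.

```python
# Toy simulation: small-sample exploratory studies under the null.
# Two groups of 15 subjects with NO real difference, 100 parameters tested.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_subjects, n_params, n_studies = 15, 100, 1000

studies_with_a_hit = 0
for _ in range(n_studies):
    cases = rng.normal(size=(n_subjects, n_params))
    controls = rng.normal(size=(n_subjects, n_params))
    pvals = ttest_ind(cases, controls, axis=0).pvalue
    if (pvals < 0.05).any():        # at least one "significant" parameter
        studies_with_a_hit += 1

# With 100 uncorrected tests, ~1 - 0.95**100 (roughly 99%) of these
# entirely null studies will "find" something to report.
print(f"{studies_with_a_hit / n_studies:.0%} of null studies report a finding")
```

Almost every one of these null studies yields a publishable-looking “finding” if nothing is corrected for and negative results go unreported.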

This is by no means a problem only for neuroimaging studies; the field of psychology is grappling with similar problems and many key findings in cell biology have similarly failed to replicate. We have seen it before in genetics, too, during the “candidate gene era”, when individual research groups could carry out a small-scale study testing single-nucleotide polymorphisms in a particular gene for association with a particular trait or disorder. The problem was the samples were typically small and under-powered, the researchers often tested multiple SNPs, haplotypes or genotypes but rarely corrected for such multiple tests, and they usually did not include a replication sample. What resulted was an entire body of literature hopelessly polluted by false positives.

This problem was likely heavily compounded by publication bias, with negative findings far less likely to be published. There is evidence that that problem exists in the neuroimaging literature too, especially for exploratory studies. If you are simply looking for some group difference in any of hundreds or thousands of possible imaging parameters, then finding one may be a (misplaced) cause for celebration, but not finding one is hardly worthy of writing up.

In genetics, the problems with the candidate gene approach were finally realised and fully grappled with. The solution was to perform unbiased tests for SNP associations across the whole genome (GWAS), to correct rigorously for the multiple tests involved, and to always include a separate replication sample prior to publication. Of course, to enable all that required something else: the formation of enormous consortia to generate the sample sizes required to achieve the necessary statistical power (given how many tests were being performed and the small effect sizes expected).

This brings me back to the reaction on Twitter to the criticism of this particular paper. A number of people suggested that if neuroimaging studies were expected to have larger samples and to also include replication samples, then only very large labs would be able to afford to carry them out. What would the small labs do? How would they keep their graduate students busy and train them?

I have to say I have absolutely no sympathy for that argument at all, especially when it comes to allocating funding. We don’t have a right to be funded just so we can be busy. If a particular experiment requires a certain sample size to detect an effect size in the expected and reasonable range, then it should not be carried out without such a sample. And if it is an exploratory study, then it should have a replication sample built in from the start – it should not be left to the field to determine whether the finding is real or not.

You might say, and indeed some people did say, that even if you can’t achieve those goals, because the lab is too small or does not have enough funding, at least doing it on a small scale is better than nothing.

Well, it’s not. It’s worse than nothing.

Such studies just pollute the literature with false positives – obscuring any real signal amongst a mass of surrounding flotsam that future researchers will have to wade through. Sure, they keep people busy, they allow graduate students to be trained (badly), and they generate papers, which often get cited (compounding the pollution). But they are not part of “normal science” – they do not contribute incrementally and cumulatively to a body of knowledge.

We are no further in understanding the neural basis of a condition like autism than we were before the hundreds of small-sample/exploratory-design studies published on the topic. They have not combined to give us any new insights, they don’t build on each other, they don’t constrain each other or allow subsequent research to ask deeper questions. They just sit there as “findings”, but not as facts.

Lest I be accused of being too preachy, I should confess to some of these practices myself. Several years ago, while candidate gene studies were still the norm, we published a paper that included a positive association of semaphorin genes with schizophrenia (prompted by relevant phenotypes in mutant mice). It seems quite likely now that that association was a false positive, as a signal from the gene in question has not emerged in larger genome-wide association studies.

And the one neuroimaging study I have done so far, on synaesthesia, certainly suffered from a small sample size (at the time it was considered decent), and no replication sample. In our defence, our study was itself designed as a replication of previous findings, combining functional and structural neuroimaging. While our structural findings did mirror those previously reported (in general direction and spatial distribution of effects, though not precise regions), our functional results were quite incongruent with previous findings. As we did not have a replication sample built into our own design, I can’t be particularly confident that our findings will generalise – perhaps they were a chance finding in a fairly small sample. (Indeed, the imaging findings in synaesthesia have been generally quite inconsistent and it is difficult to know which constitute real results that future research could build on).

If I were designing these kinds of studies now I would use a very different design, with much larger samples and in-built replication (and pre-registration). If that means they are more expensive, so be it. If it means my group can’t do them alone, well that’s just going to be the way it is. No one should fund me, or any lab, to do under-powered studies.

For the neuroimaging field generally that may well mean embracing the idea of larger consortia and adopting common scanning formats that enable combining subjects across centres, or at least subsequent meta-analyses. And it will mean that smaller labs may have to give up on the idea of making a living from studies attempting to find differences between groups of people without enough subjects. You’ll find things – they just won’t be real.

Sunday, November 22, 2015

What do GWAS signals mean?

Genome-wide association studies (GWAS) have been highly successful at linking genetic variation in hundreds of genes to an ever-growing number of traits or diseases. The fact that the genes implicated fit with the known biology for many of these traits or disorders strongly suggests (effectively proves, really) that the findings from GWAS are “real” – they reflect some real biological involvement of those genes in those diseases. (For example, GWAS have implicated skeletal genes in height, immune genes in immune disorders, and neurodevelopmental genes in schizophrenia).

But figuring out the nature of that involvement and the underlying biological mechanisms is much more challenging. In particular, it is not at all straightforward to understand how statistical measures derived at the level of populations relate to effects in individuals. Here, I explore some of the diverse mechanisms in individuals that may underlie GWAS signals.

GWAS take an epidemiological approach to identify genetic variants associated with risk of disease in exactly the same way epidemiologists identify environmental factors associated with risk – they look for factors that are more frequent in cases with a disease than in unaffected controls. For example, smoking is more common in people with lung cancer than in people without lung cancer (even though only a minority of people who smoke get lung cancer). From this we can deduce that smoking may be a risk-modifying factor for lung cancer, and we can measure the strength of that effect. Of course, observational epidemiology cannot prove causation – but it can provide important clues as to the risk architecture of a disease.

For GWAS, the factors in question are not environmental – they are the differences in our DNA that exist at millions of positions across the genome. These “single-nucleotide polymorphisms”, or SNPs, are positions in the genome where the DNA sequence varies between people – sometimes it might be an “A”, sometimes it might be a “T” (or a “G” or a “C”). Of course, any position in the genome can be mutated and likely is mutated in someone on the planet, but such mutations are typically extremely rare. SNPs are different – they are positions where two different versions are both relatively frequent in the population; these versions are thus often referred to as common variants.

GWAS are premised on the simple idea that if any of those common variants at any of those millions of SNPs across the genome is associated with an increased risk of disease, then that variant should be more frequent in cases than in controls. So, if we find variants that are more common in cases than in controls, we can infer that these variants may be causally related to an increased risk of disease.

What that doesn’t tell us is how. How does having one variant over another at that particular site cause an increased risk of that particular disease? I don’t just mean by what biological mechanism; I mean how does risk calculated at the population level relate to effects in individuals?

Statistically, we get two measures out of GWAS for any SNP that is associated. One is the p-value, which is a measure of how unlikely it would be to see a frequency difference of the magnitude we observe, just by chance. You might, for example, find that the “A” version at one SNP is at 25% frequency in controls but 28% frequency in cases. That’s not a big difference, so you’d need a very big sample to make sure it wasn’t noise, which is precisely why GWAS now use sample sizes of tens or even hundreds of thousands of people.

GWAS also apply very rigorous thresholds for statistical significance, in order to correct for the fact that they are testing so many different SNPs. (This follows the logic that, while it is quite unlikely that you will win the lottery yourself, if enough tickets are sold, it won’t be surprising if the lottery is won by somebody). These methods have greatly advanced the trustworthiness of results from the field, far beyond those reported in the benighted “candidate gene era”. But the p-value doesn’t tell us anything about how big an effect there is – how much of an effect on risk does the difference in frequency between cases and controls reflect?
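As a back-of-the-envelope illustration of why such enormous samples are needed, here is a sketch using the standard two-proportion normal approximation, with the genome-wide significance threshold of p < 5×10⁻⁸ and the allele frequencies from the example above:

```python
# Sketch: allele-count sample size needed to detect a 28% vs 25% frequency
# difference at genome-wide significance (p < 5e-8, two-sided) with 80%
# power, via the standard two-proportion normal approximation.
from math import sqrt
from statistics import NormalDist

p1, p2 = 0.28, 0.25                  # allele frequency in cases and controls
alpha, power = 5e-8, 0.80
z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~5.45 for genome-wide threshold
z_b = NormalDist().inv_cdf(power)           # ~0.84 for 80% power

p_bar = (p1 + p2) / 2
n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
      + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2)) ** 2
print(f"~{n:,.0f} alleles (~{n / 2:,.0f} people) per group")
```

That works out to roughly 17,000 alleles (over 8,000 people) per group, for what is actually a fairly large frequency difference by GWAS standards; smaller differences push the requirement into the tens or hundreds of thousands.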

That number is summarised by the other measure we get for each associated SNP, which is the odds ratio. This reflects the size of the difference in frequency of that variant between cases and controls. It is calculated very simply: say your SNP comes in two versions, or “alleles”: “A” and “G”. We want to convert the difference in absolute frequencies in cases versus controls (say 28% vs 25%, or 62% vs 60%, or whatever it is) into a number that tells us how many times more common one version is in cases than in controls. (The reason is that that number is more easily related to the increased risk associated with having that version).

Here’s an example: if we take 28% and 25% as the frequencies of the “A” allele at a certain SNP in cases and controls, respectively, then the odds of drawing an “A” rather than a “G” are 0.28/0.72 (=0.389) in cases and 0.25/0.75 (=0.333) in controls. The odds ratio is then 0.389/0.333 = 1.167. Assuming that the cases and controls are representative of the general population, we can infer that individuals with an “A” allele are 1.167 times more likely to be a case, compared to those with the “G” allele, which is the number we’re after. (Note that this approximation of the odds ratio to the relative risk only holds when the disease is rare).

If you do the same calculations for 62% vs 60% it works out to 1.09. These odds ratios are on the order of the typical values obtained from GWAS. For comparison, the odds ratio for smoking and lung cancer is around 20. It is calculated in the same way, e.g., from data like these from a study in Spain in the 1980s (where smoking was apparently astronomically common!): this study found that 98.8% of lung cancer patients were smokers, while “only” 80.3% of controls were smokers. Doing the same calculations as above gives an OR = 20.2, which is consistent with many other studies.
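For readers who like to check the arithmetic, the whole calculation is a few lines of code (a minimal sketch; the smoking figures are the percentages quoted above):

```python
# Odds ratio for an allele (or exposure), from its frequency in cases
# versus controls.
def odds_ratio(freq_cases, freq_controls):
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls

print(round(odds_ratio(0.28, 0.25), 3))   # the "A" allele example: ~1.167
print(round(odds_ratio(0.62, 0.60), 3))   # ~1.09
print(round(odds_ratio(0.988, 0.803), 1)) # smoking and lung cancer example
```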

Thus, for either genetic or environmental factors, the odds ratio gives an average increased risk of disease. But, biologically, what is actually going on in each individual that collectively gives that signal?

The most straightforward interpretation is that an odds ratio of, say, 1.2 at the population level reflects exactly the same thing at the individual level – each individual who inherits that SNP variant is at 1.2 times greater risk of developing the disease than they would have been otherwise. This is the additive model, whereby each SNP acts independently of all other factors – it doesn’t matter what other genetic variants a person has, or indeed what environmental factors they may be exposed to – the added effect on risk of this SNP is the same in all carriers.

That is, I think, a pretty common interpretation of what the odds ratio means in individuals, but it is certainly not the only scenario that could produce that result at the population level. In the diagram below, I illustrate several different scenarios that could all yield the same odds ratio across the population.

The additive scenario is illustrated in A. Every person who inherits the risk allele has a slightly increased risk of disease (small red arrows). [This applies whether the SNP that is genotyped in the GWAS has a functional effect itself or tags another common SNP that is the one doing the damage].

It might seem like the odds ratio can be interpreted directly as a multiplier of the baseline risk across the population, i.e., the prevalence of the disease in question. So, if the baseline rate is say 1%, then people with the “A” allele in our example above would have a risk of 1.167%, all other things being equal. The problem with that interpretation is that all other things are not equal.

For example, a condition like autism affects about 1% of the population. This does not mean, however, that everyone in the population had a 1% risk of being born autistic, and that the ones who actually are autistic were just unlucky (statistically speaking, not judgmentally). That 1% is actually made up of people who were at very high risk of being autistic – we know this because people with the same genotype as those with autism (i.e., their monozygotic twins) have a rate of autism of over 80%. What this implies is that the vast majority of the population were at effectively no risk (not at 1% risk).
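A toy calculation (with made-up numbers of my own) shows how the same 1% prevalence can arise almost entirely from a small high-risk subgroup rather than from uniform risk:

```python
# Toy numbers (purely illustrative): a ~1% overall prevalence generated
# almost entirely by a small subgroup at very high risk, with everyone
# else at essentially no risk -- rather than uniform 1% risk for all.
frac_high, risk_high = 0.0122, 0.80     # hypothetical high-risk subgroup
frac_low, risk_low = 1 - frac_high, 0.0002

prevalence = frac_high * risk_high + frac_low * risk_low
print(f"population prevalence ~ {prevalence:.1%}")
```

Both risk architectures give exactly the same population prevalence; only data like twin concordances can distinguish them.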

This suggests that the effects of any SNP are also likely to be highly unequally distributed across the population*, depending on the genetic background, as illustrated in Scenario B. In some people, the risk variant increases risk a little bit (small red arrows), while in others it increases it a lot (bigger red arrows). In others it may have no effect (flat blue line), while in yet others it may actually decrease risk (green downward arrow).

That last situation may seem far-fetched but is actually well described; for example, two mutations that each independently cause epilepsy may paradoxically cancel each other out if they occur together. Similarly, mutations in the fragile X gene, Fmr1, or in the tuberous sclerosis gene, Tsc2, can each cause autism in humans and various neurological and behavioural symptoms when mutated in mice. However, combining them both in mice leads to a rescue of the symptoms caused by either one alone (because they counteract each other at the biochemical level).

These kinds of “epistatic” (non-additive) interactions are generally very common and can be seen for all kinds of complex traits. In terms of how they would contribute to a GWAS signal, a slight preponderance of increased risk when you average those effects across the population would generate a small odds ratio greater than 1. Based on the odds ratio alone, there is no way to distinguish scenarios A and B.
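A small simulation (with entirely hypothetical effect sizes) illustrates how scenario B collapses into a single modest odds ratio at the population level:

```python
# Sketch of scenario B (all effect sizes hypothetical): the risk allele's
# effect depends on genetic background -- none in most carriers, large in
# some, protective in a few -- yet the case/control comparison averages
# this into one modest odds ratio.
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000
carrier = rng.random(n) < 0.25          # risk-allele carriers, frequency 25%

# Per-person risk multiplier, drawn from four hypothetical backgrounds:
# no effect (70%), small (15%), large (10%), protective (5%)
mult = rng.choice([1.0, 1.5, 5.0, 0.5], size=n, p=[0.70, 0.15, 0.10, 0.05])
base_risk = 0.01
risk = np.where(carrier, base_risk * mult, base_risk)
affected = rng.random(n) < risk

# Allele frequency in cases vs controls -> one population-level odds ratio
f_case = carrier[affected].mean()
f_ctrl = carrier[~affected].mean()
odds_ratio = (f_case / (1 - f_case)) / (f_ctrl / (1 - f_ctrl))
print(f"population-level odds ratio ~ {odds_ratio:.2f}")
```

The wildly heterogeneous individual effects emerge as a single averaged odds ratio of about 1.5, indistinguishable from the purely additive scenario A.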

Note that this kind of effect holds for all epidemiological data – the effect sizes obtained are always averages across the population which may hide substantial variability in effect size across individuals. For example, a high-fat diet may be a much higher risk factor for cardiovascular disease in some people than in others, based on their genetic vulnerability.

It is interesting to note that if those kinds of diverse epistatic interactions occur for each SNP, then their aggregate effects will likely always look additive, as these pairwise and higher-order interactions will average out both within and across individuals. That doesn’t mean they could not in principle be decomposed to reveal such effects, as can be done using various genetic techniques in model organisms. So, just because SNP effects seem to combine additively does not rule out multiple epistatic interactions at the biological level.

Scenario C is a special case of epistatic interaction. In this case, the common risk variant has no effect on biological risk at all in most carriers (flat blue lines). However, if it occurs in people with a rare mutation in some specific gene (big purple arrow), which by itself predisposes to the disease with incomplete penetrance (where not everyone with the mutation necessarily develops the disease), then it can have a modifying effect, strongly increasing the likelihood of actual expression of the disease symptoms.

Again, this kind of scenario is well documented and is particularly well illustrated by Hirschsprung disease. This disorder, which affects innervation of the gut, can be caused by mutations in any one of about 18 known genes, one of which encodes the Ret tyrosine kinase. However, mutations in this gene are not completely penetrant – some people with it do not develop disease or have only a mild form. Recent studies have found that simultaneously carrying a common variant in the same gene increases the likelihood that carriers of the rare mutation will show severe disease. The common variant thus modifies the risk of disease substantially, but only in carriers of a rare mutation. (In this case it is in the same gene, but that doesn’t have to be the case). 

The last scenario, D, is quite different. Here, the common variant is not doing anything itself. It’s not even linked to another common variant that is doing something. Instead, it is linked to a rare mutation that causes disease with much higher penetrance. Or, to put it better, the rare mutation is linked to it. Any new mutation must arise on a background of some set of common SNPs (a “haplotype”), with which it will tend to be subsequently co-inherited. If a rare mutation that increases risk of disease rises to an appreciable frequency then it will necessarily increase the frequency of the SNPs in that haplotype in people with the disease, giving rise to what has been called a “synthetic association”.

Any one mutation might be too rare to cause such an effect (especially if it is likely to be selected against precisely because it causes disease), but if you have multiple rare mutations at a given locus, and if they happen to occur by chance more on one haplotype than another, then you could get an aggregate effect that could give a tiny difference in frequency of the sort detected by GWAS.
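This, too, can be illustrated with a toy simulation (all numbers hypothetical): a completely neutral common variant shows a small case/control frequency difference simply because rare, high-penetrance mutations arose more often on its haplotype.

```python
# Sketch of scenario D, "synthetic association" (all numbers hypothetical):
# a neutral common variant acquires a small case/control frequency
# difference because rare, high-penetrance mutations sit more often on
# its haplotype.
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000
tag = rng.random(n) < 0.25            # neutral common variant, frequency 25%

# Rare risk mutations (0.5% overall), 3x enriched on the tagged haplotype
p_mut = np.where(tag, 0.01, 0.00333)
rare = rng.random(n) < p_mut
risk = np.where(rare, 0.20, 0.01)     # 20% risk in mutation carriers
affected = rng.random(n) < risk

f_case = tag[affected].mean()
f_ctrl = tag[~affected].mean()
print(f"tag-SNP frequency: cases {f_case:.1%} vs controls {f_ctrl:.1%}")
```

The tag SNP does nothing at all biologically, yet its frequency comes out a couple of percentage points higher in cases, exactly the kind of signal a GWAS detects.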

There are now many documented examples where GWAS signals are explained by synthetic associations with rare mutations in the sample, which have much larger odds ratios (e.g., 1, 2, 3, 4). On the other hand, there are also cases where no such rare mutations have been found (e.g., 5, 6), suggesting that such a mechanism is by no means universal. It is difficult indeed to know how prevalent that situation will turn out to be, though large-scale whole-genome sequencing studies currently underway should help address this question. (See here for theoretical discussions: 7, 8, 9, 10).

Both scenarios C and D are congruent with the repeated finding that many of the genes implicated by GWAS (with small effect sizes) are known to sometimes carry rare mutations linked to a high risk of the same disease. That would fit with a mechanism whereby common variants at a given locus increase the penetrance of rare mutations in the same gene, but have little effect otherwise (scenario C). Or it would fit with GWAS signals actually arising from synthetic association with high-penetrance rare mutations in the population (where the common variant tags these haplotypes but has no effect itself whatsoever; scenario D).

Teasing these various scenarios apart is a challenge, especially as, for any given disease, different scenarios may pertain for different SNPs. One method has been to try and find a functional effect of a common SNP at the molecular level. For example, SNPs may affect the expression of a gene, altering binding of regulatory proteins to the parts of DNA that specify how much of the protein to make, in which cells and under which conditions. Multiple such examples have been documented (sometimes with surprising results, as when the gene thus affected is actually quite distant to the SNP itself).

However, finding some effect of a common SNP on expression of a gene at a molecular level does not explain how it affects disease risk. Any of scenarios A, B or C could still pertain, and even scenario D is not ruled out by such findings. Indeed, it is not even clear what kind of molecular-level effect we should expect to explain a tiny odds ratio. Should we expect a small effect at the molecular level, or a big effect at the molecular level that translates to a small effect at the organismal level? Or a big effect at the organismal level, but only in combination with other genetic or environmental insults?

That leaves something of a Catch-22 situation for researchers looking for functional effects of SNPs at the biological level – too small an effect and it will never be detected in messy biological experiments; too big and it will have a rather glaring discrepancy with the epidemiological odds ratio. In the end, it may prove impossible to definitively investigate such small individual epidemiological effects at the biological level, whether from genetic or environmental factors.

This doesn’t mean individual GWAS signals are not useful, of course – they certainly point to loci of interest for further study and have successfully implicated previously unknown biochemical pathways in various diseases (e.g., autophagy in Crohn’s disease). It does mean, however, that the interpretation of individual SNP associations may remain a bit vague.

On the other hand, while the biological effect of any single SNP in isolation may be small, their aggregate effect should be large, at least if the model of disease being caused by a polygenic load of such common risk alleles is correct. Indeed, even if the burden of common alleles is not by itself sufficient to cause disease (e.g., in a scenario where they act collectively as a polygenic modifier of rare mutations, which I consider the most likely scenario), they may still have biological effects in aggregate on relevant traits.

There is now an ever-growing number of studies taking that approach, correlating polygenic scores of risk for various diseases (based on aggregate SNP burden) with a range of biological phenotypes. Whether this approach will really help reveal underlying pathogenic mechanisms remains to be seen. More on that in a later post.
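For what it’s worth, the polygenic score itself is computationally trivial – a weighted allele count (the SNPs, odds ratios and genotype here are purely illustrative):

```python
# Sketch: a polygenic risk score is a weighted allele count -- each
# person's dosage of the risk allele (0, 1 or 2) at each SNP, weighted by
# the log of that SNP's odds ratio from the GWAS. Numbers illustrative.
import numpy as np

gwas_odds_ratios = np.array([1.10, 1.05, 0.95, 1.20])  # per-SNP ORs
weights = np.log(gwas_odds_ratios)
dosages = np.array([2, 1, 0, 1])        # one person's risk-allele counts

prs = float(dosages @ weights)
print(f"polygenic score = {prs:.3f}")
```

Real scores simply extend this sum over thousands to millions of SNPs; the conceptual difficulty lies not in computing them but in knowing what biological mechanisms the aggregate reflects.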

With thanks to John McGrath for helpful comments and edits.

*The usual way around this is to model the effects of a SNP on the liability scale, rather than the observed scale of risk. This is based on the idea that underlying the observed discontinuous distribution of a disease is a normally distributed burden of liability, which effectively remains latent until some threshold of burden is passed, in which case disease results. As a mathematical model to describe risk across the population this works reasonably well, given a host of assumptions. It is a mistake, however, in my mind, to think that the model reflects pathogenic mechanisms in individuals.
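Translated into code, the liability-threshold model looks like this (a sketch; the SNP effect size of 0.05 SD is illustrative):

```python
# Sketch of the liability-threshold model: a standard-normal latent
# liability, with disease whenever liability exceeds the threshold implied
# by a 1% population prevalence. The 0.05 SD shift is illustrative.
from statistics import NormalDist

prevalence = 0.01
threshold = NormalDist().inv_cdf(1 - prevalence)        # ~2.33 SD

# A SNP modelled "on the liability scale" shifts mean liability slightly;
# a 0.05 SD shift nudges the 1% baseline risk up to ~1.14%.
carrier_risk = 1 - NormalDist(mu=0.05).cdf(threshold)
print(f"threshold = {threshold:.2f} SD, carrier risk = {carrier_risk:.2%}")
```

A tiny shift on the latent scale translates to a GWAS-sized relative risk at the observed scale, which is precisely why the model fits population data well while saying nothing about mechanism in any individual.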

Tuesday, July 28, 2015

The Genetics of Neurodevelopmental Disorders

The Genetics of Neurodevelopmental Disorders is a new book that will be published by Wiley in 2015. It is due out in August (in Europe) and September (in the USA), and is available on Amazon here.

I had the pleasure of editing the book, which comprises 14 chapters from world-leading scientists and clinicians. Our aim is to provide a timely synthesis of this fast-moving field where so much exciting progress has been made in recent years. Below I have reproduced the Foreword from the book, which outlines the rationale for writing it and the conceptual principles on which it is based, as well as a summary of the topics covered (giving an overview of the state of the field in the process). There are also links to two chapters that are freely available. On behalf of all the authors, I hope the book will prove useful.

The term “neurodevelopmental disorders” is clinically defined in psychiatry as “a group of conditions with onset in the developmental period… characterized by developmental deficits that produce impairments of personal, social, academic, or occupational functioning” [DSM-5]. This term encompasses the clinical categories of intellectual disability (ID), developmental delay (DD), autism spectrum disorders (ASD), attention-deficit hyperactivity disorder (ADHD), speech and language disorders, specific learning disorders, tic disorders and others.

However, the term can be defined differently, not based on age of onset or clinical presentation, but by an etiological criterion, to mean disorders arising from aberrant neural development. This definition includes many forms of epilepsy (considered either as a distinct disorder or as a co-morbid symptom) as well as disorders like schizophrenia (SZ), which have later onset but which can still be traced back to neurodevelopmental origins. Though the symptoms of SZ typically arise only in the late teens or early twenties, convergent evidence of epidemiological risk factors during fetal development and very early deficits apparent in longitudinal studies strongly indicate that SZ is a disorder of neural development, though its clinical consequences may remain latent for many years.

Collectively, severe neurodevelopmental disorders affect ~5% of the population (though exact numbers are almost impossible to obtain, due to changing diagnostic criteria and substantial co-morbidity between clinical categories). These disorders impact on the most fundamental aspects of human experience: cognition, language, social interaction, perception, mood, motor control, sense of self. They impair function, often severely, and restrict opportunities for sufferers, as well as placing a heavy burden on families and caregivers. As lifelong illnesses, they also give rise to a substantial economic burden, both in direct healthcare costs and indirect costs due to lost opportunity.

The treatments currently available for neurodevelopmental disorders are very limited and problematic. Intensive educational interventions may help ameliorate some cognitive or behavioural difficulties, such as those associated with ID or ASD, but to a limited extent and without addressing the underlying pathology. With respect to psychiatric symptoms, the mainstays of pharmacotherapy (antipsychotic medication, mood stabilizers, antidepressants and anxiolytics) all emerged between the 1940s and 1960s with almost no new drugs being developed since. Most of these treatments were discovered serendipitously, and their mechanisms of action remain poorly understood. In most cases, the existing treatments are only partially effective and can induce serious side effects. This is also true for the range of anticonvulsants, and, for all these drugs, it is typically impossible to predict from symptom profiles alone whether individual patients will benefit from a particular drug or possibly be harmed by it. These difficulties and the attendant poor outcomes for many patients arise from not knowing the causes of disease in particular patients and not understanding the underlying pathogenic mechanisms. Genetic research promises to address both these issues.

Neurodevelopmental disorders are predominantly genetic in origin and have often been thought of as falling into two groups. The first includes a very large number of individually rare syndromes with known genetic causes. Examples include Fragile X syndrome, Down syndrome, Rett syndrome and Angelman syndrome but there are literally hundreds of others. Each of these is clearly caused by a single genetic lesion, sometimes involving an entire chromosome or a section of chromosome, sometimes affecting a single gene. Most are characterised by ID, but many also show high rates of epilepsy, ASD or other neuropsychiatric symptoms.

The second group comprises idiopathic cases of ID, ASD, SZ or epilepsy – those with no currently known cause. Despite the lack of an identified genetic lesion, there is still very strong evidence of a genetic etiology across these categories. All of these conditions are highly heritable, showing high levels of twin concordance, much higher in monozygotic than in dizygotic twins, substantially increased risk to relatives and typically zero effect of a shared family environment, indicating strong genetic causation.

What has not been clear is whether these so-called “common disorders” are simply collections of rare genetic syndromes that we cannot yet discriminate, or whether they have a very different genetic architecture. The dominant paradigm in the field has held that the idiopathic, non-syndromic cases of common disorders like ASD or SZ reflect the extreme end of a continuum of risk across the population. This is based on a model involving the segregation of a very large number of genetic variants, each of small effect alone, which can, above a collective threshold of burden in individuals, result in frank disease.
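The liability-threshold idea described above can be made concrete with a toy simulation (my own illustrative sketch, not from the book — the variant count, allele frequency and threshold are all arbitrary): each individual's liability is modelled as the sum of many risk alleles of individually tiny effect, and only those whose cumulative burden exceeds a threshold manifest frank disease.

```python
import random

random.seed(1)

N_LOCI = 1000       # many loci, each of small effect
FREQ = 0.5          # risk-allele frequency at each locus
THRESHOLD = 1060    # liability cutoff, set well above the mean burden

def liability():
    # Count risk alleles across loci (two chances per locus, one per chromosome).
    return sum(random.random() < FREQ for _ in range(2 * N_LOCI))

population = [liability() for _ in range(10_000)]
affected = sum(score > THRESHOLD for score in population)
print(f"prevalence: {affected / len(population):.2%}")
```

Because each allele contributes so little, liability is approximately normally distributed across the population; placing the threshold a few standard deviations above the mean yields a small affected fraction, which is exactly how this model reconciles high heritability with modest prevalence.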

Recent genetic discoveries are prompting a re-evaluation of this model, as well as casting doubt on the biological validity of clinical diagnostic categories. After decades of frustration, the genetic secrets of these conditions are finally yielding to new genomic microarray and sequencing technologies. These are revealing a growing list of rare, single mutations that confer high risk of ASD, ID, SZ or epilepsy, particularly epileptic encephalopathies.

These findings strongly reinforce a model of genetic heterogeneity, whereby common clinical categories do not represent singular biological entities, but rather are umbrella terms for a large number of distinct genetic conditions. These conditions are individually rare but collectively common. Strikingly, almost all of the identified mutations are associated with variable clinical manifestations, conferring risk across traditional diagnostic boundaries. These findings fit with large-scale epidemiological studies that also show shared risk across these disorders. Thus, while current diagnostic categories may reflect more or less distinct clinical states or outcomes, they do not reflect distinct etiologies.

The “genetics of autism” is thus neither singular nor separable from the “genetics of intellectual disability”, the “genetics of schizophrenia” or the “genetics of epilepsy”. The more general term of “developmental brain dysfunction” has been proposed to encompass disorders arising from altered neural development, which can manifest clinically in diverse ways. This book is about the genetics of developmental brain dysfunction.

A lot can go wrong in the development of a human brain. The right numbers of hundreds of distinct types of nerve cells have to be generated in the right places, they have to migrate to form highly organised structures, and they must extend nerve fibres, which navigate their way through the brain to ultimately find and connect with their appropriate partners, avoiding wrong turns and illicit interactions. Once they find their partners they must form synapses, the incredibly complex and diverse cellular structures that mediate communication between nerve cells. These synapses are also highly dynamic, responding to patterns of activity by strengthening or weakening the connection.

The instructions to carry out these processes are encoded in the genome of the developing embryo. Each of these aspects of neural development requires the concerted action of the protein products of thousands of distinct genes. Mutations in any one of them (or sometimes in several at the same time) can lead to developmental brain dysfunction.

The identification of numerous causal mutations has focused attention on the roles of the genes affected, with a number of prominent classes of neurodevelopmental genes emerging. These include genes involved in early brain patterning and proliferation, those mediating later events of cell migration and axon guidance, and a major class involved in synapse formation and subsequent activity-dependent synaptic refinement, pruning and plasticity. Also highlighted are a number of biochemical pathways and networks that appear especially sensitive to perturbation.

Genetic discoveries thus allow an alternate means to classify disorders, based on the underlying neurodevelopmental processes affected. This provides more etiologically valid and arguably more biologically coherent categories than those based on clinical outcome. For individual patients, the application of microarray and sequencing technologies is already changing clinical practice in diagnosis and management of neurodevelopmental disorders. This will only increase as more and more pathogenic mutations are identified.

Such discoveries also provide entry points to enable the elucidation of pathogenic mechanisms, where exciting progress is being made using cellular and animal models. For any given mutation, this involves defining the defects at a cellular level (in the right cells), and working out how such defects propagate to the levels of neural circuits and systems, ultimately producing pathophysiological states that underlie neuropsychiatric symptoms. Definition of these pathways will hopefully lead to a detailed enough understanding of the molecular or circuit-level defects to rationally devise new therapeutics.

The elucidation of the heterogeneous genetic and neurobiological bases of neurodevelopmental disorders should thus enable a much more personalised approach to diagnosis and treatment for individual patients, and a shift in clinical care for these disorders from an approach based on superficial symptoms and generic medicines, to one based on detailed knowledge of specific causes and mechanisms.

The book is organised into several sections:

Chapters 1-6 cover broad conceptual issues relevant to neurodevelopmental disorders in general. These are informed by recent advances in genomic technologies, which have transformed our view of the genetic architecture of both rare and so-called “common” neurodevelopmental disorders. These chapters will consider the genetic heterogeneity of clinical categories like ASD or SZ, the relative importance of different types of mutations (common vs rare; single-gene vs large deletions or duplications; inherited vs de novo), etiological overlap between clinical categories and complex interactions between two or more mutations or between genetic and environmental factors.     

A preprint of Chapter 1, by me, on The Genetic Architecture of Neurodevelopmental Disorders, is available here.


Chapters 7-9 present our current understanding of several different types of disorder, grouped by the neurodevelopmental process impacted. Consideration of disorders from this angle provides a more rational and biologically valid approach than consideration from the point of view of clinical symptoms, which can be arrived at through various routes.

Chapters 10-12 deal with the elucidation of pathogenic mechanisms following genetic discoveries. They include chapters on cellular models (using induced pluripotent stem cells derived from patients) and animal models (recapitulating pathogenic mutations in mice), which are revealing the routes of pathogenesis: from defects in diverse cellular neurodevelopmental processes to resultant alterations in neural circuits and brain systems, which ultimately impinge on behaviour. The manifestation of these defects in humans also depends on processes of learning and experience-dependent development that continue for many years after birth. Taking this aspect of development seriously is essential: it is a critical period during which symptoms can be exacerbated if neglected, or potentially ameliorated by intensive intervention.

Chapters 13-14 consider the clinical implications of recent discoveries and of the general principles described in earlier chapters. Foremost among these is the recognition of extreme genetic heterogeneity, meaning that understanding what is going on in any particular patient requires knowledge of the specific underlying genetic cause. The dramatic reductions in cost for whole-genome sequencing mean such diagnoses will become far easier to make, with important implications for clinical genetic practice (including preimplantation or prenatal screening or diagnosis). Finally, the study of cellular and animal models of specific disorders is already suggesting potential therapeutic avenues for some conditions. These advances illustrate a general principle – to treat these conditions we need to identify and understand the underlying biology and design therapies to treat the specific cause in each patient and not just the generic symptoms.

A preprint of Chapter 13, by Gholson Lyon and Jason O'Rawe, on Human genetics and clinical aspects of neurodevelopmental disorders is available here.

The full Table of Contents is shown below:

Edited by Kevin J. Mitchell

1.     The Genetic Architecture of Neurodevelopmental Disorders
Kevin J. Mitchell

2.     Overlapping Etiology of Neurodevelopmental Disorders
Eric Kelleher and Aiden Corvin

3.     The Mutational Spectrum of Neurodevelopmental Disorders
Nancy D. Merner, Patrick A. Dion and Guy A. Rouleau

4.     The Role of Genetic Interactions in Neurodevelopmental Disorders
Jason H. Moore and Kevin J. Mitchell

5.     Developmental Instability, Mutation Load, and Neurodevelopmental Disorders
Ronald A. Yeo and Steven W. Gangestad

6.     Environmental Factors and Gene-Environment Interactions
John McGrath

7.     The Genetics of Brain Malformations
M. Chiara Manzini and Christopher A. Walsh

8.     Disorders of Axon Guidance
Heike Blockus and Alain Chédotal

9.     Synaptic Disorders
Catalina Betancur and Kevin J. Mitchell

10.  Human Stem Cell Models of Neurodevelopmental Disorders
Peter Kirwan and Frederick J. Livesey

11.  Animal Models for Neurodevelopmental Disorders
Hala Harony-Nicolas and Joseph D. Buxbaum

12.  Cascading Genetic and Environmental Effects on Development: Implications for Intervention
Esha Massand and Annette Karmiloff-Smith

13.  Human Genetics and Clinical Aspects of Neurodevelopmental Disorders
Gholson J. Lyon and Jason O’Rawe

14.  Progress Toward Therapies and Interventions for Neurodevelopmental Disorders
Ayokunmi Ajetunmobi and Daniela Tropea

Thursday, April 30, 2015

Genetics in Modern Medicine – the Future is Now

The Human Genome Project was founded on the premise that it would unlock the secrets of disease and lead to new cures for many disorders. While the new cures have mostly yet to materialise, the secrets of disease are indeed being revealed, in ways that will transform medicine over the coming years. Both our knowledge of the genetic causes of disease and our ability to test for those causes have increased exponentially in recent years. These advances will place genetic testing at the front line of diagnostics, not just for the relatively small number of already well-known inherited disorders, but for an ever-widening array of conditions, both rare and common.

The lifetime prevalence of rare disorders in European populations is estimated at 6-8% (National Rare Disease Plan for Ireland, 2014-2018). Over 6,000 distinct genetic disorders have already been defined, and more are being discovered at an increasing pace. For many patients with such disorders, their experience of the health system is a long and frustrating diagnostic odyssey. They are typically seen by a succession of specialists for different symptoms, but the connections between those symptoms are not always recognised. A referral for genetic testing may be made eventually, but usually as a last resort rather than a first option.

In a growing proportion of such cases, genetic testing can reveal the underlying cause of the condition, bringing certainty and insight to the diagnosis. While specific medications may not exist that target each condition, a genetic diagnosis can often provide useful predictions of prognosis and treatment responsiveness. This is especially true for the hundreds of metabolic disorders, which may be treatable by dietary interventions or supplements.

But even in cases where there are no direct medical implications, just receiving a specific diagnosis can be highly beneficial in helping patients and their families cope with the situation. In addition, many international support groups have arisen relating to specific disorders, or for rare diseases in general, such as NORD (U.S.), GRDO (Ireland) and Rare Disease UK. These organisations are helping patients, parents and clinicians share information, compare experiences and improve outcomes. Genetic information can also inform future reproductive decisions, including possibilities such as pre-implantation genetic screening.

Rare mutations can cause common disorders

The effects of genetic mutations are not restricted to what we typically think of as rare disorders, however. Discoveries over the last several years are illustrating their central role in much more common disorders, such as epilepsy, autism, schizophrenia, Alzheimer’s and Parkinson’s disease, many cancers and other conditions. Indeed, many of those diagnostic categories may in fact be umbrella terms for a multiplicity of rare disorders that manifest with similar symptoms.

For neuropsychiatric conditions, it has long been known that such disorders are highly heritable, but it had not been possible to identify causal genes. That has changed, with the development of new DNA sequencing technologies, yielding insights that overturn our conception of such disorders. Rather than reflecting a single entity, broad clinical categories like autism or epilepsy obscure an extreme diversity of underlying conditions. Each of these conditions may be quite rare but there are so many of them that manifest in similar ways that collectively they result in highly prevalent disorders. Genetics now provides the tools to distinguish them.

The causal mutations in patients with these conditions can disrupt single genes or can delete or duplicate small sections of chromosomes, affecting multiple genes at once. For very severe cases, the mutations will often have arisen de novo, in the generation of egg or, more commonly, sperm cells. But others are inherited, often from parents who are clinically unaffected, despite carrying the mutation. This highlights the complexity in relating genotypes to phenotypes – the clinical presentation of such mutations is quite variable and often depends on other genetic or environmental factors. Nevertheless, in a patient showing symptoms, the identification of a major mutation can reveal important information as to the primary cause.

For example, for patients with a diagnosis of autism – a diagnosis based on symptoms alone – genetic testing for specific conditions like Fragile X syndrome or Rett syndrome has been in place for some time. This is now being expanded to include testing for a growing number of chromosomal disorders or single-gene mutations, which collectively can now explain ~15% of cases – a huge increase from just a few years ago. This percentage is growing all the time as causal mutations in new genes are identified (reaching 20-25% in recent studies). The successes for autism are likely to be duplicated for other conditions as the number of sequenced patient genomes increases.

Genome sequencing now an affordable front-line option

The pace of technological change in this field is simply staggering. We are moving from a position of being able to test a few specific genes implicated in any particular disorder, to one where it will be cheaper and faster, as well as more informative, to sequence the patient’s entire genome. It took thousands of researchers over ten years to sequence the reference Human Genome, at a total cost of about $3 billion. Today, a human genome can be sequenced for under $2,000, in about a day, maybe two.

Those sequencing costs and times are still falling as new technologies are developed and economies of scale brought to bear. This brings genome sequencing into the cost range of many blood tests, radiological scans, or other investigative procedures and suggests it may soon become a front-line test for many patients with idiopathic disease. Indeed, it may become cheaper for doctors to order a genome sequence than to spend any of their own time wondering about whether to order it.

But genomic data are only useful if someone can interpret them, a far greater task than simply checking for the presence of a mutation in a specific gene. As it happens, each of us carries a couple hundred mutations in our genome that seriously impact on gene function. Most of these do not cause disease, however, and it is therefore a challenge to recognise a pathogenic mutation amongst this background burden of mutations we all carry. That job will be made easier as genetic information becomes available for more and more patients.
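The core of that interpretive task can be sketched in a few lines (a deliberately simplified illustration with made-up variant records — real pipelines use population databases such as gnomAD and far richer annotations): candidate mutations are prioritised by discarding anything common in reference populations and keeping rare variants whose predicted impact on the gene is severe.

```python
# Toy variant records: (gene, population allele frequency, predicted impact).
# Gene names and numbers here are entirely hypothetical.
variants = [
    ("GENE_A", 0.12,    "missense"),     # common in population -> unlikely causal
    ("GENE_B", 0.00004, "stop_gained"),  # rare and severe -> candidate
    ("GENE_C", 0.0,     "synonymous"),   # rare but benign impact class
    ("GENE_D", 0.00001, "frameshift"),   # rare and severe -> candidate
]

SEVERE = {"stop_gained", "frameshift", "splice_donor"}
MAX_FREQ = 0.0001  # exclude variants common in reference populations

candidates = [
    gene for gene, freq, impact in variants
    if freq <= MAX_FREQ and impact in SEVERE
]
print(candidates)  # → ['GENE_B', 'GENE_D']
```

Even after such filtering, several plausible candidates typically remain per genome, which is why accumulating sequence data from more and more patients, letting recurrently mutated genes rise above the background, is so important.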

A national strategy for genetic services

The enormous potential benefits of such information have been recognised in several countries, most recently in the UK where the NHS has launched a project to sequence 100,000 genomes, including those of thousands of patients with diverse disorders. The genetic heritage of each population is different, however, with some pathogenic mutations at much higher frequencies in specific populations, as with mutations causing cystic fibrosis in Ireland. Characterising the genetic heritage of the Irish population is thus an important goal as a necessary foundation for clinical genetic testing. 

The health and economic benefits of this genetic revolution will only be realised if there is adequate provision and funding of genetic testing and genetic counselling services. In Ireland we currently lag far behind most other developed countries in the provision of these services, a situation exacerbated by the recent decision to downgrade what was the National Centre for Medical Genetics at Our Lady’s Hospital in Crumlin to a department within the hospital. Yet if the health service in Ireland is to keep pace with international developments and provide the best care for patients, the role of genetics services will have to be greatly expanded, not reduced.

[This piece was written for "The Consultant" - the magazine of the Irish Consultants Association and appears in the Spring 2015 issue. It is reproduced here with their consent.]