Tuesday, February 7, 2012

I’ve got your missing heritability right here…

A debate is raging in human genetics these days as to why the massive genome-wide association studies (GWAS) that have been carried out for every trait and disorder imaginable over the last several years have not explained more of the underlying heritability. This is especially true for many of the so-called complex disorders that have been investigated, where results have been far less than hoped for. A good deal of effort has gone into quantifying exactly how much of the genetic variance has been “explained” and how much remains “missing”.

The problem with this question is that it limits the search space for the solution. It forces our thinking further and further along a certain path, when what we really need is to draw back and question the assumptions on which the whole approach is founded. Rather than asking what is the right answer to this question, we should be asking: what is the right question?

The idea of performing genome-wide association studies for complex disorders rests on a number of very fundamental and very big assumptions. These are explored in a recent article I wrote for Genome Biology (referenced below; reprints available on request). They are:

1) That what we call complex disorders are unitary conditions. That is, clinical categories like schizophrenia or diabetes or asthma are each a single disease and it is appropriate to investigate them by lumping together everyone in the population who has such a diagnosis – allowing us to calculate things like heritability and relative risks. Such population-based figures are only informative if all patients with these symptoms really have a common etiology.

2) That the underlying genetic architecture is polygenic – i.e., the disease arises in each individual due to toxic combinations of many genetic variants that are individually segregating at high frequency in the population (i.e., “common variants”).

3) That, despite the observed dramatic discontinuities in actual risk for the disease across the population, there is some underlying quantitative trait called “liability” that is normally distributed in the population. If a person’s load of risk variants exceeds some threshold of liability, then disease arises.

All of these assumptions typically go unquestioned – often unmentioned, in fact – yet there is no evidence that any of them is valid. In fact, the more you step back and look at them with an objective eye, the more outlandish they seem, even from first principles.

First, what reason is there to think that there is only one route to the symptoms observed in any particular complex disorder? We know there are lots of ways, genetically speaking, to cause mental retardation or blindness or deafness – why should this not also be the case for psychosis or seizures or poor blood sugar regulation? If the clinical diagnosis of a specific disorder is based on superficial criteria, as is especially the case for psychiatric disorders, then this assumption is unlikely to hold.

Second, the idea that common variants could contribute significantly to disease runs up against the effects of natural selection pretty quickly – variants that cause disease get selected against and are therefore rare. You can propose models of balancing selection (where a specific variant is beneficial in some genomic contexts and harmful in others), but there is no evidence that this mechanism is widespread. In general, the more arcane your model has to become to accommodate contradictory evidence, the more inclined you should be to question the initial premise.

Third, the idea that common disorders (where people either are or are not affected) really can be treated as quantitative traits (with a smooth distribution in the population, as with height) is really, truly bizarre. The history of this idea can be traced back to early geneticists, but it was popularised by Douglas Falconer, the godfather of quantitative genetics (he literally wrote the book).

In an attempt to demonstrate the relevance of quantitative genetics to the study of human disease, Falconer came up with a nifty solution. Even though disease states are typically all-or-nothing, and even though the actual risk of disease is clearly very discontinuously distributed in the population (dramatically higher in relatives of affecteds, for example), he claimed that it was reasonable to assume that there was something called the underlying liability to the disorder that was actually continuously distributed. This could be converted to a discontinuous distribution by further assuming that only individuals whose burden of genetic variants passed an imagined threshold actually got the disease. To transform discontinuous incidence data (mean rates of disease in various groups, such as people with different levels of genetic relatedness to affected individuals) into mean liability on a continuous scale, it was necessary to further assume that this liability was normally distributed in the population. The corollary is that liability is affected by many genetic variants, each of small effect. Q.E.D.
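To see what this transformation amounts to in practice, here is a minimal sketch in Python. It illustrates the conversion step only, under the model's own assumptions, and is not Falconer's full method: liability is taken to be standard normal, and a group with a higher incidence is assumed to share the population's threshold and unit variance, so that its higher incidence is read as a shift in its mean liability. The function names `liability_threshold` and `mean_liability_shift` and the incidence figures (1% and 10%) are hypothetical, chosen purely for illustration.

```python
from statistics import NormalDist

# Assumption of the model: liability follows a standard normal distribution.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def liability_threshold(prevalence: float) -> float:
    """Return the threshold, in SD units from the group mean, above which a
    fraction `prevalence` of a standard normal liability distribution lies."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return STANDARD_NORMAL.inv_cdf(1.0 - prevalence)


def mean_liability_shift(population_prevalence: float, group_prevalence: float) -> float:
    """Return how far a group's mean liability sits above the population mean,
    in SD units, assuming (hypothetically) that the group shares the
    population's threshold and unit variance."""
    return liability_threshold(population_prevalence) - liability_threshold(group_prevalence)


if __name__ == "__main__":
    # Hypothetical incidence figures, chosen only for illustration.
    population_prevalence = 0.01  # 1% of the general population affected
    relatives_prevalence = 0.10   # 10% of first-degree relatives affected

    print(f"population threshold: {liability_threshold(population_prevalence):.2f} SD")
    print(f"relatives' threshold: {liability_threshold(relatives_prevalence):.2f} SD")
    shift = mean_liability_shift(population_prevalence, relatives_prevalence)
    print(f"implied shift in mean liability: {shift:.2f} SD")
```

With these made-up figures, the thresholds come out at roughly 2.33 and 1.28 standard deviations, so the relatives' mean liability is inferred to sit about 1.04 SD above the population mean. Every number on the continuous scale comes from the normality assumption; the incidence data themselves say nothing about the shape of the distribution.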

This model – simply declared by fiat – forms the mathematical basis for most GWAS analyses and for simulations regarding proportions of heritability explained by combinations of genetic variants (e.g., the recent paper from Eric Lander’s group). To me, it is an extraordinary claim, which you would think would require extraordinary evidence to be accepted. Despite the fact that it has no evidence to support it and fundamentally makes no biological sense (see Genome Biology article for more on that), it goes largely unquestioned and unchallenged.

In the cold light of day, the most fundamental assumptions underlying population-based approaches to investigate the genetics of “complex disorders” can be seen to be flawed, unsupported and, in my opinion, clearly invalid. More importantly, there is now lots of direct evidence that complex disorders like schizophrenia or autism or epilepsy are really umbrella terms, reflecting common symptoms associated with large numbers of distinct genetic conditions. More and more mutations causing such conditions are being identified all the time, thanks to genomic array and next generation sequencing approaches.

Different individuals and families will have very rare, sometimes even unique mutations. In some cases, it will be possible to identify specific single mutations as clearly causal; in others, it may require a combination of two or three. There is clear evidence for a very wide range of genetic etiologies leading to the same symptoms. It is time for the field to assimilate this paradigm shift and stop analysing the data in population-based terms. Rather than asking how much of the genetic variance across the population can be currently explained (a question that is nonsensical if the disorder is not a unitary condition), we should be asking about causes of disease in individuals:

- How many cases can currently be explained (by the mutations so far identified)?

- Why are the mutations not completely penetrant?

- What factors contribute to the variable phenotypic expression in different individuals carrying the same mutation?

- What are the biological functions of the genes involved and what are the consequences of their disruption?

- Why do so many different mutations give rise to the same phenotypes?

- Why are specific symptoms like psychosis or seizures or social withdrawal such common outcomes?

These are the questions that will get us to the underlying biology.

Mitchell, K. (2012). What is complex about complex disorders? Genome Biology, 13 (1) DOI: 10.1186/gb-2012-13-1-237

Manolio, T., Collins, F., Cox, N., Goldstein, D., Hindorff, L., Hunter, D., McCarthy, M., Ramos, E., Cardon, L., Chakravarti, A., Cho, J., Guttmacher, A., Kong, A., Kruglyak, L., Mardis, E., Rotimi, C., Slatkin, M., Valle, D., Whittemore, A., Boehnke, M., Clark, A., Eichler, E., Gibson, G., Haines, J., Mackay, T., McCarroll, S., & Visscher, P. (2009). Finding the missing heritability of complex diseases. Nature, 461 (7265), 747-753 DOI: 10.1038/nature08494

Zuk, O., Hechter, E., Sunyaev, S., & Lander, E. (2012). The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences, 109 (4), 1193-1198 DOI: 10.1073/pnas.1119675109


  1. Nice. You seem to be arguing that heritability has been over-estimated. Is there a reason to think that these (false) assumptions will necessarily lead to an inflation of heritability estimates? Or is it more the case that heritability is a meaningless concept outside of these flawed models?

    I would slightly quibble with the last point. Especially for something like autism, there's no doubt in my mind that there are subclinical traits in a high proportion of relatives. I guess you could interpret that as variable phenotypic expression / penetrance of a single mutation. But once you start getting into your question of why there is this variability, and if other genes play a role in it, then you start moving back towards a (hybrid) model that includes common variants.

    But I 100% agree that we need to question our assumptions (always).

    1. Thanks Jon for that comment. The third point is more complex than I could go into in the blog (but see the paper for a longer treatment). What I am arguing against is the assumption that the genetics of a trait like sociability, for example, is the same as the genetics of a symptom like social withdrawal. There can be a normal distribution of a trait and deviations from that distribution that are caused by different mechanisms. And I argue that you need some major insult to get a serious phenotype - not just an accumulation of very small variants, because the system has to deal with that kind of variation all the time. For example, at the ends of the normal distribution of height you don't get dwarfism or gigantism suddenly appearing - they are caused by single mutations. Same for mental retardation or intellectual disability - the genetics of these conditions are not the same as the genetics of IQ.

    2. Jon, on your first point, I don't think heritability has been overestimated - I just think it is not a very useful figure if it lumps all cases together as they may have very diverse genetic etiologies - so modeling how much of it you can explain across the population rests on the assumption that that is not true. You wouldn't use a figure like that for mental retardation as a class for example - it's just not very informative, beyond saying genetic variation is involved. Sibling risks will vary across families by mode of inheritance and penetrance. In that case we already know that mental retardation can be caused by mutations in any of a large number of different genes and so nobody would think of doing a population-based study lumping all cases together. We are learning that the same is true for autism and schizophrenia.

  2. This post relies on two fundamental strawman arguments: (1) the idea that geneticists ever expected GWAS to explain all (or even most) of the heritability of complex diseases; and (2) the idea that geneticists aren't aware that GWAS rely on simplified assumptions.

    Looking at your three "unjustified assumptions" in turn:

    1. Geneticists are perfectly aware that complex diseases are often a mixture of different diseases with different etiologies. However, in the absence of clinical or molecular markers that can be used to divide patients into more subtle categories, this lumping assumption is a necessary evil. It doesn't make GWAS impossible - it just means that the causal variants will have smaller effect sizes and thus require larger samples to detect. As better diagnostic criteria become available, you can expect genetic studies to become increasingly targeted towards specific sub-categories (indeed this is already being done in numerous diseases and traits - see "endophenotypes").

    2. GWAS doesn't require that all genetic risk derives from common variants - it merely requires that *some* risk is due to variants that are either common themselves, or are well-tagged by common variants. Whether or not such variants should actually exist in the presence of natural selection depends on a number of evolutionary and demographic parameters. Surprisingly, the results from GWAS indicate that for several diseases (especially auto-immune diseases) a substantial fraction of disease risk is actually due to common causal variants of small effect. But I'm not aware of any geneticist who's ever argued that ALL common disease risk is due to common variants. Rather, GWAS was used because common variants were the only ones accessible on a genome-wide scale; with the drop in sequencing costs, researchers are increasingly turning their attention to variants at much lower frequencies.

    3. The liability threshold model is a simplification, and is acknowledged by all geneticists as being a simplification (not just "declared by fiat") - but it's not an absurd one, especially for the many common diseases (the previous commentator mentions autism) where there is evidence for a spectrum of sub-clinical phenotypes extending from unaffected individuals all the way to frank pathology. Even in the cases where genuine discontinuity exists, I'm not sure why you regard the threshold model as so implausible: there are plenty of biological and other systems where smooth gradients in underlying parameters can result in sharp discontinuities in outcomes.

    The current shift from focusing on common variants to studying rare variants is portrayed by you and many other commentators as an overdue revolution. But no-one expected GWAS to provide the full answer - they were merely the best way to study the genetic architecture of disease given the available technology, and they have been extremely successful in teaching us about that architecture, and in revealing entirely novel disease biology. All of us expected that as sequencing technology improved the field would be able to turn its attention to other potential sources of genetic risk, and that's exactly what's happening.

  3. Daniel, thanks for your comments. I understand some people may think I am arguing against straw men but those straw men keep getting set up by people in the field, not by me. (The latest paper by Zuk et al is a good example). Re the threshold model, yes, there are many examples in biology where a threshold exists (the action potential is a good example), but none that I know of where what pushes someone over the threshold is one more allele of negligible effect alone, on top of perhaps hundreds of other such risk alleles. That is how the threshold is modeled, explicitly, in recent papers, e.g., by Visscher and colleagues.

    More fundamentally, there is simply no reason to think that liability is normally distributed in the population, when the data clearly show that actual risk is not. What I am trying to argue is that that claim should not simply be accepted as reasonable, unless someone can show some evidence that it actually holds. (And just showing that you can model it as a quantitative trait, by invoking a threshold does not constitute evidence that that model bears any relation to the actual etiology).

    For more on subclinical phenotypes, see reply to Jon's comment above.

  4. Part One.
    There are a number of assumptions held by behavioral geneticists that may not stand up to scrutiny. First, the concepts of 'genetics' and 'inheritance' are not interchangeable. Second, twin studies can determine if a condition is genetic but do not necessarily demonstrate that the condition is heritable.

    Rett syndrome and Down syndrome are both associated with high rates of co-occurring autism and are caused by de novo genetic mutations. In twin studies, concordance for these two genetic syndromes is almost 100% in MZ twins and almost 0 in DZ twins. The calculated heritability estimates for Rett syndrome and Down syndrome far exceed the calculated heritability estimates for autism, which are based on a few small European twin studies.

    Evolutionary biologists are beginning to disentangle the underlying etiology of de novo mutations in humans. Molina et al (2011) studied sperm mutations in 10 healthy male volunteer donors, focusing on three mutations identified in individuals with genetic syndromes that also carry high ASD risk. The three regions examined for deletions and duplications were 7q11.23 (Williams syndrome), 15q11-13 (Prader-Willi syndrome) and 22q11 (DiGeorge/velo-cardio-facial syndrome). Most genetic and epigenetic cases of Williams syndrome, Prader-Willi syndrome and 22q11 deletion syndrome are caused by de novo mutations rather than inherited events. Deletions and duplications in all three regions were found in the sperm of all the volunteer donors (Beaudet 2008).

    Several studies in the field of evolutionary biology have tested the hypothesis that advancing age may be associated with increased frequency of sperm mutations in healthy volunteer donors. Bosch et al (2003) found chromosome 9 sperm mutations in all healthy donors, segregated by age group, and found that their frequency increases with advancing age. Sloter et al (2007) found that structural aberrations in chromosome 1 were present in the sperm of all volunteer donors and that the frequency of sperm mutations significantly increased with advancing age. Klinefelter syndrome is one of the most common mutations, affecting 1 in 500-1,000 boys. It arises from a de novo mutation and is not inherited. Klinefelter syndrome is associated with autism and may involve the neurexin–neuroligin genes (Bishop & Scerif 2011). Lowe et al (2001) examined the sperm of 38 fathers of Klinefelter boys and found that the frequency of the XY sperm mutation increases with advancing paternal age.

  5. Part Two.
    An important study was published a few months ago that examined the frequency of sperm mutations in workers at a benzene manufacturing plant in China. The study recruited 30 workers who had worked at the benzene manufacturing plant for more than a year and divided the workers into three groups, a low exposure group, a moderate exposure group and a high exposure group. The study included a control group of 11 unexposed workers from the same town.

    Every participant in all four groups was found to have de novo sperm mutations, including 1p36 sperm mutations. The frequency of the sperm mutations was lowest, but present, in the unexposed group, higher in the low exposure group, higher still in the moderate exposure group and highest in the high exposure group. The 1p36 deletion syndrome is present in 1 in 5,000 to 10,000 newborns.

    The 1p36 deletion syndrome is also associated with co-occurring autism.

    This is the first study that has demonstrated a direct connection between a specific de novo sperm mutation (1p36 deletion), a specific severe genetic syndrome (1p36 deletion syndrome) and a specific environmental pathogen (benzene).
    The CHARGE group published a study a year ago that found that living in close proximity (<309m) to heavily congested freeways in California was associated with increased risk for autism. Benzene, because of its high octane number, is an important component in the production of refined gasoline and diesel fuels, and one has to consider the possibility that at least some of these cases might be related to de novo sperm mutations associated with long-lasting high exposure to benzene particles and other airborne environmental pathogens.

    What can autism researchers learn from evolutionary biology? First, all males generate de novo sperm mutations throughout their lifetimes and second, the frequency of de novo sperm mutations increases with advancing age.

  6. Part Three. References


    Beaudet (2008). Allan Award Lecture: Rare Patients Leading to Epigenetics and Back to Genetics. Am J Hum Genet. 2008 May;82(5):1034-1038.

    Bishop & Scerif (2011). Klinefelter syndrome as a window on the aetiology of language and communication impairments in children: the neuroligin–neurexin hypothesis. Acta Paediatr. 2011 Jun;100(6):903-907.

    Bosch et al (2003). Linear increase of structural and numerical chromosome 9 abnormalities in human sperm regarding age. European Journal of Human Genetics (2003) 11, 754–759. doi:10.1038/sj.ejhg.5201049

    Lowe et al (2001). Frequency of XY Sperm Increases with Age in Fathers of Boys with Klinefelter Syndrome. American Journal of Human Genetics 69(5) Nov 2001 1046-1054.

    Marchetti F, Eskenazi B, Weldon RH et al (2011). Occupational exposure to benzene and chromosomal structural aberrations in the sperm of Chinese men. Environ Health Perspect

    Molina et al (2011). Sperm rates of 7q11.23, 15q11q13 and 22q11.2 deletions and duplications: a FISH approach. Hum Genet. 2011 Jan;129(1):35-44. Epub 2010 Oct 8.

    Sloter ED et al (2007). Frequency of human sperm carrying structural aberrations of chromosome 1 increases with advancing age. Fertil Steril. 2007 May;87(5):1077-86. Epub 2007 Apr 11.

    Volk HE, Hertz-Picciotto I et al (2010). Residential proximity to freeways and autism in the CHARGE study. Environ Health Perspect 119(6). doi:10.1289/ehp.1002835

    Excuse the three parts. There is a character count limitation and I had to break up this comment into three sections.

    1. Thanks RAJ, for those comments and the in-depth discussion of the importance of de novo mutations, which is spot on. It highlights a changing view of genetic variation in human populations - it is not the case that there is a standing pool of variants and we all inherit different combinations of them. New variants come in all the time and are much more likely to have phenotypic effects, especially deleterious ones. See for more on this.

  7. There seems to be a converging opinion among many, which I concur with. Autism is a multifactorial disorder with many layers of complexity. There is no single causal mechanism that is predictive of autism. Most cases are likely to involve multiple risk factors of small effect (genetic and environmental) that in aggregate increase total risk.

    1. RAJ, what I am trying to get at is what "multifactorial" means. Most people assume the meaning you refer to above - lots of risk factors of small effect at play in each individual. There is no evidence to support that model and lots of evidence for genetic heterogeneity - different genetic etiologies in different patients, often due to a single identifiable cause. There may be modifying/exacerbating factors, and it may involve two or three mutations in some cases, but there is no reason to think it involves combinations of hundreds or thousands of variants, without some major genetic insult.

  8. For the complete opposite of my own views on the genetic architecture of schizophrenia see this recent piece by Patrick Sullivan.

  9. Kevin;
    you make the same assumption that many behavioral geneticists make. You assume that the concepts of 'genetics' and 'inheritance' are interchangeable. I'll give you an example of a Gene X Gene etiology and would ask you where the heritability is.

    Independent mechanisms have been demonstrated in Down syndrome with or without co-occurring autism. Ghaziuddin compared groups of children with Down syndrome with and without co-occurring autism. In Down syndrome with co-occurring autism, there was an excess of first-degree relatives who met the description of BAP features, compared with first-degree relatives of children with Down syndrome without co-occurring autism, who did not meet the description of BAP features. None of the first-degree relatives in Down syndrome with co-occurring autism were diagnosed with either Down syndrome or autism (Ghaziuddin 1997).

    The genetic variance underlying the broader autism phenotype is independent of the gene mutation that causes Down syndrome.

    While autism is a strongly genetically influenced disorder, with the emphasis on 'influenced', where is the heritability?


    Ghaziuddin M. Autism in Down's Syndrome: family history correlates. J Intellect Disabil Res. 1997 Feb;(Pt1):87-91.

    Ghaziuddin M. Autism in Down's Syndrome: A family history. J Intellect Disabil Res. 2000 Oct;44(Pt 5):562-6.

    1. RAJ, I don't see what the issue is with these studies. Why wouldn't the presence of the BAP in relatives suggest to you the presence of genetic factors causing autism? (Independent of the occurrence of Down syndrome). That's the usual interpretation. (Whether that means a very large number of factors or a small number - could be one - with incomplete penetrance is open to debate - I would obviously argue for a small number).

  10. Kevin;
    You might be interested in a SFARI Autism article, 'Effect of paternal age seen in girls with autism'.

    Feel free to add to the comments.

    1. This is a really interesting finding (though it will have to be replicated). It fits with the idea that females need a stronger insult to push them into an autistic state. (Inherited mutations would be expected to be less deleterious, by virtue of the fact that the parent passing it on must have been well enough to have offspring). The general idea of females being more resilient to these mutations is also supported by evidence that the de novo CNVs seen in female autistic patients are larger (much larger on average) than those seen in males.
