Welcome to your genome

There is a common view that the human genome has two different parts – a “constant” part and a “variable” part. According to this view, the bases of DNA in the constant part are the same across all individuals. They are said to be “fixed” in the population. They are what make us all human – they differentiate us from other species. The variable part, in contrast, is made of positions in the DNA sequence that are “polymorphic” – they come in two or more different versions. Some people carry one base at that position and others carry another. The idea is that it is the particular set of such variations that we inherit that makes us each unique (unless we have an identical twin). According to this idea, we each have a hand dealt from the same deck.

The genome sequence (a simple linear code made up of 3 billion bases of DNA in precise order, chopped up onto different chromosomes) is peppered with these polymorphic positions – about 1 in every 1,250 bases. That makes about 2,400,000 polymorphisms in each genome (and we each carry two copies of the genome). That certainly seems like plenty of raw material, with limitless combinations that could explain the richness of human diversity. This interpretation has fuelled massive scientific projects to try to find which common polymorphisms affect which traits. (Not to mention personal genomics companies that will try to tell you your risk of various diseases based on your profile of such polymorphisms.)
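
To make the arithmetic above concrete, here is a small Python sketch. It divides the genome length by the average spacing of polymorphic sites given in the text. It then counts how many digits the number of possible genotype combinations would have, under two simplifying assumptions made purely for illustration: every site is biallelic (so a person carries one of three genotypes there), and sites combine freely. Neither assumption is a claim about real data.

```python
import math

GENOME_LENGTH = 3_000_000_000   # bases in one copy of the genome (figure from the text)
BASES_PER_POLYMORPHISM = 1_250  # average spacing of polymorphic sites (figure from the text)

polymorphic_sites = GENOME_LENGTH // BASES_PER_POLYMORPHISM

# Assumption for illustration: every site is biallelic, so a diploid person
# carries one of three genotypes there (AA, AB or BB), and sites combine freely.
GENOTYPES_PER_SITE = 3

# The number of decimal digits in 3**n is floor(n * log10(3)) + 1.
combination_digits = int(polymorphic_sites * math.log10(GENOTYPES_PER_SITE)) + 1

print(f"{polymorphic_sites:,} polymorphic sites")
print(f"3**{polymorphic_sites:,} genotype combinations: a {combination_digits:,}-digit number")
```

The second figure is a little over a million digits long, which is what "limitless combinations" amounts to in practice.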

The problem with this view is that it is wrong. Or at least woefully incomplete.

The reason is that it ignores another source of variation: very rare mutations in those bases that are constant across the vast majority of individuals. There is now very good evidence that it is these kinds of mutations that contribute most to our individuality. Certainly, they are much more likely to affect a protein’s function and much more likely to contribute to genetic disease. We each carry hundreds of such rare mutations that can affect protein function or expression, and these are much more likely to have a phenotypic impact than common polymorphisms.

Indeed, far from most of the genome being effectively constant, it can be estimated that every position in the genome has been mutated many, many times over in the human population. And each of us carries hundreds of new mutations that arose during the generation of the sperm and egg cells that fused to form us. New mutations may spread for some time in the pedigree or population in which they arise, depending in part on whether or not they have a deleterious effect. Those that do are likely to be selected against quickly.
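
The estimate that every position has been mutated many times over follows from simple arithmetic: multiply the number of people alive by the number of new mutations each carries, then divide by the genome length. In the Python sketch below, the population figure (about 7 billion, roughly the world population around 2011) and the value of 100 new mutations per person (a deliberately conservative stand-in for the "hundreds" in the text) are assumptions, as is the simplification that mutations land uniformly across the genome.

```python
GENOME_LENGTH = 3_000_000_000     # bases in one copy of the genome (from the text)
WORLD_POPULATION = 7_000_000_000  # assumption: approximate world population around 2011
NEW_MUTATIONS_PER_PERSON = 100    # assumption: conservative stand-in for "hundreds"


def hits_per_position(population: int, new_per_person: int, genome_length: int) -> float:
    """Average number of independent new mutations at each base position
    among the people alive in a single generation, assuming mutations
    fall uniformly at random across the genome."""
    return population * new_per_person / genome_length


hits = hits_per_position(WORLD_POPULATION, NEW_MUTATIONS_PER_PERSON, GENOME_LENGTH)
print(f"Each position is hit roughly {hits:.0f} times in one generation")
```

Even if the per-person figure were several times smaller, each position would still be hit dozens of times in a single generation, before counting any earlier generations.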

A new paper from the 1000 Genomes Project Consortium shows that:

“the vast majority of human variable sites are rare and that the majority of rare variants exhibit, at most, very little sharing among continental populations”.

This is a much more fluid picture of genetic variation than we are used to. We are not all dealt a genetic hand from the same deck – each population, sub-population, kindred, nuclear family has a distinct set of rare genetic variants. And each of these decks contains a lot of jokers – the new mutations that arise each time a hand is dealt.

Why have such rare mutations generally been ignored while the polymorphic sites have been the focus of intense research? There are several reasons, some practical and some theoretical. Practically, it has until recently been almost impossible to systematically find very rare mutations. To do so requires that we sequence the whole genome, which has only recently become feasible. In contrast, methods to survey which bases you carry at all the polymorphic sites across the genome were developed quite some time ago and are relatively cheap to use. (They rely on sampling about 500,000 such sites around the genome. Because of unevenness in the way different bits of chromosomes get swapped when sperm and eggs are made, this sample actually tells you about most of the common variable sites across the whole genome.) So there has been a tendency to argue that polymorphic sites will be major contributors to human phenotypes (especially diseases), because those are the only ones we have been able to look at.
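
The tagging trick described above rests on correlations between alleles at nearby sites, known as linkage disequilibrium. A standard way to quantify it is the squared correlation r² between two sites, computed from phased haplotypes. The minimal Python sketch below uses made-up toy haplotypes to show why a common tag site can stand in perfectly for a common site inherited alongside it, yet says almost nothing about a rare mutation, even one that sits only on haplotypes carrying the tag allele.

```python
def ld_r2(site_a: list[int], site_b: list[int]) -> float:
    """Linkage-disequilibrium r^2 between two biallelic sites.

    Each list holds one allele per phased haplotype (0 = reference,
    1 = alternate), in the same haplotype order for both sites.
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("need equal-length, non-empty haplotype lists")
    n = len(site_a)
    p_a = sum(site_a) / n                                  # alt-allele frequency at site A
    p_b = sum(site_b) / n                                  # alt-allele frequency at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n  # both alt on the same haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined at a monomorphic site")
    d = p_ab - p_a * p_b                                   # the disequilibrium coefficient D
    return d * d / denom


# Toy data (made up): 100 haplotypes; the tag allele sits on half of them.
tag = [1] * 50 + [0] * 50
common_linked = list(tag)       # common site always inherited with the tag allele
rare_mutation = [1] + [0] * 99  # seen once, on a tag-carrying haplotype

print(f"tag vs common linked site: r^2 = {ld_r2(tag, common_linked):.2f}")
print(f"tag vs rare mutation:      r^2 = {ld_r2(tag, rare_mutation):.3f}")
```

Even in this best case for the tag, r² for the rare mutation is only about 0.01, which illustrates why arrays built around common sites cannot replace sequencing when the goal is to find rare mutations.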

Unfortunately, the results of genome-wide association studies, which aim to identify common variants associated with traits or diseases, have been disappointing. This is especially true for disorders with large effects on fitness, such as schizophrenia or autism. Some variants have been found, but their effects, even in combination, are very small. Most of the heritability of most of the traits or diseases examined to date remains unexplained. (There are some important exceptions, especially for diseases that strike only late in life and for things like drug responses, where selective pressures to weed out deleterious alleles are not at play.)

In contrast, many more rare mutations causing disease are being discovered all the time, and the pace of such discoveries is likely to increase with technological advances. The main message that emerges from these studies, that the same disease can be caused by different rare mutations in different families, has been called the “Anna Karenina principle” by Mary-Claire King, after Tolstoy’s famous opening line:

“Happy families are all alike; every unhappy family is unhappy in its own way”

But can such rare variants really explain the “missing heritability” of these disorders? Some people have argued that they cannot, but this seems to me to be based on a pervasive misconception of how the heritability of a trait is measured and what it means. According to this misconception, if a trait is heritable across the population, that heritability cannot be accounted for by rare variants. After all, if a mutation only occurs in one or a few individuals, it could only minimally (nearly negligibly) contribute to heritability across the whole population. That is true. However, heritability is not measured across the population – it is measured in families and then averaged across the population.

In humans, it is usually derived by comparing phenotypes between people of different genetic relatedness (identical versus fraternal twins, siblings, parents, cousins, etc.). The results of these comparisons are then averaged across large numbers of pairs to estimate how much of the phenotypic variance is due to genetic variance – the population heritability. While a specific rare mutation may only affect the phenotype within a single family, such mutations could, collectively, explain all of the heritability. Completely different sets of mutations could be affecting the trait or causing the disease in different families.
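
This argument can be checked with a toy simulation. The Python sketch below is purely illustrative and assumes a hypothetical model: every family carries its own private mutation (found in no other family) with its own effect size, heterozygous in one parent, plus random environmental noise. It then estimates heritability from identical (MZ) versus fraternal (DZ) twin correlations using Falconer's formula, h² = 2(r_MZ − r_DZ), a classic textbook estimator and not necessarily the one used in any particular study. Under these assumptions the expected values work out to r_MZ ≈ 0.67, r_DZ ≈ 0.33 and h² ≈ 0.67, even though no mutation is shared between families.

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two paired lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_twins(n_families: int, identical: bool, rng: random.Random,
                   effect_sd: float = 2.0, noise_sd: float = 1.0):
    """One twin pair per family under the hypothetical private-mutation model."""
    twin1, twin2 = [], []
    for _ in range(n_families):
        effect = rng.gauss(0.0, effect_sd)  # this family's own mutation, carried by no other family
        g1 = int(rng.random() < 0.5)        # 1 if twin 1 inherited it from the carrier parent
        g2 = g1 if identical else int(rng.random() < 0.5)  # MZ twins match; DZ twins draw separately
        twin1.append(effect * g1 + rng.gauss(0.0, noise_sd))
        twin2.append(effect * g2 + rng.gauss(0.0, noise_sd))
    return twin1, twin2


def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate of heritability from MZ and DZ twin correlations."""
    return 2.0 * (r_mz - r_dz)


rng = random.Random(42)
r_mz = pearson(*simulate_twins(20_000, identical=True, rng=rng))
r_dz = pearson(*simulate_twins(20_000, identical=False, rng=rng))
h2 = falconer_h2(r_mz, r_dz)
print(f"r_MZ = {r_mz:.2f}, r_DZ = {r_dz:.2f}, estimated h2 = {h2:.2f}")
```

Because each mutation is confined to one of 20,000 families, on its own it accounts on average for only about 1/20,000 of the genetic variance; the substantial heritability appears only when the families are pooled.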

The next few years will reveal the true impact of rare mutations. We should certainly expect complex genetic interactions and some real effects of common polymorphisms. But the idea that our traits are determined simply by the combination of variants we inherit from a static pool in the population is no longer tenable. We are each far more unique than that.

(And if your personal genomics company isn’t offering to sequence your whole genome, it’s not personal enough.)

Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, Yu F, Gibbs RA, The 1000 Genomes Project, & Bustamante CD (2011). Demographic history and rare allele sharing among human populations. Proceedings of the National Academy of Sciences of the United States of America, 108 (29), 11983-11988 PMID: 21730125

Walsh CA, & Engle EC (2010). Allelic diversity in human developmental neurogenetics: insights into biology and disease. Neuron, 68 (2), 245-53 PMID: 20955932

McClellan J, & King MC (2010). Genetic Heterogeneity in Human Disease. Cell, 141 (2), 210-217 DOI: 10.1016/j.cell.2010.03.032


  1. It isn't just single-base polymorphisms involved here, either, but also longer insertion events. From the same volume of Cell:

    Cheng Ran Lisa Huang, Anna M. Schneider, Yunqi Lu, Tejasvi Niranjan, Peilin Shen, Matoya A. Robinson, Jared P. Steranka, David Valle, Curt I. Civin, Tao Wang, Sarah J. Wheelan, Hongkai Ji, Jef D. Boeke, Kathleen H. Burns. Mobile Interspersed Repeats Are Major Structural Variants in the Human Genome. Cell, Volume 141, Issue 7, 25 June 2010, Pages 1171-1182 doi:10.1016/j.cell.2010.05.026

    These insertions can potentially include copies of whole genes, or perhaps the coding regions, inserted where they'll be expressed under completely different circumstances.

  2. AK, you are absolutely right. I ignored this important type of variation, involving mobile elements, as well as copy number variants - deletions or duplications of chunks of chromosomes. Though far fewer in number than single-nucleotide polymorphisms, they actually affect more of the genome. Because individual copy number variants are rare, but recurrent (they have a tendency to arise at the same positions), they have rather different demographics from the single base polymorphisms.

  3. Update: highly relevant and insightful discussion on genetic screening by David Goldstein in today's Nature: http://www.nature.com/nature/journal/v476/n7358/full/476027a.html

  4. You are ignoring the epigenome which controls gene suppression, and transcription.

    Most genes are duplicated and require one copy to be suppressed. The suppression is partially inherited and passed on to offspring, but is also random (or seems so). Identical twins diverge over life as their epigenome diverges.

    Diet and other environmental factors have been shown to affect the epigenome inherited by individuals with identical DNA.

    Sequencing of DNA has been like discovering the atom, electrons, protons, and neutrons - the underlying complexity is even greater with quarks and spin and charm becoming the real focus of understanding.

  5. Epigenetic modification or packaging of DNA is important as a means of regulating gene expression. Whether it is widely important as a mechanism of heredity is far less certain, though there are a couple of examples where such modifications can be inherited. What is certain, however, is that inherited genetic differences are only the start of what makes us all unique. See these posts for more:



  6. If a rare variant is found, is it logical to expect that other rare variants may be found in the same protein-coding sequence in other families? Could you then first sequence these regions in other affected families, and avoid whole-genome sequencing to identify all rare variants?

  7. Owen, yes, you're right. One could certainly expect more mutations to show up in the same gene(s), if you already know that mutations in them can cause some disease. If you have a panel of such genes, then you may be able to capture just that DNA and sequence it. At present this is cheaper than sequencing the whole genome, but that may not be the case for long.

  8. "heritability is not measured across the population – it is measured in families and then averaged across the population."
    Good sentence. As someone who understands the basic assumptions of heritability, this really clarifies your argument.

