Calibrating scientific skepticism – a wider look at the field of transgenerational epigenetics
I recently wrote a blog post examining the supposed evidence for transgenerational epigenetic inheritance (TGEI) in humans. This focused specifically on a set of studies commonly cited as convincingly demonstrating the phenomenon whereby the experiences of one generation can have effects that are transmitted, through non-genetic means, to their offspring, and, more importantly, even to their grandchildren. Having examined what I considered to be the most prominent papers making these claims, I concluded that they do not in fact provide any evidence supporting that idea, as they are riddled with fatal methodological flaws.
While the scope of that piece was limited to studies in humans, I have also previously considered animal studies making similar claims, which suffer from similar methodological flaws (here and here). My overall conclusion is that there is effectively no evidence for TGEI in humans (contrary to widespread belief) and very little in mammals more generally (with one very specific exception).
Jill Escher (@JillEscher), who is an autism advocate and funder of autism research, recently posted a riposte, arguing that I was far too sweeping in my dismissal of TGEI in mammals, and listing 49 studies that, in her opinion, collectively represent very strong evidence for this phenomenon.
So, have I been unfair in my assessment of the field? Could it possibly be justified to dismiss such a large number of studies? What is the right level of skepticism to bring to bear here? For that matter, what level of skepticism of novel ideas should scientists have generally?
As on many subjects, Carl Sagan put it rather well:
Some ideas are better than others. The machinery for distinguishing them is an essential tool in dealing with the world and especially in dealing with the future. And it is precisely the mix of these two modes of thought [skeptical scrutiny and openness to new ideas] that is central to the success of science.
— Carl Sagan
In 'The Burden of Skepticism', Skeptical Inquirer (Fall 1987), 12, No. 1.
Too much openness and you accept every notion, idea, and hypothesis—which is tantamount to knowing nothing. Too much skepticism—especially rejection of new ideas before they are adequately tested—and you're not only unpleasantly grumpy, but also closed to the advance of science. A judicious mix is what we need.
— Carl Sagan
In 'Wonder and Skepticism', Skeptical Inquirer (Jan-Feb 1995), 19, No. 1.
So, in case I have come across as merely unpleasantly grumpy, let me spell out my general grounds for being skeptical of the claims of TGEI in mammals. Some of these relate to the methodology or design of specific papers but some relate to the field as a whole. (You may notice some similarities to other fields along the way.)
I am not going to go into the details of each of the 49 papers listed by Jill Escher, because I’d rather just go on living the rest of my life, to be honest. In all sincerity, I thank her for collating them all, but I will note that merely listing them does not in any way attest to their quality. Of all the papers on this topic that I have previously delved into (as detailed at length in the linked blog posts), none has approached what I would consider a convincing level of evidence.
Nor do I consider the fact that there are 49 of them as necessarily increasing the general truthiness of the idea of TGEI. I could cite over 4900 papers showing candidate gene associations for hundreds of human traits and disorders and they’d still all be suspect due to the shared deficiencies of this methodology.
I will, however, pick out some examples to illustrate the following characteristics of the field:
1. It is plagued by poor statistical methodology and questionable research practices.
In my piece on human studies and in the previous ones on animal studies I wrote about one major type of statistical malpractice that characterises papers in this field, which is using the analysis of covariates to dredge for statistical significance.
Sex is the most common one. If your main analysis doesn’t show an effect, don’t worry – you can split the data to see if it only shows up in male or female offspring or is only passed through the male or female germline. These are often combined, so you get truly arcane scenarios where an effect is supposedly first passed only through, say, the female germline, but then only through the male germline in the next generation, and then only affects female offspring in the third generation. These are frankly absurd, on their face, but nevertheless are commonplace in this literature.
In human epidemiology, trimester of exposure is another commonly exploited covariate, giving three more bites of the cherry, with the added advantage that any apparently selective effects can be presented as evidence of a critical period. (Could be true, in some cases, but could just as well be noise).
This speaks to a more general problem, which is the lack of a clearly defined hypothesis prior to data collection. Generally speaking, any effect will do. In animals, this may mean testing multiple behaviours and accepting either an increase or a decrease in any one of them as an interesting finding. Do the offspring show increased anxiety? Great – we can come up with a story for why that would be. Decreased anxiety? Great – there’s an alternate narrative that can be constructed, no problem. Memory defect, motor problems, increased locomotion, decreased locomotion, more marbles buried, less social interaction? All cool, we can work with any of that. If you run enough tests and take any difference as interesting, you massively increase your chances of finding something that is statistically significant (when considered alone).
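The arithmetic behind that "massively increased chance" is simple. If each test has a 5% false-positive rate under the null, the probability of at least one spurious hit across a battery of independent tests grows quickly, as this short sketch shows:

```python
# Family-wise error rate: probability of at least one false positive
# across k independent tests, each run at a 5% significance threshold.
def familywise_error_rate(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(k), 3))
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
```

With ten behavioural read-outs and no correction, the chance of at least one "significant" finding on pure noise is already about 40%; with twenty, it is nearly two in three.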
That brings me to two additional methodological issues related to these kinds of analyses:
First, incorrect analysis of a difference in the difference. This is a widespread problem in many areas, but seems particularly common to me in the TGEI literature, due to the typical experimental design employed.
In these experiments (in animals), one typically creates a test group, descended from animals exposed to the inducing factors to be tested, and a control group, descended from unexposed animals. These are then commonly compared in various experimental tests, which often themselves involve a comparison between two parameters or conditions. For example, to test recognition memory, the time spent investigating a novel versus a familiar object or animal is compared.
This is usually analysed separately for the test and the control animals to see if there is a significant difference between conditions, for the test animals but not for the control animals. (As, for example, in this paper by Bohacek et al., cited by Escher):
This is interpreted as the treatment having an effect on recognition memory. But that interpretation rests on the inference that the difference between the two within-group effects is itself significant, and that was never actually tested: "significant in one group but not the other" does not establish a difference between the groups. The correct way to analyse these kinds of data is to combine them all and test for an interaction between group and condition. (In this particular instance, my bet would be that this would not show a significant difference, given that the direction and magnitude of effect look pretty comparable across the groups).
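When condition (novel versus familiar) is measured within each animal, the group-by-condition interaction reduces to asking whether the per-animal (novel minus familiar) difference scores differ between groups. Here is a minimal sketch of that correct test, using an exact permutation test on made-up difference scores (the numbers are purely illustrative, not taken from any of the papers):

```python
from itertools import combinations

# Per-animal (novel - familiar) investigation-time differences.
# Illustrative numbers only: both groups show a similar novelty preference.
control = [4, 6, 5, 7]
treated = [5, 7, 6, 8]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(treated) - mean(control)  # 1.0

# Exact permutation test: reassign animals to groups in every possible
# way and count how often the group difference is at least as extreme
# as the one observed.
pooled = control + treated
extreme = 0
total = 0
for idx in combinations(range(len(pooled)), len(treated)):
    t = [pooled[i] for i in idx]
    c = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if abs(mean(t) - mean(c)) >= abs(observed):
        extreme += 1

p_interaction = extreme / total
print(p_interaction)  # 32/70, about 0.457: no significant interaction
```

Note that both groups' difference scores here are uniformly positive, so a separate within-group test could easily come out "significant" in one group and not the other; that pattern tells you nothing about whether the groups actually differ, which is what the interaction test asks.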
This may seem like a minor or esoteric point to make, but if you look through these papers you will find it cropping up again and again, as this kind of analysis is the mainstay of the basic experimental design of the field.
Second, lack of correction for multiple testing. Again, this problem is widespread in many areas of science, though some (like human genomics) have recognised and corrected it. The mouse behaviour field is appalling in this regard, but epigenomics gives it a run for its money.
Simply put, if you test enough variables for a difference between two groups, and set p < 0.05 as your threshold of statistical significance, then even when no real differences exist, five percent of your tested variables would be expected to hit that supposedly stringent threshold by chance.
So, if you test lots of behaviours in an exploratory way, you should correct your p-value threshold accordingly. This is almost never done in this literature (or in the wider mouse behavioural literature, to be fair).
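A quick simulation makes the point. Below, both "groups" are drawn from the same distribution, so every significant hit is by construction a false positive, yet with enough read-outs some always appear. (This sketch uses a large-sample z-test built from the standard library; with samples this size it closely approximates a t-test.)

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
norm = NormalDist()

def two_sample_p(a, b):
    # Two-sided large-sample z-test for a difference in means.
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - norm.cdf(abs(z)))

n_tests, n_per_group = 1000, 50
pvals = []
for _ in range(n_tests):
    a = [random.gauss(0, 1) for _ in range(n_per_group)]
    b = [random.gauss(0, 1) for _ in range(n_per_group)]  # same distribution!
    pvals.append(two_sample_p(a, b))

hits = sum(p < 0.05 for p in pvals)
bonferroni_hits = sum(p < 0.05 / n_tests for p in pvals)
print(hits, bonferroni_hits)  # roughly 5% uncorrected; near zero corrected
```

Roughly fifty of the thousand null comparisons come out "significant" at the uncorrected threshold; almost none survive a Bonferroni-corrected one.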
Correction for multiple tests is also rarely done in epigenomics analyses, such as looking at the methylation of multiple CpG sites across a gene (or sometimes across multiple genes). Again, not to pick on it, but just because it was one of the first papers in Escher's list that I read, this figure from Bohacek et al. illustrates this widespread problem:
The difference in CpG 6 is taken as significant here even though the p-value (0.038 in this case) would not survive the appropriate correction for multiple testing. (I have previously described the same statistical problem in other papers doing this kind of analysis, not to mention the widespread problems with the application and interpretation of the experimental methods used to generate the data).
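To make that concrete: a nominal p of 0.038 does not survive even the simplest (Bonferroni) correction once a handful of sites have been tested. The panel size below is hypothetical, chosen only for illustration; the exact number of CpG sites assayed varies by paper.

```python
def bonferroni_adjusted(p, n_tests):
    # Bonferroni adjustment: multiply the nominal p-value by the number
    # of tests performed (capped at 1), then compare to the usual 0.05.
    return min(1.0, p * n_tests)

p_adj = bonferroni_adjusted(0.038, 8)  # hypothetical panel of 8 CpG sites
print(p_adj)  # 0.304, nowhere near significance
```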
So, one of the reasons I am skeptical of studies in this field is that the collective standards for statistical practice are regrettably low. This means that many of these papers (in fact, all of the ones I have looked at in detail) that claim to prove the existence of TGEI as a phenomenon in mammals in fact do no such thing.
This brings me to a few more general points about the field.
2. It is not progressive.
In most areas of biology, there is a natural progression from discovery of some novel phenomenon to gradual, but steady, elucidation of the underlying mechanisms. RNA interference springs to mind as a good example – very quickly after initial reports of this truly surprising phenomenon, researchers had made progress in figuring out how it works at a detailed molecular level. They didn’t just keep publishing papers demonstrating the existence of the phenomenon over and over again, but never explaining how it could occur.
The TGEI field has not, in my judgment at least, been like that. It’s been at least 15 years since these kinds of ideas became prominent, yet almost all of the papers in the 49 listed by Escher are simply claiming to show (again) that the phenomenon exists. In that regard, it reminds me of the psychological literature on social priming – there are scores of papers purportedly demonstrating the existence of the phenomenon and none explaining how it works. (Spoiler: it doesn’t).
This comparison highlights another, more insidious problem in the TGEI field – publication bias. When the papers in a field are merely showing an effect, then studies that do indeed find some difference that “achieves statistical significance” are vastly more likely to be published than ones that do not. It might be compelling if 49 groups had independently stumbled upon evidence of a phenomenon by accident. But that is not what happened. The researchers did not just serendipitously observe the phenomenon. They were not even asking the question of whether or not the phenomenon exists in a disinterested fashion. They were motivated to find evidence for it and to selectively publish those studies that showed it. This kind of publication bias massively inflates the truthiness of the phenomenon. Escher lists 49 supposedly positive studies but we will never know how many negative ones never saw the light of day.
[I don’t, by the way, mean to impugn the motives of any of the researchers involved – these are the incentives that the modern scientific enterprise has put in place and that affect us all.]
By contrast, studies that are working out details of mechanism are far less prone to this kind of bias because no one has a stake in what the answer turns out to be. I would have expected by now that one or other of the systems in which this phenomenon has been reported would have proven robust enough to allow the experimental investigation of the underlying mechanisms. The only real example where the underlying mechanism has been elucidated is at the agouti locus in mice, and it is an exceptional case as it involves heterochromatic silencing of a transposable element.
Now, perhaps I am being unfair – maybe 15 years is not that long, especially if you’re doing mouse work. Perhaps there is real progress being made on the mechanisms and I just haven’t seen it. I’ll be delighted to be convinced if someone develops a robust experimental system where the phenomenon can be reliably elicited and then works out the details of how that happens – perhaps there will be some amazing and important new biology revealed.
But there is a deeper problem, especially for those systems where people claim that some behavioural experience in one generation is somehow transmitted by molecular marks on the DNA across generations to affect the behaviour of the grandchildren. The problem is not just that the mechanism has not been worked out – it’s that it’s hard to even imagine what such a mechanism might be.
3. It lacks even a plausible mechanism in most cases
This problem is illustrated by a prominent paper in Escher’s list that claimed that olfactory experiences (association of an odor with a shock) in one generation of animals could lead to an alteration in sensitivity to that odor in the grandchildren of the animals exposed. The paper suffers from some of the statistical issues described above but actually if you just took all the data at face value you might well come away convinced that this is a real thing.
But that’s when you need to calibrate your priors. As I previously tweeted, in order for this phenomenon to exist, this is what would have to happen:
You can say the same for similar proposed scenarios in humans, involving specific types of experiences and specific behavioural outcomes in descendants.
[I should note that the effects of direct exposure to chemicals or toxins of various kinds differ somewhat in this regard in that at least the initial exposure could plausibly directly affect the molecular landscape in the germline. However, one still has to answer why and how there would be a particular pattern of epigenomic modifications induced, how such modifications would persist through multiple rounds of epigenomic “rebooting”, as well as through the differentiation of the embryos so as to affect specific mature cell types, and why they would manifest with the particular effects claimed, again often on specific behaviours.]
Many of the phenomena described in the TGEI literature are thus actually quite extraordinary. And extraordinary claims require extraordinary evidence. As discussed above, the evidence presented usually does not even rise to “ordinary”, but even if it did I think it is still worthwhile assessing the findings in another way.
When I see a paper making an extraordinary claim, I think it is appropriate to judge it not just on the statistical evidence presented within it, which refers only to the parameters of that particular experiment, but on the more general relative likelihood set up here:
Which is more likely? That the researchers have happened on some truly novel and extraordinary biology or that something funny happened somewhere?
A few years ago, when an international team of physicists published evidence that neutrinos can travel 0.002% faster than light, even the authors of the paper didn't believe it, though all their measurements and calculations had been exhaustively checked and rechecked and the evidence from the experiment itself seemed pretty conclusive. It turned out that a fibre-optic cable in the GPS timing system was not fully plugged in, leading to a very small error in the measured flight time.
Their skepticism was thus entirely justified, based on their wider knowledge, not just the isolated evidence from the particular experiment. This gets back to Sagan’s point about balancing wonder with skepticism – scientific training and knowledge should inform our judgment of novel findings, not just the p-values within a given paper.
Now, I’m not saying that models of TGEI are as heretical as faster-than-light travel, but they’re certainly unexpected enough to make us calibrate our prior expectations downwards. Indeed, the one described above on olfactory experiences would require not just one but a multiplicity of novel mechanisms to achieve.
But, hey, weird shit happens – who am I to rule it out? It may seem the height of arrogance or even dogmatic intransigence to do so. But there is one other general reason to be skeptical of the described phenomena and the supposed mechanisms that would have to exist to mediate them: there’s no reason for them. They don’t solve any problem – neither one facing scientists nor one facing organisms.
4. It doesn’t solve any problem
One of the problems facing scientists, to which researchers often suggest TGEI might be part of the answer, is known as the case of the “missing heritability”. I know, it sounds like Scooby Doo… “What do you know, it was Old Man Epigenetics all the time!” Well, it wasn’t.
The missing heritability problem refers to the fact that many human traits and disorders have been shown to be highly heritable, but it has proven difficult to find the genetic variants that account for all of this heritability. Heritability is a highly technical term and should not be confused with the more colloquial term heredity.
When we say a trait is, say, 60% heritable, what that means is that 60% of the variance in that trait across the population (the degree of spread of the typically bell-shaped curve of individual values) is attributable to genetic variation. This can be estimated in various ways, including twin or family studies, or, more recently, by studies across thousands of only very, very distantly related people.
The important thing about these designs is that they explicitly dissociate effects due to genetics from effects due to environmental factors. In twin studies, for example, it is the excess similarity of monozygotic twins over that of dizygotic twins that allows us to attribute some proportion of the MZ twins’ similarity to shared genetics (and, inversely, some proportion of the phenotypic variance across the population to genetic differences).
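The twin logic can be sketched with Falconer's classic formula, which partitions trait variance into additive genetics (h²), shared family environment (c²) and non-shared environment (e²) from the MZ and DZ twin correlations. The correlations below are illustrative only, chosen to match the 60% example above:

```python
def falconer(r_mz, r_dz):
    # Falconer's formula: the excess similarity of MZ over DZ twins
    # reflects genetics (MZ twins share ~100% of segregating variants,
    # DZ twins ~50% on average).
    h2 = 2 * (r_mz - r_dz)   # heritability
    c2 = 2 * r_dz - r_mz     # shared (family) environment
    e2 = 1 - r_mz            # non-shared environment + noise
    return h2, c2, e2

h2, c2, e2 = falconer(r_mz=0.8, r_dz=0.5)
print(h2, c2, e2)  # approximately 0.6, 0.2, 0.2
```

Anything that makes siblings more alike regardless of zygosity raises both correlations equally, and so lands in the c² term, not in h².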
Since any TGEI effects induced by ancestral environment should affect MZ and DZ twins (and siblings generally) equally, these experimental designs would assign them to the shared environment term and explicitly exclude them from the heritability term. They therefore cannot, by definition, help explain the missing heritability.
Another idea is that TGEI provides a mechanism of passing on knowledge gained through behavioural experiences – such as an individual learning that an odor is associated with a shock. You might imagine a more natural scenario where an animal learns that a certain foodstuff makes it sick. Maybe it would be good to have some kind of mechanism to pass that knowledge on in animals that don't have language and culture. But there just isn't any evidence for the existence of such a phenomenon from the ecological and behavioural literature. There is no mystery in the field that TGEI is solving.
The same can be said for the idea that TGEI is a mechanism of extending developmental plasticity across generations until evolution through genetic changes can catch up with the need for adaptation to a new environment. Developmental plasticity is itself a genetically encoded phenotype. It doesn’t need any help from epigenetics.
Finally, the notion that epigenetics (transgenerational or not) may be a mediator of environmental causes of conditions like autism also has no real support. In fact, the notion that there are important environmental causes of autism has no real support. The heritability of autism is over 80%; it is an overwhelmingly genetic condition. People with autism were, on average, at enormously elevated genetic risk of developing it (based on the average concordance between MZ twins). Twin and family studies strongly suggest that the shared family environment makes zero contribution to risk. So even the roughly 20% of variance in risk that is not attributable to genetics does not necessarily indicate the existence of environmental risk factors. (Indeed, much of the remaining variance may be due to intrinsic noise in the processes of brain development.)
Ultimately, there is nothing where we can say: “We know that X happens, but we don’t know how. Maybe TGEI is a mechanism that can mediate X.” Instead, the introduction to these papers usually reads like this: “We know that TGEI can happen in X. [Narrator: we don’t know that]. Maybe it also happens in Y”.
So, until someone can show me a scenario where TGEI solves a known problem, has at least a conceivable, biologically plausible mechanism, is robust enough to provide an experimental system to work out the actual mechanism, and has convincing enough evidence of existing as a phenomenon in the first place, I will keep my skepticometer dialled to 11.