Calibrating scientific skepticism – a wider look at the field of transgenerational epigenetics
I recently wrote a blogpost examining the
supposed evidence for transgenerational epigenetic inheritance (TGEI) in
humans. This focused specifically on a set of studies commonly cited as
convincingly demonstrating the phenomenon whereby the experiences of one
generation can have effects that are transmitted, through non-genetic means, to
their offspring, and, more importantly, even to their grandchildren. Having
examined what I considered to be the most prominent papers making these claims,
I concluded that they do not in fact provide any evidence supporting that idea,
as they are riddled with fatal methodological flaws.
While the scope of that piece was limited
to studies in humans, I have also previously considered animal studies making
similar claims, which suffer from similar methodological flaws (here and here).
My overall conclusion is that there is effectively no evidence for TGEI in
humans (contrary to widespread belief) and very little in mammals more
generally (with one very specific exception).
Jill Escher (@JillEscher), who is an autism advocate and
funder of autism research, recently posted a riposte, arguing that I was far
too sweeping in my dismissal of TGEI in mammals, and listing 49
studies that, in her opinion, collectively represent very strong evidence for
this phenomenon.
So, have I been unfair in my assessment of
the field? Could it possibly be justified to dismiss such a large number of
studies? What is the right level of skepticism to bring to bear here? For that
matter, what level of skepticism of novel ideas should scientists have
generally?
As on many subjects, Carl Sagan put it
rather well:
Some ideas
are better than others. The machinery for distinguishing them is an essential
tool in dealing with the world and especially in dealing with the future. And
it is precisely the mix of these two modes of thought [skeptical scrutiny and
openness to new ideas] that is central to the success of science.
— Carl Sagan
In 'The
Burden of Skepticism', Skeptical Inquirer (Fall 1987), 12, No. 1.
Too much
openness and you accept every notion, idea, and hypothesis—which is tantamount
to knowing nothing. Too much skepticism—especially rejection of new ideas
before they are adequately tested—and you're not only unpleasantly grumpy, but
also closed to the advance of science. A judicious mix is what we need.
— Carl Sagan
In 'Wonder
and Skepticism', Skeptical Inquirer (Jan-Feb 1995), 19, No. 1.
So, in case I have come across as merely
unpleasantly grumpy, let me spell out my general grounds for being skeptical of
the claims of TGEI in mammals. Some of these relate to the methodology or
design of specific papers but some relate to the field as a whole. (You may
notice some similarities to other fields along the way.)
I am not going to go into the details of
each of the 49 papers listed by Jill Escher, because I’d rather just go on
living the rest of my life, to be honest. In all sincerity, I thank her for
collating them all, but I will note that merely listing them does not in any
way attest to their quality. Of all the papers on this topic that I have
previously delved into (as detailed at length in the linked blog posts), none
has approached what I would consider a convincing level of evidence.
Nor do I consider the fact that there are
49 of them as necessarily increasing the general truthiness of the idea of TGEI.
I could cite over 4900 papers showing candidate gene associations for hundreds
of human traits and disorders and they’d still all be suspect due to the shared deficiencies of this methodology.
I will, however, pick out some examples to
illustrate the following characteristics of the field:
1. It is plagued by poor statistical methodology and questionable research practices.
In my piece on human studies and in the
previous ones on animal studies I wrote about one major type of statistical
malpractice that characterises papers in this field, which is using the
analysis of covariates to dredge for statistical significance.
Sex is the most common one. If your main
analysis doesn’t show an effect, don’t worry – you can split the data to see if
it only shows up in male or female offspring or is only passed through the male
or female germline. These are often combined, so you get truly arcane scenarios
where an effect is supposedly first passed only through, say, the female
germline, but then only through the male germline in the next generation, and
then only affects female offspring in the third generation. These are frankly
absurd, on their face, but nevertheless are commonplace in this literature.
In human epidemiology, trimester of
exposure is another commonly exploited covariate, giving three more bites of
the cherry, with the added advantage that any apparently selective effects can
be presented as evidence of a critical period. (Could be true, in some cases,
but could just as well be noise).
This speaks to a more general problem,
which is the lack of a clearly defined hypothesis prior to data collection.
Generally speaking, any effect will do. In animals, this may mean testing
multiple behaviours and accepting either an increase or a decrease in any one
of them as an interesting finding. Do the offspring show increased anxiety?
Great – we can come up with a story for why that would be. Decreased anxiety?
Great – there’s an alternate narrative that can be constructed, no problem. Memory
defect, motor problems, increased locomotion, decreased locomotion, more
marbles buried, less social interaction? All cool, we can work with any of
that. If you run enough tests and take any difference as interesting, you
massively increase your chances of finding something that is statistically
significant (when considered alone).
That brings me to two additional methodological
issues related to these kinds of analyses:
First, incorrect analysis of a difference in the
difference. This is a widespread problem in
many areas, but seems particularly common to me in the TGEI literature, due to
the typical experimental design employed.
In these experiments (in animals), one
typically creates a test group, descended from animals exposed to the inducing
factors to be tested, and a control group, descended from unexposed animals.
These are then commonly compared in various experimental tests, which often
themselves involve a comparison between two parameters or conditions. For
example, to test recognition memory, the time spent investigating a novel
versus a familiar object or animal is compared.
This is usually analysed separately for the
test and the control animals to see if there is a significant difference
between conditions, for the test animals but not for the control
animals. (As, for example, in this paper by Bohacek et al., cited by Escher):
This is interpreted as the treatment having
an effect on recognition memory. It is based on an inference that the
difference between the two groups is significant, but that was not actually
tested. The correct way to analyse these kinds of data is to combine them all
and test for an interaction between group and condition. (In this particular instance, my bet
would be that this would not show a significant difference, given that the
direction and magnitude of effect look pretty comparable across the groups).
This may seem like a minor or esoteric point to
make, but if you look through these papers you will find it cropping up again
and again, as this kind of analysis is the mainstay of the basic experimental
design of the field.
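The flaw, and its fix, can be sketched with simulated data (pure Python; the numbers are invented for illustration and are not from any of the papers discussed). Both simulated groups share the same modest, borderline novelty preference, so separate within-group tests can easily disagree by chance, while the correct test — comparing the per-animal difference scores between groups, i.e. the group-by-condition interaction — asks directly whether the groups differ:

```python
import random
import statistics
from statistics import NormalDist


def welch_p(a, b):
    """Two-sided p-value for a difference in means (Welch-style, with a
    normal approximation to the t distribution; fine for n >= ~30)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


random.seed(1)
n = 40  # hypothetical animals per group


def cohort():
    # Seconds spent on familiar vs novel object; BOTH groups get the same
    # true +1.3 s novelty preference, so there is no real group difference.
    familiar = [random.gauss(10.0, 3.0) for _ in range(n)]
    novel = [random.gauss(11.3, 3.0) for _ in range(n)]
    return familiar, novel


ctrl_fam, ctrl_nov = cohort()
test_fam, test_nov = cohort()

# Flawed approach: separate within-group tests. With a borderline effect,
# one group can cross p < 0.05 while the other does not, purely by noise.
print("control novel-vs-familiar p:", round(welch_p(ctrl_nov, ctrl_fam), 3))
print("test    novel-vs-familiar p:", round(welch_p(test_nov, test_fam), 3))

# Correct approach: test the interaction directly, here via per-animal
# difference scores (novel minus familiar) compared between groups.
ctrl_diff = [nv - fm for nv, fm in zip(ctrl_nov, ctrl_fam)]
test_diff = [nv - fm for nv, fm in zip(test_nov, test_fam)]
print("group x condition interaction p:", round(welch_p(test_diff, ctrl_diff), 3))
```

The point is not the particular numbers but the logic: "significant in one group, not in the other" is not itself a test of the group difference; only the interaction term is.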
Second, lack of correction for multiple testing. Again, this problem is widespread in many
areas of science, though some (like human genomics) have recognised and
corrected it. Mouse behaviour is appalling as a field, in this regard, but
epigenomics gives it a run for its money.
Simply put, if you test enough variables for a difference between two groups, and set p less than 0.05 as your threshold of statistical significance, then five percent of your tested variables would be expected to hit that supposedly stringent threshold by chance.
So, if you test lots of behaviours in an
exploratory way, you should correct your p-value threshold accordingly. This is
almost never done in this literature (or in the wider mouse behavioural
literature, to be fair).
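To put numbers on this, here is a minimal sketch (the count of ten assays is a hypothetical, but typical, battery size) of how the family-wise error rate inflates with the number of tests, and the simple Bonferroni fix:

```python
# Family-wise error under multiple testing, and the Bonferroni correction.
# 'm' is a hypothetical number of behavioural assays run on the same cohort.

def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent null tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise rate near alpha."""
    return alpha / m

alpha, m = 0.05, 10
print(f"Chance of >=1 spurious 'hit' across {m} tests: "
      f"{familywise_error(alpha, m):.0%}")   # roughly 40%
print(f"Bonferroni-corrected per-test threshold: "
      f"{bonferroni_threshold(alpha, m):.4f}")  # 0.0050
```

With ten behavioural measures and no correction, the odds of at least one nominally "significant" difference between two identical groups are about 40%, not 5%.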
Correction for multiple tests is also rarely done in epigenomics analyses, such as looking at the methylation
of multiple CpG sites across a gene (or sometimes across multiple genes).
Again, not to pick on it, but just because it was one of the first papers in
Escher’s list that I read, this figure from Bohacek et al illustrates this
widespread problem:
The difference in CpG 6 is taken as
significant here even though the p-value (0.038 in this case) would not
survive the appropriate correction for multiple testing. (I have previously
described the same statistical problem in other papers doing this kind of
analysis, not to mention the widespread problems with the application and
interpretation of the experimental methods used to generate the data).
So, one of the reasons I am skeptical of
studies in this field is that the collective standards for statistical
practices are regrettably low. This means that many of these papers (in fact
all of the ones I’ve looked at in detail) that claim to prove
the existence of TGEI as a phenomenon in mammals in fact do no such thing.
This brings me to a few more general points
about the field.
2. It is not progressive.
In most areas of biology, there is a
natural progression from discovery of some novel phenomenon to gradual, but
steady, elucidation of the underlying mechanisms. RNA interference springs to
mind as a good example – very quickly after initial reports of this truly
surprising phenomenon, researchers had made progress in figuring out how it
works at a detailed molecular level. They didn’t just keep publishing papers
demonstrating the existence of the phenomenon over and over again, but never
explaining how it could occur.
The TGEI field has not, in my judgment at
least, been like that. It’s been at least 15 years since these kinds of ideas
became prominent, yet almost all of the papers in the 49 listed by Escher are
simply claiming to show (again) that the phenomenon exists. In that regard, it
reminds me of the psychological literature on social priming – there are scores
of papers purportedly demonstrating the existence of the phenomenon and none
explaining how it works. (Spoiler: it doesn’t).
This comparison highlights another, more
insidious problem in the TGEI field – publication bias. When the papers in a
field are merely showing an effect, then studies that do indeed find some
difference that “achieves statistical significance” are vastly more likely to
be published than ones that do not. It might be compelling if 49 groups had
independently stumbled upon evidence of a phenomenon by accident. But that is
not what happened. The researchers did not just serendipitously observe the
phenomenon. They were not even asking the question of whether or not the
phenomenon exists in a disinterested fashion. They were motivated to find
evidence for it and to selectively publish those studies that showed it. This
kind of publication bias massively inflates the truthiness of the phenomenon.
Escher lists 49 supposedly positive studies but we will never know how many
negative ones never saw the light of day.
[I don’t, by the way, mean to impugn the
motives of any of the researchers involved – these are the incentives that the
modern scientific enterprise has put in place and that affect us all.]
By contrast, studies that are working out
details of mechanism are far less prone to this kind of bias because no one has
a stake in what the answer turns out to be. I would have expected by now that
one or other of the systems in which this phenomenon has been reported would
have proven robust enough to allow the experimental investigation of the
underlying mechanisms. The only real example where the underlying mechanism has
been elucidated is at the agouti locus in mice, and it is an exceptional case
as it involves heterochromatic silencing of a transposable element.
Now, perhaps I am being unfair – maybe 15
years is not that long, especially if you’re doing mouse work. Perhaps there is
real progress being made on the mechanisms and I just haven’t seen it. I’ll be
delighted to be convinced if someone develops a robust experimental system
where the phenomenon can be reliably elicited and then works out the details of
how that happens – perhaps there will be some amazing and important new biology
revealed.
But there is a deeper problem, especially
for those systems where people claim that some behavioural experience in one
generation is somehow transmitted by molecular marks on the DNA across
generations to affect the behaviour of the grandchildren. The problem is not
just that the mechanism has not been worked out – it’s that it’s hard to even
imagine what such a mechanism might be.
3. It lacks even a plausible mechanism in most cases.
This problem is illustrated by a prominent paper in Escher’s list that claimed that olfactory experiences (association of
an odor with a shock) in one generation of animals could lead to an alteration
in sensitivity to that odor in the grandchildren of the animals exposed. The
paper suffers from some of the statistical issues described above but actually
if you just took all the data at face value you might well come away convinced
that this is a real thing.
But that’s when you need to calibrate your
priors. As I previously tweeted, in order for this phenomenon to exist, this is
what would have to happen:
You can say the same for similar proposed
scenarios in humans, involving specific types of experiences and specific
behavioural outcomes in descendants.
[I should note that the effects of direct
exposure to chemicals or toxins of various kinds differ somewhat in this regard
in that at least the initial exposure could plausibly directly affect the
molecular landscape in the germline. However, one still has to answer why and how
there would be a particular pattern of epigenomic modifications induced, how such
modifications would persist through multiple rounds of epigenomic “rebooting”,
as well as through the differentiation of the embryos so as to affect specific
mature cell types, and why they would manifest with the particular effects
claimed, again often on specific behaviours.]
Many of the phenomena described in the TGEI
literature are thus actually quite extraordinary. And extraordinary claims
require extraordinary evidence. As discussed above, the evidence presented
usually does not even rise to “ordinary”, but even if it did I think it is
still worthwhile assessing the findings in another way.
When I see a paper making an extraordinary
claim I think it is appropriate to judge it not just on the statistical
evidence presented within it, which refers only to the parameters of that
particular experiment, but on the more general relative likelihood set up here:
Which is more likely? That the researchers
have happened on some truly novel and extraordinary biology or that something
funny happened somewhere?
A few years ago, when an international team
of physicists published evidence that neutrinos can travel 0.002% faster than
light, even the authors of the paper didn’t believe it, though all their
measurements and calculations had been exhaustively checked and rechecked and
the evidence from the experiment itself seemed pretty conclusive. Turns out a
fiber-optic cable in some important machinery was loose, leading to a very
small error in timing.
Their skepticism was thus entirely justified,
based on their wider knowledge, not just the isolated evidence from the
particular experiment. This gets back to Sagan’s point about balancing wonder
with skepticism – scientific training and knowledge should inform our judgment
of novel findings, not just the p-values within a given paper.
Now, I’m not saying that models of TGEI are
as heretical as faster-than-light travel, but they’re certainly unexpected
enough to make us calibrate our prior expectations downwards. Indeed, the one
described above on olfactory experiences would require not just one but a
multiplicity of novel mechanisms to achieve.
But, hey, weird shit happens – who am I to
rule it out? It may seem the height of arrogance or even dogmatic intransigence
to do so. But there is one other general reason to be skeptical of the
described phenomena and the supposed mechanisms that would have to exist to
mediate them: there’s no reason for them. They don’t solve any problem – neither
one facing scientists nor one facing organisms.
4. It doesn’t solve any problem.
One of the problems facing scientists, to
which researchers often suggest TGEI might be part of the answer, is known as
the case of the “missing heritability”. I know, it sounds like Scooby Doo… “What do you
know, it was Old Man Epigenetics all the time!” Well, it wasn’t.
The missing heritability problem refers to
the fact that many human traits and disorders have been shown to be highly
heritable, but it has proven difficult to find the genetic variants that
account for all of this heritability. Heritability is a highly technical term
and should not be confused with the more colloquial term heredity.
When we say a trait is, say, 60% heritable,
what that means is that 60% of the variance in that trait across the population
(the degree of spread of the typically bell-shaped curve of individual values)
is attributable to genetic variation. This can be estimated in various ways, including
twin or family studies, or, more recently, by studies across thousands of only
very, very distantly related people.
The important thing about these designs is
that they explicitly dissociate effects due to genetics from effects due to
environmental factors. In twin studies, for example, it is the excess
similarity of monozygotic twins over that of dizygotic twins that allows us to
attribute some proportion of the MZ twins’ similarity to shared genetics (and,
inversely, some proportion of the phenotypic variance across the population to
genetic differences).
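That twin logic can be made concrete with Falconer's classic approximation. This is a textbook simplification (modern studies fit structural equation models), and the correlation values below are hypothetical, chosen only to illustrate a high-heritability, zero-shared-environment pattern:

```python
# Falconer's approximation: partitioning trait variance from twin correlations.
# MZ twins share ~100% of segregating genetic variation, DZ twins ~50%,
# so the EXCESS similarity of MZ over DZ twins indexes genetic effects.

def falconer_h2(r_mz: float, r_dz: float) -> float:
    """h^2: heritability ~ twice the excess MZ-over-DZ similarity."""
    return 2 * (r_mz - r_dz)

def shared_env_c2(r_mz: float, r_dz: float) -> float:
    """c^2: twin similarity not accounted for by genetic sharing."""
    return 2 * r_dz - r_mz

r_mz, r_dz = 0.90, 0.45  # hypothetical MZ and DZ twin correlations
print("h^2 (heritability)       =", round(falconer_h2(r_mz, r_dz), 2))
print("c^2 (shared environment) =", round(shared_env_c2(r_mz, r_dz), 2))
```

Note what the design does: any factor that made all offspring in a family alike, regardless of zygosity — which is exactly what a TGEI effect inherited from an exposed ancestor would do — lands in the c² term, not in h².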
Since any TGEI effects due to environmental
factors should affect all offspring equally, they would, based on these
experimental designs, be explicitly excluded from the heritability term. They therefore
cannot, by definition, help explain the missing heritability.
Another idea is that TGEI provides a
mechanism of passing on knowledge gained through behavioural experiences – such
as an individual learning that an odor is associated with a shock. You might
imagine a more natural scenario where an animal learns that a certain foodstuff
makes it sick. Maybe it would be good to have some kind of mechanism to pass
that knowledge on in animals that don’t have language and culture. But there
just isn’t any evidence for the existence of such a phenomenon from the
ecological and behavioural literature. There is no mystery in the field that
TGEI is solving.
The same can be said for the idea that TGEI
is a mechanism of extending developmental plasticity across generations until
evolution through genetic changes can catch up with the need for adaptation to
a new environment. Developmental plasticity is itself a genetically encoded
phenotype. It doesn’t need any help from epigenetics.
Finally, the notion that epigenetics
(transgenerational or not) may be a mediator of environmental causes of
conditions like autism also has no real support. In fact, the notion that there
are important environmental causes of
autism has no real support. The heritability of autism is over 80%. It is an
overwhelmingly genetic condition. People with autism were, on average,
genetically at enormously high risk of developing autism (based on the average
concordance between MZ twins). Twin and family studies strongly suggest that
the shared family environment makes zero contribution to risk. So, even the
fact that 20% of variance in risk that is not attributable to genetics does not
necessarily indicate the existence of environmental risk factors. (Indeed, much
of the remaining variance may be due to intrinsic noise in the processes of
brain development).
Ultimately, there is nothing where we can
say: “We know that X happens, but we don’t know how. Maybe TGEI is a mechanism
that can mediate X.” Instead, the introduction to these papers usually reads
like this: “We know that TGEI can happen in X. [Narrator: we don’t know that].
Maybe it also happens in Y”.
So, until someone can show me a scenario
where TGEI solves a known problem, has at least a conceivable, biologically
plausible mechanism, is robust enough to provide an experimental system to work
out the actual mechanism, and has convincing enough evidence of existing as a
phenomenon in the first place, I will keep my skepticometer dialled to 11.