Tuesday, April 12, 2016

Is a polygenic model of schizophrenia genetics really proven?

Kenneth Kendler’s article on the nature of genetic variation and the nature of schizophrenia claims that theory and empirical evidence have proven the polygenic architecture of this disorder. In fact, both theory and data are entirely consistent with a very different model of high genetic heterogeneity, where the disorder is largely caused in individuals by one or a few mutations in any of a large number of genes, incorporating important and complex effects of genetic background. 

KK provides a scholarly overview of the history of ideas in these intertwined fields1. While historically interesting, the early arguments between biometricians and Mendelians about continuous versus dichotomous traits conflate two distinct questions: (i) what type of genetic variation contributes to the gradual evolution of new species?, and (ii) what type of genetic variation causes disease? There is no reason to expect these to have the same answer and many reasons not to.

With regard to the genetic architecture of SZ, KK presents the history of various models, from those positing a single major locus to those invoking polygenic mechanisms based on the work of Fisher, Falconer and others. Of course, the single major locus model has long since been rejected and the current debate is really between (i) models of extreme genetic heterogeneity, where the disease is largely caused by one or a small number of rare mutations in each affected individual (in any of a large number of different genes), and (ii) polygenic models involving the combined effects of thousands of common variants that “constitute the gene pool of our species”.

The only reference KK makes to models of genetic heterogeneity regrettably repeats a commonly held but mistaken notion, i.e., that the (negative) results of linkage analyses for SZ refute the theory that the disorder is a “common pathway for a large number of rare quasi-Mendelian disorders”, based on the idea that multiple linkage peaks would have been found if that were the case. This is demonstrably false. Most SZ linkage studies bundled together many small families, as large multiplex SZ pedigrees are rare. If the disorder shows a high level of genetic heterogeneity, combining families will necessarily obscure real linkage signals2. Recent simulations bear this out: in cases where a disorder is associated with decreased fitness and high genetic heterogeneity, linkage studies are predicted to fail3.

KK also presents several lines of positive evidence as supporting – indeed proving – that the polygenic model of SZ is correct. First, he argues that the existence of a phenotypic continuum between clinically diagnosable SZ and SZ-like personality disorders in first-degree relatives proves a polygenic model. It does not. Many classical single-gene disorders show incomplete penetrance and variable expressivity. In some cases, these are due to modifier genes in the background, but – as for SZ itself – phenotypes often vary substantially even between monozygotic twins. What these observations really highlight is that psychiatric diagnostic categories do not represent distinct biological phenotypes, but only one of a range of possible outcomes. The clinical and etiological overlap between SZ and other neurodevelopmental disorders, including autism, epilepsy and intellectual disability reinforces this point4.

Second, KK claims that recent genome-wide association studies and related analyses “have shown that for schizophrenia, Fisher’s model is largely correct”. This interpretation is not warranted by the data. A recent, very large-scale GWAS identified 108 loci with common single-nucleotide polymorphisms (SNPs) showing positive association signals with disease (higher frequency in cases than controls)5. However, GWAS signals do not identify causal variants or inform as to their allelic frequency. Numerous examples of synthetic associations caused by rare mutations have been demonstrated and the fact that rare mutations in many of the loci implicated are known to confer high risk for neuropsychiatric diseases supports this possibility5.

But even if the causal variants are common, this does not imply that the polygenic model is correct. The GWAS signal is a population-level average statistic and does not speak to how these variants act in individuals. Rather than acting in purely polygenic fashion – a hypothetical mechanism never actually demonstrated to cause disease – common variants may instead act as important modifiers of risk due to rare variants or environmental perturbations – a perfectly well-established mechanism (e.g., ref. 6).

Genome-Wide Complex Trait Analyses also cannot determine the number of contributing loci per individual, the number of causal variants across the population or the frequency of causal variants. This is stated clearly by Lee et al: “From the analyses we have performed, we cannot estimate a distribution of the allele frequency of causal variants”7. These analyses merely show (or claim) that extremely small statistical increases in risk can be detected across distant relatedness, presuming the technical assumptions and methods are valid7,8. In any case, GCTA analyses for SZ show that most genetic risk is NOT associated with common variants.

Genetic epidemiology at the population level can point to loci of interest but the findings do not restrict or even really inform on the genetic architecture of the disorder in individuals. The empirical data are perfectly consistent with a model of high genetic heterogeneity, where most cases are associated with one or a small number of high-risk mutations9, and where the phenotypic expression of these mutations is affected by genetic background10.

Finally, it seems strange to draw moral conclusions about how we should think of or treat people with SZ based on the genetic architecture of the disease. There does not need to be a continuum of risk across the population for healthy people to feel sympathy for those affected. It is very clear, from monozygotic twin concordance rates of ~50%, that those who have SZ were at very high risk of developing it, on average, with the corollary that the majority of the population had effectively zero risk. “Liability” may be normally distributed but that is an imaginary statistical construct – actual risk is clearly not continuous, under any model of genetic architecture. No moral conclusions derive from that fact.

[Postscript: I haven't gone into all the positive evidence for an important role for rare mutations of large effect in the etiology of SZ, but see here for many examples and more details: The Genetic Architecture of Neurodevelopmental Disorders.]

10.1017/S003329171000070X (2011).
10.1186/gb-2012-13-1-237 (2012).

* I originally wrote this as a letter to the editor at Molecular Psychiatry, in response to the article referenced by Kenneth Kendler, but they didn't like it, so I just decided to post it here instead.

Saturday, March 19, 2016

The surprising real genetics behind the X-Men

The X-Men – everyone’s favourite mutants – are hugely popular, thanks to seven feature films since 2000, with more on the way, and to the enormously successful comics that have been running for over 50 years. With that kind of exposure, they can have a real influence on the public perception of genetics, grounded as they are in ideas of mutation and evolution of the “next stage of humanity”. So, is there any real scientific basis underlying these stories? Well, to a geneticist, most of the supposed mutant abilities of the X-Men and their mutant brethren are frankly ludicrous. No matter how mutant you are, the laws of physics will still apply! (Except for Entropy Man of course). But some of them are less far-fetched, reflecting the strange and wonderful world of real-life biology. And underlying them all is a much deeper mechanism that has, in the real world, a profound effect on how mutations contribute to evolution.

The X-Men were created in 1963 by Stan Lee at Marvel Comics. He had already done the Fantastic Four and the Hulk and Spider-Man but wanted to create a new team of superheroes, with a bunch of different powers. And when it came to their origin story, he was, by his own admission, just being lazy. He realised he couldn’t have everyone exposed to cosmic rays or gamma rays or bitten by a radioactive spider – you’d quickly run out of radioactive animals to bite people that would be in any way cool – Mosquito-Man or Bed Bug-Man just don’t sound that awesome (though The Tick has always been a personal favourite).

So, he decided he would simply make them all “mutants” – they were all just born that way. That way he could have one have lasers coming out of his eyeballs and another able to control the weather and another able to turn into ice and just say it was all down to mutation.

He actually wanted to call the comic The Mutants, but he was over-ruled on the basis that no one at the time really knew what mutants were – the word was in the air, perhaps, linked to the ever-present threat of radiation that loomed large in the public consciousness during the Cold War, but it wouldn’t have been widely understood. And it seems from looking at some of their abilities that Stan Lee didn’t really know what mutants are either, because a lot of those powers are just absurd.

I mean, some animals can manage a dim glow that you can see in the darkest depths of the ocean but lasers out of your eyeballs like Cyclops is a bit of a stretch. Where would the power come from? And while we’re at it, how is he supposed to see anything?

Or controlling magnetism like Magneto – again, some animals can sense very weak electric fields – duck-billed platypuses do that to detect their prey, for example – but the idea that a mutation could let you lift an aircraft carrier is just silly. (It’s inspired silliness, but still). On the other hand, if you want an origin story for a character with the ability to manipulate magnetism, I suppose it beats being bitten by a radioactive platypus.

Many of the other abilities also require suspension of the laws of physics (a big ask for a little change in your DNA), but a few of them actually have some grounding in real biology. Let’s take everyone’s favourite mutant, Wolverine. His main power is his super-healing ability, which actually is quite plausible (in kind anyway, if not in degree). There are, in fact, strains of mice called “super-healing mice” (AKA Murphy Roths Large / lymphoproliferative mouse strain!) where something very similar has been found. These strains were initially of interest because they are prone to autoimmune disorders. Their super-healing ability was discovered quite by accident when researchers noticed that the ear punches they used to keep track of individual mice were healing over completely! These mice show increased wound healing generally, with much reduced scarring and even more rapid healing of broken bones. It’s still not really known why this is or which genes are responsible, however.

Wolverine is also ferocious (a berserker, in fact), and something like that can also be caused by mutations. In fact, there are many mutations that affect aggressiveness (having a Y chromosome certainly does), but the ones with the biggest effect in mice are in a gene called NR2E1, which is involved in brain development. A line of mice with mutations in this gene is called “fierce” because they’re so aggressive, with both males and females viciously attacking other mice or even anyone foolish enough to put their hand in the cage.

Beast is another mutant whose abilities are not completely ludicrous. There are mutations that can make you super-strong, in genes called myostatin or activin, which normally act to restrict muscle growth. When these genes are mutated, in cattle, mice or humans, muscle growth can increase by two-fold or more, with concomitant increases in strength. And there are also mutations that can make you grow hair all over your body (a condition called hypertrichosis, or, less sympathetically, werewolf syndrome).

Now it’s not usually blue hair, like Beast’s, but there is another mutation that does cause shockingly blue skin coloration, in a condition called methemoglobinaemia. It is famous from a particular kindred from the wonderfully named town of Troublesome Creek in the Appalachian mountains of Kentucky. They are known as the “blue Fugates”, that being the most common last name in the clan, and they really do have skin close to the colour of Mystique or Nightcrawler. No signs of shape-shifting or teleportation, though (although you never know with people from Kentucky – they’re tricksy…)

Let’s see, how about Professor X? He’s a telepath, of course, with an ability to read minds and manipulate people. As crazy as it sounds, there is a genetically distinct group of people who are much better than the rest of the population at reading minds – they’re called women. On average, women score higher than men on measures of empathy and performance on tasks like “reading the mind in the eyes test”. People with autism tend to do very poorly on such tests, but so does a sizeable proportion of the general male population. Whether there are people at the other end of the spectrum, with really heightened abilities – super-empathisers – remains unknown, though it seems plausible enough.

So, overall, most of the X-Men abilities are completely nuts but a few are only wildly exaggerated, like super-strength or super-healing or being blue or hairy. But here’s the thing – all those things arise from mutations in different genes while the X-Men are all supposed to have inherited a mutation in the same gene – the “X gene”, yet they have very different abilities. So how could that be?

Despite not actually knowing anything about genetics, Lee stumbled onto an idea that actually exists and that, in fact, plays an important role in evolution. There really is a gene that, when mutated, causes all kinds of different effects in different individuals.

This gene is called Hsp90 and it encodes what’s known as a “heat shock protein”. Heat shock proteins are turned on in cells when they are under stress – like when you suddenly raise the temperature. Their job is to help the cell deal with that stress and in particular to help other proteins in the cell to fold into the right shapes.

We have about 20,000 different proteins in our cells, each one encoded by a different gene. Each protein is made from a string of subunits called amino acids – there are twenty different kinds that are strung along in a specific sequence encoded by the DNA sequence of that gene. As each protein is being made, that string of amino acids folds back on itself in a kind of molecular origami, making a complex three-dimensional structure, the shape of which depends on all the forces between all the atoms in those amino acids. The particular 3D shape of each protein is crucial for it to do its job.

Now, when the temperature goes up, this distorts those forces and it disrupts the folding, so that many proteins become non-functional (which is very bad for the cell or organism). The job of Hsp90 and other heat shock proteins is to help them to fold into the right shape – it (almost literally) grabs hold of them and shakes them up and gives them a chance to make the right structure.

So, Hsp90 can help a cell deal with sudden stress by detecting and correcting wrongly folded proteins. But the other thing that can make a protein fold wrong is if it has a mutation in it. If you mutate the DNA sequence of a gene you can change the instructions so the wrong amino acid is inserted into the protein at a particular position and that can stop it from folding properly. But if it’s given a good shake by Hsp90, then it can snap out of it and pull itself together.

That sounds great – Hsp90 can protect the cell from the effects of mutations that alter protein folding. But there’s a dark side – the result is that those kinds of mutations then start to accumulate in a species, because Hsp90 is there to make sure they don’t have any effect. Indeed, all of us have mutations in all kinds of genes that aren’t having any effect because of Hsp90 and genes like it.

Now, do you see where I’m going? What happens when Hsp90 gets mutated? Suddenly all those other mutations – whichever ones were in the background in any particular individual – can have an effect. And that’s exactly what people saw when they mutated the Hsp90 in fruitflies – they started seeing flies with all kinds of different phenotypes: deformed or missing eyes, misshapen wings, altered pigmentation, extra bristles, duplicated body parts.

All kinds of freaky stuff, just from mutating that one gene – all the genetic variation that was being buffered and not having any effect was suddenly released. And not only that, when they put the animals under stressful conditions (high temperature) it got even freakier. Which is another central part of the X-Men mythology – the idea that, while they are born mutants, their abilities often lie latent for years and only come out at times of high stress. Often this is when they’re teenagers, because, as we all know, being a teenager is, like, OMG, sooooo stressful!

In evolution, this kind of mechanism is hugely important – it allows so-called “cryptic genetic variation” to accumulate in a population without affecting the phenotypes of the individuals. But if the environment changes (or the organisms move to a new environment), the stresses associated with that may “release” some of that genetic variation, so that it starts to affect the traits of individuals. And somewhere among that pool there may be some changes that are adaptive to the new environment. Those differences may be selected for in the new environment so that the species (though not all the individuals in it) can adapt more rapidly than they would have if they came in with a clean slate, as it were, and had to wait for new mutations to arise. The cryptic genetic variation is thus a source of evolutionary potential.

So, despite the fact that Stan Lee seems to have known very little about genetics, the central premise of the X-Men isn’t that far-fetched after all and actually reflects a mechanism that is central to how species evolve and adapt to new conditions.

Still doesn’t mean we’ll have laser-beams shooting out of our eyeballs any time soon, though…

Monday, January 4, 2016

Sex on the brain – a tale of two studies

The issue of whether there are biological differences between male and female brains is a fraught one and an area where political positions or prior expectations seem to have a strong influence on the interpretation of scientific data. These trends are illustrated by two papers published in the last couple of years, which, despite fairly comparable findings, were interpreted in almost polar opposite fashions.

Both studies found strong group differences between male and female brains, one in volume of brain areas, the other in structural connectivity. But the authors of one study went on to (over)interpret these group differences as the basis for sex differences in cognition, while the other downplayed them entirely and instead emphasised the inherent variability within genders to conclude that there was no such thing as a “male brain” or a “female brain”.  Both received extensive coverage in the media, fuelled by the associated press releases, resulting in headlines making hilariously contradictory claims, even in the same newspaper! 

The 2013 study was described with these headlines:

Brain Connectivity Study Reveals Striking Differences Between Men and Women

The 2015 study with these:

The brains of men and women aren’t really that different, study finds

Men are from Mars, women are from Venus? New brain study says not (The Guardian again!)

Let’s look at the more recent one first, to see what the data actually show and how they were analysed and interpreted. Daphna Joel and colleagues analysed MRI scans of 169 females and 112 males, and segmented them into 116 regions using a standard brain atlas. By analysing how much warping was required to map each brain onto a reference template, it was possible to compare the relative grey matter volume of all these regions across the two sexes. From this group comparison the 10 regions showing the largest sex differences were chosen for subsequent analyses.

So far, so good: the primary finding is that there are statistically significant group differences between males and females in grey matter volume across many brain regions. That’s nothing new – a recent meta-analysis of 167 studies confirms consistent group sex differences in many brain areas between men and women.

The authors went on, however, to ask what could have been a more interesting question: across those 10 regions, how “male” or “female” were the structures of individual brains? This is where the subjectivity comes in – there are many ways to analyse these data and the authors chose arguably the most simplistic and extreme one, which enabled them to draw the conclusion that male and female brains are not categorically different. 

They report that: “35% percent of brains showed substantial variability, and only 6% of brains were internally consistent”. Importantly they chose to classify only those subjects showing extreme male or female values for all 10 regions as “internally consistent”. A quick look at panel E of the figure below shows that while such brains may indeed be rare, most of the female brains showed a mostly female pattern (lots of pink) while most of the male brains showed a mostly male pattern (lots of blue – don’t blame me, I didn’t pick the colours!). 

There is, in fact, nothing at all surprising in their finding of substantial variability within individuals. To explain why, consider the distributions for height for males and females.
These distributions are very wide and mostly overlapping but there is a strong and consistent group difference in the mean – the distribution for males is shifted to the right. For any individual, however, knowing their sex gives almost no predictive power for how tall they are. What the group difference does suggest is the following: if I know how tall a particular woman is, I can say that if she had been a man (but was genetically otherwise identical) she would probably have been a little taller than that. She may happen to fall at the low or high end of the overall spectrum for other reasons, but that prediction remains the same. The existence of the group difference does not suggest that all males should be at the extreme “male” end of the height distribution or they’re not really very manly at all. That would be true if all other things were equal, but they’re not equal, and those other variations, which have nothing to do with sex, have a much bigger effect on final height than the sex effect does.

Now, consider what will happen if we have ten different variables, each showing that same sort of wide distribution with an even smaller group sex effect. If the volumes of different brain regions vary independently within individuals (taking overall brain volume out of the equation - as shown here, for example), then we should expect some of these values to fall more towards the male end and others more towards the female end in any individual simply due to that underlying variation, which has nothing to do with sex. It would be extremely unlikely to end up at the extreme end for all ten regions, by chance, and such individuals should thus be extremely rare, as observed.
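To put rough numbers on that argument, here is a minimal sketch in Python. It assumes, purely for illustration, that each region is normally distributed with unit variance in both sexes, that the male mean is shifted by a hypothetical effect size d = 0.5, and that regions vary independently within individuals. None of these values are taken from the study. It asks how often a male falls on the “male” side of the midpoint for every one of ten regions, which is a far more lenient criterion than being at the extreme end.

```python
import random
from statistics import NormalDist


def p_all_on_own_side(d, n_regions):
    """Probability that a male is on the male side of the midpoint (d / 2)
    for every region, assuming independent N(d, 1) values per region."""
    p_one_region = NormalDist().cdf(d / 2)
    return p_one_region ** n_regions


def simulate_fraction_consistent(d, n_regions, n_people, seed=1):
    """Monte Carlo check of the same quantity (hypothetical simulated males)."""
    rng = random.Random(seed)
    consistent = 0
    for _ in range(n_people):
        # Male values come from N(d, 1); female values would come from N(0, 1).
        values = [rng.gauss(d, 1.0) for _ in range(n_regions)]
        if all(v > d / 2 for v in values):
            consistent += 1
    return consistent / n_people


if __name__ == "__main__":
    print(p_all_on_own_side(0.5, 10))
    print(simulate_fraction_consistent(0.5, 10, 50_000))
```

Under these assumptions, fewer than 1% of males come out on the male side for all ten regions, even though every single region shows a real group difference. The exact figure depends entirely on the assumed effect size and on the independence assumption.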

So, the fact that each individual shows this kind of pattern does not mean that each of us has a “mosaic brain” that is partly male and partly female, as claimed by the authors. It is simply exactly what is expected given that sex is only one of the factors affecting the size of each of these regions. We can’t know for each individual what the size of each region would have been if their sex were different (which is really what we’d like to know) – we can only deduce from the group average effects that there would likely have been some effect.

The headlines suggesting that male and female brains are not that different are thus not well supported by these findings at all. The group differences are clear and highly significant. And even if very few of the males or females are at the extreme end of the distribution for all ten of these regions, the overall pattern suggests that you could build a very good classifier from the volumes of these ten regions taken together, which would be quite successful at predicting whether a given brain scan came from a male or a female. Indeed, this would have been a far more objective test of whether MRI volumetric differences between male and female brains are categorical or dimensional.
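The classifier point can be sketched under simple assumptions. The Python below is not an analysis of the study’s data: it assumes ten independent, unit-variance, normally distributed regions, each with the same hypothetical group difference d = 0.5, and classifies a brain as male if the average of its ten values exceeds the midpoint between the group means.

```python
import random
from math import sqrt
from statistics import NormalDist


def single_region_accuracy(d):
    """Accuracy of guessing sex from one region by cutting at the midpoint."""
    return NormalDist().cdf(d / 2)


def combined_accuracy(d, n_regions):
    """Accuracy when averaging n independent regions: the group difference
    stays d while the standard deviation shrinks to 1 / sqrt(n)."""
    return NormalDist().cdf(d * sqrt(n_regions) / 2)


def simulate_combined_accuracy(d, n_regions, n_per_sex, seed=1):
    """Monte Carlo check using hypothetical simulated brains."""
    rng = random.Random(seed)
    cutoff = d / 2
    correct = 0
    for is_male in (True, False):
        mean = d if is_male else 0.0
        for _ in range(n_per_sex):
            avg = sum(rng.gauss(mean, 1.0) for _ in range(n_regions)) / n_regions
            if (avg > cutoff) == is_male:
                correct += 1
    return correct / (2 * n_per_sex)


if __name__ == "__main__":
    print(single_region_accuracy(0.5))
    print(combined_accuracy(0.5, 10))
    print(simulate_combined_accuracy(0.5, 10, 20_000))
```

With these illustrative values, accuracy rises from about 60% for a single region to roughly 79% for the ten regions combined. The same brains that almost never look “consistent” region by region can still be sorted well above chance when the regions are taken together.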

Given this, it is interesting to ask why the authors chose to analyse and present their data in the way they did. This is what they say in the introduction to the paper:

"Documented sex/gender* differences in the brain are often taken as support of a sexually dimorphic view of human brains (female brain vs. male brain), and consequently, of a sexually dimorphic view of human behavior, cognition, personality, attitudes, and other gender characteristics (3). Joel (4, 5) has recently argued that the existence of sex/gender differences in the brain is not sufficient to conclude that human brains belong to two distinct categories. Rather, such a distinction requires the fulfillment of two conditions: one, the form of the elements that show sex/gender differences should be dimorphic, that is, with little overlap between the forms of the elements in males and females. Two, there should be a high degree of internal consistency in the form of the different elements of a single brain (e.g., all elements have the male form)."

It seems pretty clear from that that the authors set out to show that male and female brains are not that different, or at least not dimorphic. In particular, they take aim at a paper by Madura Ingalhalikar and colleagues (their reference 3, above), which is the second paper I wish to discuss. These authors found group difference results comparable to those of Joel et al (using a different measure of brain structure), yet reached almost opposite conclusions.

They used diffusion tensor imaging to define the structural connectivity networks across the brains of 949 youths (428 males and 521 females). They then analysed these networks using a variety of statistical measures of regional and global connectivity and compared these between males and females. They found that females had greater connectivity between hemispheres than males, on average, while males had greater connectivity within each hemisphere. Males also showed greater local connectivity and concomitantly increased modularity in the network (again, on average).

(In this figure from the paper, the top panel shows connections that are stronger in males, the bottom those that are stronger in females; blue are intrahemispheric, orange are interhemispheric).

Once again, so far, so good – the results look significant and interesting. (It would have been nice to see the analyses done with a discovery and replication sample, instead of one big group but at least it is a large sample). Where these authors got onto shakier ground was in extrapolating their findings as explanations for a variety of group differences in cognition between men and women. The participants in the structural connectivity analysis were part of a larger sample for which cognitive data had already been obtained, showing sex differences in a variety of domains. Such differences have been widely documented and range from quite small to fairly large (see here for a meta-analysis). 

However, the idea that the structural connectivity network differences observed are the cause of such cognitive differences is entirely speculative. I have nothing against speculation, per se, and the discussion section of a paper is a perfect place to explore the possible implications of one’s results. Where this got a bit out of hand was in the associated press release and the consequent media coverage. This is from the press release itself:

"“These maps show us a stark difference--and complementarity--in the architecture of the human brain that helps provide a potential neural basis as to why men excel at certain tasks, and women at others,” said Verma. [Ragini Verma, senior author]

For instance, on average, men are more likely better at learning and performing a single task at hand, like cycling or navigating directions, whereas women have superior memory and social cognition skills, making them more equipped for multitasking and creating solutions that work for a group. They have a mentalistic approach, so to speak. "

Those kinds of assertive generalisations, and especially the idea that the connectivity findings provide a neural basis for them, are not at all supported by the data and rightly provoked howls of protest from the scientific community. This included commentary by Joel and colleagues, to which Ingalhalikar and colleagues responded. The unfortunate outcome was that the authors’ over-extrapolation ended up undermining trust in their primary findings, which actually look quite solid in themselves.

To my mind, both these studies over-reached in the interpretation of their results, ironically drawing opposite conclusions from what are broadly comparable primary findings. More generally, it also seems that a little more humility is in order in drawing sweeping conclusions from these kinds of studies, given the crudeness of group-wise volumetric and tractography analyses and the very low resolution of MRI scans. Even if such scans showed no consistent group differences between male and female brains, this would not imply that male and female brains are not different. It would only imply such differences could not be detected by MRI. We know there are many differences in the numbers of neurons in small brain regions or numbers of connections between regions in male and female brains that are invisible to MRI, not to mention sex differences in densities of synaptic spines or other subcellular parameters that have also been demonstrated (as in this recent example). 

A final note: why should we care? Why should we investigate sex differences in the brain? And if we find them, what are their implications for public policies? Many people are rightly concerned that demonstrations of biological differences in brain structure between males and females will be used to reinforce the idea of systematic differences in cognitive abilities and justify sexism. Of course, even if such differences were large and consistent across individuals, it would not imply one version is better than the other. But more importantly, the distributions for cognitive domains are so overlapping and the sex effects typically so small that inferring anything about the cognitive profiles of individuals on the basis of these group differences is, simply put, a very bad bet. Sex differences for interests are a little bit bigger, but still by no means categorical and there is likely a strong cultural reinforcement of gender norms in this area.

There are, however, other areas where there are more robust sex differences. The most obvious but also the most commonly over-looked of these is sexual preference – something in the brains of males makes the vast majority of them sexually attracted to females, and vice versa. This is by far the strongest genetic effect on behaviour that we know of in humans (mediated by the SRY gene on the Y chromosome). It would therefore be interesting to find out how that preference is wired into the brain, as an exemplar for how genes can influence innate behaviour. Sex differences in physical aggression are also large and another important topic to understand (as are differences in idiotic behaviour as measured by the Darwin awards!).

Finally, though, a main reason we should care is the large sex differences in the prevalence of psychiatric conditions, which range from autism, ADHD and Tourette syndrome (much more common in males), to schizophrenia and dyslexia (more common in males), to depression (more common in females) and eating disorders (much more common in females). There is strong and consistent evidence, for example, that females are somewhat protected against the effects of mutations that typically cause autism in males. Females may carry such mutations with relatively little clinical effect; conversely, females who do have autistic symptoms tend to have larger or more severe mutations than affected males (suggesting that it takes a more drastic insult at the genetic level to push a female brain into a clinically autistic state). Understanding how sex influences vulnerability to these conditions is thus a hugely important question.

Too important to let politics, bias or spin affect our interpretation of scientific findings. 

Thursday, December 3, 2015

On literature pollution and cottage-industry science

A few days ago there was a minor Twitterstorm over a particular paper that claimed to have found an imaging biomarker that was predictive of some aspect of outcome in adults with autism. The details actually don’t matter that much and I don’t intend to pick on that study in particular, or even link to it, as it’s no worse than many that get published. What it prompted, though, was more interesting – a debate on research practices in the field of cognitive neuroscience and neuroimaging, particularly relating to the size of studies required to address some research questions and the scale of research operation they might entail.

What kicked off the debate was a question of how likely the result they found was to be “real”; i.e., to represent a robust finding that would replicate across future studies and generalise to other samples of autistic patients. I made a fairly uncompromising prediction that it would not replicate, which was based on the fact that the finding derived from: a small sample (n=31, in this case, but split into two), an exploratory study (i.e., not aimed at or constrained by any specific hypothesis, so that group differences in pretty much any imaging parameter would do) and lack of a replication sample (to test directly, with exactly the same methodology, whether the findings from the study were robust, prior to bothering anyone else with them).

The reason for my cynicism is twofold: first, the study was statistically under-powered, and in under-powered studies a larger share of the nominally significant results is expected to be false positives. Second, and more damningly, there have been literally hundreds of similar studies published using neuroimaging measures to try and identify signatures that would distinguish between groups of people or predict the outcome of illness. For psychiatric conditions like autism or schizophrenia I don’t know of any such “findings” that have held up. We still have no diagnostic or prognostic imaging markers, or any other biomarkers for that matter, that have either yielded robust insights into underlying pathogenic mechanisms or been applicable in the clinic.
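
To put rough numbers on the first point, here is a minimal sketch of the standard positive predictive value calculation: of all the results that reach significance, what fraction reflect a true effect? The prior of 0.10 (one in ten tested hypotheses being true) and the two power levels are purely illustrative assumptions, not estimates for any particular study.

```python
def positive_predictive_value(power, alpha, prior):
    """Fraction of significant results that reflect a true effect.

    power: probability of detecting a true effect
    alpha: significance threshold (false-positive rate under the null)
    prior: assumed fraction of tested hypotheses that are actually true
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Illustrative assumption: 1 in 10 tested hypotheses is actually true.
ASSUMED_PRIOR = 0.10

for power in (0.8, 0.2):
    ppv = positive_predictive_value(power, alpha=0.05, prior=ASSUMED_PRIOR)
    print(f"power = {power:.1f}: PPV = {ppv:.2f}")
```

Under these assumptions, dropping power from 0.8 to 0.2 takes the share of real findings among the significant ones from about 0.64 to about 0.31. The false-positive rate per test stays the same, but most of what gets reported as a finding is then noise.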

There is thus strong empirical evidence that the small-sample, exploratory, no-replication design is a sure-fire way of generating findings that are, essentially, noise.

This is by no means a problem only for neuroimaging studies; the field of psychology is grappling with similar problems and many key findings in cell biology have similarly failed to replicate. We have seen it before in genetics, too, during the “candidate gene era”, when individual research groups could carry out a small-scale study testing single-nucleotide polymorphisms in a particular gene for association with a particular trait or disorder. The problem was the samples were typically small and under-powered, the researchers often tested multiple SNPs, haplotypes or genotypes but rarely corrected for such multiple tests, and they usually did not include a replication sample. What resulted was an entire body of literature hopelessly polluted by false positives.
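
The cost of uncorrected multiple testing is easy to sketch. The snippet below computes the chance of at least one false positive across a set of tests, and the Bonferroni-corrected threshold that would keep that chance at the nominal level. It assumes the tests are independent, which is a simplification (nearby SNPs are often correlated), and the test counts are arbitrary illustrations.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# Illustrative numbers of tests (e.g. SNPs, haplotypes or genotype models).
for n_tests in (1, 10, 50):
    fwer = familywise_error_rate(n_tests)
    threshold = bonferroni_threshold(n_tests)
    print(f"{n_tests:>3} tests: P(any false positive) = {fwer:.2f}, "
          f"corrected threshold = {threshold:.4f}")
```

With ten uncorrected tests at p < 0.05, the chance of at least one spurious "hit" is already about 40%, even if nothing real is there to be found.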

This problem was likely heavily compounded by publication bias, with negative findings far less likely to be published. There is evidence that that problem exists in the neuroimaging literature too, especially for exploratory studies. If you are simply looking for some group difference in any of hundreds or thousands of possible imaging parameters, then finding one may be a (misplaced) cause for celebration, but not finding one is hardly worthy of writing up.

In genetics, the problems with the candidate gene approach were finally realised and fully grappled with. The solution was to perform unbiased tests for SNP associations across the whole genome (GWAS), to correct rigorously for the multiple tests involved, and to always include a separate replication sample prior to publication. Of course, to enable all that required something else: the formation of enormous consortia to generate the sample sizes required to achieve the necessary statistical power (given how many tests were being performed and the small effect sizes expected).

This brings me back to the reaction on Twitter to the criticism of this particular paper. A number of people suggested that if neuroimaging studies were expected to have larger samples and to also include replication samples, then only very large labs would be able to afford to carry them out. What would the small labs do? How would they keep their graduate students busy and train them?

I have to say I have no sympathy at all for that argument, especially when it comes to allocating funding. We don’t have a right to be funded just so we can be busy. If a particular experiment requires a certain sample size to detect an effect size in the expected and reasonable range, then it should not be carried out without such a sample. And if it is an exploratory study, then it should have a replication sample built in from the start – it should not be left to the field to determine whether the finding is real or not.
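
For a simple two-group comparison, the required sample size can be estimated before any data are collected. The sketch below uses the usual normal approximation for a two-sided, two-sample test of a standardised mean difference (Cohen's d). The effect sizes are hypothetical, and an exact t-test calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate subjects needed per group for a two-sided two-sample test.

    effect_size: standardised mean difference (Cohen's d) to be detected
    Uses the normal approximation n = 2 * (z_alpha + z_power)^2 / d^2.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical effect sizes: large, medium and small.
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} subjects per group")
```

On these assumptions, a medium effect (d = 0.5) needs about 63 subjects per group for 80% power, and a small effect (d = 0.2) needs nearly 400. That is before any correction for testing many parameters, which pushes the requirement higher still.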

You might say, and indeed some people did say, that even if you can’t achieve those goals, because the lab is too small or does not have enough funding, at least doing it on a small scale is better than nothing.

Well, it’s not. It’s worse than nothing.

Such studies just pollute the literature with false positives – obscuring any real signal amongst a mass of surrounding flotsam that future researchers will have to wade through. Sure, they keep people busy, they allow graduate students to be trained (badly), and they generate papers, which often get cited (compounding the pollution). But they are not part of “normal science” – they do not contribute incrementally and cumulatively to a body of knowledge.

We are no further in understanding the neural basis of a condition like autism than we were before the hundreds of small-sample/exploratory-design studies published on the topic. They have not combined to give us any new insights, they don’t build on each other, they don’t constrain each other or allow subsequent research to ask deeper questions. They just sit there as “findings”, but not as facts.

Lest I be accused of being too preachy, I should confess to some of these practices myself. Several years ago, while candidate gene studies were still the norm, we published a paper that included a positive association of semaphorin genes with schizophrenia (prompted by relevant phenotypes in mutant mice). It seems quite likely now that that association was a false positive, as a signal from the gene in question has not emerged in larger genome-wide association studies.

And the one neuroimaging study I have done so far, on synaesthesia, certainly suffered from a small sample size (at the time it was considered decent), and no replication sample. In our defence, our study was itself designed as a replication of previous findings, combining functional and structural neuroimaging. While our structural findings did mirror those previously reported (in general direction and spatial distribution of effects, though not precise regions), our functional results were quite incongruent with previous findings. As we did not have a replication sample built into our own design, I can’t be particularly confident that our findings will generalise – perhaps they were a chance finding in a fairly small sample. (Indeed, the imaging findings in synaesthesia have been generally quite inconsistent and it is difficult to know which findings constitute real results that future research studies could be built on.)

If I were designing these kinds of studies now I would use a very different design, with much larger samples and in-built replication (and pre-registration). If that means they are more expensive, so be it. If it means my group can’t do them alone, well that’s just going to be the way it is. No one should fund me, or any lab, to do under-powered studies.

For the neuroimaging field generally that may well mean embracing the idea of larger consortia and adopting common scanning formats that enable combining subjects across centres, or at least subsequent meta-analyses. And it will mean that smaller labs may have to give up on the idea of making a living from studies that attempt to find differences between groups of people without enough subjects to do so. You’ll find things – they just won’t be real.