The murderous brain - can neuroimaging really distinguish murderers?
A new study claims that neuroimaging can be used to distinguish the brains of murderers from non-murderers. It follows in a long tradition of attempts to find biological indicators of violent criminality, from faces to skull bumps to genes to brains. But are the data convincing? Does this study really accomplish what it claims? Is it even based on a well-founded question? And what are the ethical implications?
Here is the abstract of the paper, by Ashly Sajous-Turner and colleagues:
Homicide is a significant societal problem with economic costs in the billions of dollars annually and incalculable emotional impact on victims and society. Despite this high burden, we know very little about the neuroscience of individuals who commit homicide. Here we examine brain gray matter differences in incarcerated adult males who have committed homicide (n = 203) compared to other non-homicide offenders (n = 605; total n = 808). Homicide offenders’ show reduced gray matter in brain areas critical for behavioral control and social cognition compared with subsets of other violent and non-violent offenders. This demonstrates, for the first time, that unique brain abnormalities may distinguish offenders who kill from other serious violent offenders and non-violent antisocial individuals.
Let’s first think about the reasoning behind the study, which is encapsulated in the clause: “we know very little about the neuroscience of individuals who commit homicide”. There is not much more detail given in the paper itself as to the motivation for the study, but this clause is rather revealing of a number of underlying assumptions:
1. Murderousness is something that there is “a neuroscience of”.
2. There is, moreover, something in common in the brains of murderers – some shared difference that can distinguish them as a group from non-murderers.
3. Structural neuroimaging is a good way to detect such a difference – that is, it should manifest in the volume of some brain regions.
4. Identifying such brain regions will tell us something about the biology of violence.
5. Identifying the pattern of brain differences will allow us to distinguish murderers from non-murderers.
So, the unstated hypothesis is that murderers differ biologically from non-murderers, and that we can discover the murderous essence by looking at the size of bits of the brain. In addition, it is implied that it would be a good thing if we could do that.
Would it be a good thing? Good for whom? As mentioned above, many people have tried to find some kind of biological marker to distinguish those inclined to violence and especially to murder. In the 1800s, phrenology was looked to as a way of telling who was bad and who was mad. As described in this excellent piece by James Bradley: “Two ideas lay at the heart of phrenology’s seductive power. First, different areas of the brain were associated with different mental capacities or faculties. And, as the brain developed, it shaped the skull.” That meant that the detailed landscape of bumps and depressions on a person’s skull gave a window to their personality, including possibly murderous instincts.
Francis Galton, the father of eugenics, looked to physiognomy for markers of criminality, painstakingly creating composite photographs of the faces of criminals and comparing them to upstanding members of Victorian society. This tradition has been recently revived with the brute force of machine learning, in a study in China claiming to be able to discriminate criminals from law-abiding citizens based on pictures of their faces. (Additional analyses suggest a bias in the collection of photos that may be driving the effect).
Genetic variants have also been invoked as underpinning a murderous instinct in some people. The most famous of these is the gene encoding monoamine oxidase-A, or MAOA, an enzyme involved in serotonin metabolism. Serious mutations in this gene are indeed associated with a drastic increase in violent criminality. Fortunately, such mutations are extremely rare. Unfortunately, the idea was extended to a very common variation in the same gene, which was associated with all kinds of behavioral outcomes in candidate gene analyses, which have subsequently been shown to be spurious. This hasn’t stopped the idea of the “warrior gene” from taking hold in the public consciousness, however, nor has it prevented information on a defendant’s MAOA common genotype being used in court.
All of these approaches assume that murderers differ biologically from non-murderers (in a way that at least partly explains their murderousness). Is there any reason to think this is the case? Well, yes, there is.
First of all, the vast majority of murderers are men. Worldwide, 96% of homicides are committed by men (and 78% of the victims are male). There is lots of evidence from other species that suggests this reflects an innate difference in physical aggressiveness between the sexes, with an underlying neural basis. So, yes, it seems some brains may be more murdery than others. This at least establishes the principle that there could be something biologically different between individuals (in addition to sex) that influences likelihood to commit homicide.
That idea is further supported by twin and adoption studies showing that antisocial behaviour, physical aggression, arrest, and incarceration for violent crimes are all partly heritable. That is, being genetically related to someone with high levels of such behaviors makes it more likely that a person will also show such behaviors. This effect is additional to the significant effects of growing up in the same household with people showing such behaviors. (For those keeping score, genetic effects (i.e., the heritability) tend to explain ~30-40% of the variance and the effect of the shared environment tends to be about the same).
So, okay, it’s not crazy to think that some biological factors affecting an individual’s psychological make-up make a contribution to likelihood to commit homicide.
But is there any reason to think that murderers would differ in one particular way? The group comparison design of the study implies this hypothesis. But if the Anna Karenina model applies – “happy families are all alike; every unhappy family is unhappy in its own way” – then there may be little in common between the biological factors making one person murderous versus those at play in another. If each murderer were unhappy in his own way (biologically speaking), you would never detect that in a group average comparison.
What might such biological factors be? What psychological traits might increase one’s likelihood to commit homicide? (Whether one actually does or not would be hugely context-dependent, but let’s continue with the idea that some people may be innately more murderous than others, in general). You could imagine any or all of the following traits might be involved: impulsivity, mood stability, aggressiveness, threat sensitivity, punishment sensitivity, intelligence, executive function, vengefulness, agreeableness, jealousness, honesty, narcissism, psychopathy, empathy, moral reasoning, general meanness, sensitivity to alcohol or drugs, and on and on.
So, you could have one murderer with high threat sensitivity, impulsivity, and aggressiveness under the influence of alcohol, and another with high narcissism and psychopathy and deficient moral reasoning. If you took a hundred murderers, you might have a hundred different profiles. This rather undermines the design of the experiment from the get-go.
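To make the dilution argument concrete, here is a minimal simulation sketch. Everything in it is a hypothetical assumption chosen for illustration (20 independent traits, 200 people per group, each case shifted by 2 standard deviations on exactly one randomly chosen trait); none of these numbers comes from the paper.

```python
import random
import statistics

random.seed(0)

N_TRAITS = 20        # hypothetical number of relevant traits
N_PER_GROUP = 200    # hypothetical group size
SHIFT = 2.0          # each case is 2 SD extreme on exactly ONE trait


def simulate_person(is_case):
    """Return a list of trait scores (standard normal noise per trait)."""
    scores = [random.gauss(0.0, 1.0) for _ in range(N_TRAITS)]
    if is_case:
        # Each case is "unhappy in his own way": a different trait is shifted.
        scores[random.randrange(N_TRAITS)] += SHIFT
    return scores


cases = [simulate_person(True) for _ in range(N_PER_GROUP)]
controls = [simulate_person(False) for _ in range(N_PER_GROUP)]

# Group-average difference on each trait, in SD units.
mean_diffs = [
    statistics.mean(p[t] for p in cases) - statistics.mean(p[t] for p in controls)
    for t in range(N_TRAITS)
]

expected_per_trait = SHIFT / N_TRAITS  # 0.1 SD
average_observed = statistics.mean(mean_diffs)
print(f"individual shift: {SHIFT} SD on one trait")
print(f"expected group difference per trait: {expected_per_trait} SD")
print(f"average observed difference across traits: {average_observed:.2f} SD")
```

Under these assumptions, every single case is markedly atypical on some trait, yet the expected group difference on any one trait is only a twentieth of the individual shift, which is about the size of the sampling noise in a group of this size.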
The design of the experiment
The next implicit assumption in the experimental design – that differences in such traits should be manifest in the size of various brain regions – is also not well justified. Why would we expect that to be the case? Is that how psychological traits are determined? By the relative size of bits of the brain?
This is just the modern, neuroimaging “blobology” version of phrenology. The reasoning behind this seems to be: brain region X is “involved in” Y (or, in even worse wording, “does” Y). Therefore, if brain region X is bigger, a person will be better at Y.
It’s 2019 – is this really how we understand the relationship between the complex functions of the mind and the neural substrates that carry them out? Just based on the real estate occupied by supposed functional modules? Can we actually map complex cognitive functions to little bits of the brain? And is bigger better?
This paper is not unusual in adopting this logic – it is a vague and unquestioned starting position for many similar studies. The literature is chock-full of reports of such correlations – that is, in fact, the main methodology of a lot of what can be called “psychology, with added neuroimaging”. But to the best of my knowledge, despite thousands of claims, there are few robust, well-replicated examples correlating the size of little bits of the brain with variation in psychological traits.
There are, in fact, all kinds of parameters that could vary that would affect the function or tuning of a circuit (involving many, distributed regions) that would not be visible by structural neuroimaging: variation in levels of neurotransmitter receptors, or distribution of specific types of interneurons, or altered density of dendritic spines, or differences in many other aspects of synaptic microarchitecture, or neurochemistry, or connectivity, etc. Size isn’t everything. In fact, we have little reason to think it’s anything.
So, the experimental design is not, in my view, well founded. As is quite common, the implicit assumptions underlying it go completely unexamined in the paper (and presumably unchallenged by the reviewers or the editor).
Now, what about the methodology and the findings? Clearly, they found something, otherwise we wouldn’t have heard about it. (This is not a trivial statement – the general existence of publication bias bears directly on how much weight we should put in the “positive” results, especially if a study was not pre-registered).
On the plus side, the sample is very large and has a good control group: 203 convicted murderers compared with 605 people convicted of other crimes, all from the same male prison population, all scanned under the same conditions on the same scanner (as far as I can tell). The control group was further broken down into violent and non-violent offenders.
The hypothesis being tested – or maybe we should say the idea being explored – is that the brains of murderers would show some structural differences to the brains of non-murderers. More particularly, the researchers stated:
We hypothesize that homicide offenders will have deficits in areas of executive functioning and limbic control areas within the prefrontal cortex and anterior temporal cortex compared to non-homicide offenders.
They didn’t actually specifically test that hypothesis, however. Instead, the analyses performed were highly exploratory, looking for effects in any direction, anywhere in the brain. To do that, the researchers use a method known as voxel-based morphometry to look for differences in size of bits of the brain. Basically, this takes the scan of an individual’s brain and warps it into a common template space to allow averaging and comparison across groups. The amount of warping that is required to get the individual’s scan into the template is recorded on a voxel-by-voxel basis and that is taken as an estimate of the amount of grey matter in that bit of the brain in that individual, relative to everyone else.
The next step is to average out those values in the common template for each group and compare them. That raises a statistical problem, given that you are performing tests on over a million 1 mm³ voxels in the brain. Various statistical methods can be used to correct for these multiple comparisons. Here, the authors used the False Discovery Rate. In addition, a number of possible confounding factors that differed between the groups were included in the analysis of variance (ANOVA) model, as listed below. (PCLR refers to a test for psychopathy). The idea is that controlling for these factors in the ANOVA gives you confidence that any differences observed are not just driven by that factor, making it more likely that they are driven by the factor of interest – murder.
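For readers unfamiliar with FDR correction, the sketch below shows the Benjamini-Hochberg procedure, which is the most common way of controlling the False Discovery Rate. Whether this is the exact variant applied in the paper's SPM12 analysis is an assumption on my part, and the p-values are made-up toy numbers, not data from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test is declared a discovery.

    Benjamini-Hochberg step-up procedure: sort the p-values, find the largest
    rank k with p_(k) <= (k / m) * q, and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Toy p-values (hypothetical, not from the paper).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
discoveries = benjamini_hochberg(toy_p, q=0.05)
print(sum(discoveries), "of", len(toy_p), "tests survive FDR at q = 0.05")

# For scale: with no correction at all, a million null voxels tested at
# p < 0.05 would be expected to yield about 50,000 false positives.
expected_false_positives = 1_000_000 * 0.05
print(expected_false_positives)
```

In this toy example, five of the ten p-values fall below 0.05, but only the two smallest survive the correction. Note that the procedure controls the expected proportion of false positives among the reported hits; it does not say which particular hits are false.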
One way ANOVA was performed on a voxel-by-voxel basis over the whole brain using SPM12 to evaluate differences in regional gray matter volumes between Homicide (n = 203), Violent Non-Homicide (n = 475) and Minimally Violent (n = 130) offenders, with all three groups included as factors in each analysis. The ANOVA model included each subject’s total brain volume (i.e., gray matter plus white matter), PCLR total scores, substance use severity, age at time of scan, IQ, and time in prison variables as covariates. Whole brain analyses using the False Discovery Rate for control over Type I error, were performed for all comparisons.
With that methodology in mind, let’s look at the findings.
The murdery bits
The authors found dozens of clusters of voxels (location information is given for 47) showing a difference in grey matter volume between murderers and non-murderers. All of the differences were a relative reduction in volume in the murderers. A comparison of murderers to the subset of violent non-homicide offenders gave largely similar results while comparisons between the control subsets, violent (non-homicide) and minimally violent offenders, “yielded mostly null results, and no results survived correction for multiple comparisons”.
So, what are we to make of these findings? The first question is: should we trust that they are “real” and not a statistical blip? The answer is very difficult to discern. The statistics have all been done in the standard way for this kind of study, but is that standard actually rigorous enough?
The typical approach is to let the software do the statistical jiggery-pokery, and if some difference comes out as significant, then it’s taken to be real. I find that a little unconvincing, and the reason is empirical: the literature is full of studies performing exactly these kinds of analyses on neuroimaging data – using FDR and controlling for all kinds of confounds – and claiming some significant differences between groups, only for them to fail to replicate in subsequent studies.
At some point, you have to ask if the stats are doing what we think they’re doing. There is, in fact, some debate about whether the FDR is an appropriate measure and how it should be implemented – by voxels, or by clusters of a certain size, for example. In addition, there is considerable doubt as to the possibility of effectively “controlling for” possible confounding variables that differ between the subgroups. Some claim that “control for” should really read: “attempt to adjust for using unrealistic linear assumptions”, while others argue it is flat-out impossible to correct for confounds using ANCOVA.
You would like your stats to give you an indication of how robust and generalizable a finding is – that isn’t actually what they do, but it is implicitly how they are (mis)interpreted. In this case, for example, the authors claim to have discovered something about “murderers”, that is, murderers in general, not just about the murderers in their sample.
But the best way to see if your findings are robust and generalizable is to directly test whether they replicate in a separate sample. For this kind of population, this is obviously a tall order, and it is understandable that the authors were not able to accomplish it. However, without such replication, and with the caveats about the statistical methodology, it is hard to know how much trust to place in the findings.
Interpreting the findings
Taking them at face value, the authors focus on prominent differences in areas of the cingulate cortex, insula, prefrontal cortex, and orbitofrontal cortex. These structures are implicated in many aspects of behavioral control that could be seen as relevant to “murderousness” (emotional regulation, impulse control, weighing possible negative consequences of an action, and others). However, differences also appear in many other parts of the brain, including, for example, the somatosensory, auditory, and visual cortices, and the cerebellum, which are less obviously implicated in the kinds of cognitive tasks we might expect to be involved.
Prior work has also reported some differences between the brains of violent criminals or psychopaths and controls and the authors point to some overlap in the brain regions where such differences have been found, as well as some variation. This raises an interesting question as to how we should assess whether an imaging finding replicates. Should we expect the same cluster of voxels to differ consistently across different samples? (This may not even be comparable if the template space is defined from the subjects in each sample). If we just see some clusters differing in the same broad region, then, is that a replication? How are we defining a “region”? If any clusters in any regions “replicate” should we focus on those as consistent signal and ignore the others?
The tendency, of course, when faced with a long list of findings, is to focus on the ones that make the most sense, according to your prior knowledge (also known as your preconceived biases). The same dynamic is seen in the analysis and discussion of results from genomics approaches, such as genome-wide association studies or transcriptomics. Faced with a list of hundreds of genes that their study has just “implicated”, researchers tend to pick a few favorites and tell a story about them, while ignoring the rest. The positive hits in your prior regions of interest can be taken as supporting their involvement, but is that really how we should update our hypotheses? By selectively attending to the evidence that confirms our priors, while ignoring evidence implicating other regions?
There is an additional problem inherent in VBM analyses, which is that human brains are quite variable. Not just in overall size or subtle variations in shape – people also show quite a bit of variation in the layout and shape and size of various functional areas, defined by which bits are active when we are doing various tasks or which bits tend to be talking to each other. Our brains are so unique, in fact, that this distribution of functional activations is referred to as a “neural fingerprint”.
So, when you do VBM, and you are warping voxels from some little bit of the brain in one individual into a common space, there is no guarantee that it belongs to a functionally homologous region in another individual. It’s more likely to than not, perhaps, and an average map can be made, but there will still be lots of idiosyncratic variability in the layout, which makes assigning function to regions in the template more challenging. Indeed, newer approaches are defining these functional regions on an individual basis first, before performing any kind of group averaging.
Finally, there is the issue of causality. This kind of observational study can only provide a correlation. Taking the findings at face value, a plausible interpretation is that the brain differences cause the murderousness. But it is certainly also conceivable that these differences arise as a consequence of traumatic and violent experiences, which people who eventually became murderers are likely to have gone through. Or maybe they’re due to some other confound that was not fully corrected for or not anticipated at all. Who knows? Maybe they’re a marker of guilt and remorse.
To sum up, we have the observation of a profile of differences across many regions between the brains of murderers and non-murderers in this sample. There are, I think, legitimate questions about the statistical robustness of the findings in the first instance. There is also a question as to whether, if they are “real” and not just statistical blips, they are really driven by the factor on which the groups were chosen (murder) and not by some known or unknown confounding variable. Finally, even if they are taken at face value, it is hard to know what the overall profile means or whether it really tells us anything about the (varied) psychology of murderers that we didn’t know before.
Research like this has real-world impacts, whether or not they are intended by the authors or warranted by the strength of the data. You can bet that reports of these findings will increase the use of brain scans as supposedly exculpatory evidence in murder trials. This practice is already happening, based on previous reports of a similar nature, and has been seen for genetic findings as well, despite the fact that the associations have been shown to be spurious. Both types of evidence have proven effective in getting sentences reduced, on the basis that the defendant is biologically predisposed to violence and thus cannot be held fully responsible for it.
But the flip side of this appeal to biological essentialism is that such a person may be deemed less likely to be rehabilitated and more prone to recidivism. Indeed, you could see prosecutors or parole boards using the same evidence to argue, on a different basis, for longer sentences. Estimating the likelihood of future criminality or violence is, of course, a normal part of such decisions – the question is whether a brain scan of an individual can actually give you any accurate or helpful information in that regard.
And the answer is no. Group average differences do not necessarily allow prediction of individuals and the question is untested in this study.
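A rough calculation shows how far a group difference is from individual prediction. The sketch below computes the best accuracy any cut-off could achieve on a single measure, assuming two equally sized groups with normally distributed scores and equal variance; the effect sizes are hypothetical illustrations, not values reported in the paper.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def best_case_accuracy(cohens_d):
    """Best achievable accuracy for classifying an individual from one measure.

    Assumes two equally sized groups whose scores are normally distributed
    with equal variance and means separated by `cohens_d` standard deviations.
    The optimal cut-off is the midpoint between the two means.
    """
    return normal_cdf(abs(cohens_d) / 2.0)


# Hypothetical effect sizes, for illustration only (not taken from the paper).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: best-case accuracy = {best_case_accuracy(d):.0%}")
```

Under these simplifying assumptions, group differences of 0.2, 0.5 and 0.8 standard deviations give best-case accuracies of roughly 54%, 60% and 66%, against 50% for a coin flip. A difference can be highly statistically significant in a sample of 808 and still say very little about any one person.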
In fairness, the authors discuss this explicitly:
While this report demonstrates aggregate differences between homicide offenders and other violent offenders that are highly statistically significant, this should not be mistaken for the ability to identify individual homicide offenders using brain data alone, nor should this work be interpreted as predicting future homicidal behavior.
However, that caveat is quarantined to the final section of the paper on “Limitations and future directions”. Having such a section is absolutely standard practice and certainly a good one, as it is used to explicitly lay out limitations of the experimental design and alternative interpretations of the data. But the practice of corralling those concerns into one section, and presenting them as an afterthought, frees authors to blithely ignore them in the way they present and interpret their findings in the rest of the paper. If challenged, they can always point to the final section to show how rigorous and objective they have actually been and how up-front and circumspect about possible weaknesses of their claims.
If you were cynical, you might call it the “covering your ass” section. Having it there gives licence to make the boldest claims in all the other sections of the paper, especially the title and the abstract, and, crucially, the press release. In this case, the authors undermine their own caveat by ending the abstract with this claim:
This demonstrates, for the first time, that unique brain abnormalities may distinguish offenders who kill from other serious violent offenders and non-violent antisocial individuals.
They may argue that this sentence is intended to mean just that there are aggregate, group average differences between murderers and non-murderers in their sample. But a reasonable person is likely to read the word “distinguish” as implying that these “unique brain abnormalities” can, literally, be used to distinguish individual murderers from individual non-murderers. Similarly loose language is used in this tweet by Jean Decety, one of the senior authors of the paper:
“Our study, which includes 808 incarcerated males, demonstrates unique brain abnormality (in ventromedial prefrontal cortex and insula) that differentiate offenders who killed from other violent offenders”
That sounds pretty definitive. The differences are apparently unique to murderers (though this hasn’t been shown – indeed, the paper itself states that "the localized deficits in gray matter exhibited in this sample of homicide offenders are not necessarily specific to homicidal behavior") and also highly specific to just a couple of brain regions (though the data also do not show that). If I’m a defense lawyer I may go looking for someone to scan my defendant’s brain and tell me it shows evidence of this “unique brain abnormality”. If I’m on a parole board, I might be similarly interested. Indeed, why wait until someone has committed a murder to use such data to predict future crime? Why not get in there first and identify “high-risk” individuals?
This may seem far-fetched for brain scans, as it’s highly impractical, but just wait until genome-wide association studies find some hits for criminality and see how easy it will be to generate a polygenic score that supposedly predicts this trait for individuals.
Whether the authors intend it or not, this study feeds into a narrative of biological essentialism that conveniently lets us ignore all the messy social factors and complex individual experiences that may lead a person to commit murder. If we can track some objective indicator of a biological risk of this behavior and supposedly put a number on it, you can be sure that that number will be applied to individuals and used in all kinds of unexpected ways, whether or not it has any actual validity.