Schema formation in synaesthesia

The following is an extract (just the text, not the figures) from a paper I wrote for the proceedings of the V International Conference Synesthesia: Science and Art. Alcalà la Real de Jaén. España. 16–19th May 2015. 

Many of the ideas were also developed in a paper with my colleague Fiona Newell, on Multisensory Integration and Cross-Modal Learning in Synaesthesia: a Unifying Model.


Psychologists use the term “schema” to refer to the information or knowledge that makes up our concept of an object. It includes all the attributes that the object has, such as the shapes of a letter and the sounds it can make, the shape of a numeral and the value it represents, or the face of a person and their name and everything you know about them. In many cases, those attributes are represented across very different brain areas (such as those conveying visual or auditory information, for example). With experience, the representations of the different attributes of an object become linked together, in the mind, by repeated co-occurrence.

At the level of the brain, this must involve some kind of strengthening of connections across areas of the cerebral cortex so that a pattern of activity in one area (say that induced by the sight of the letter “A”) reliably co-activates or primes a particular pattern of activity in another area (say the sounds of the letter “A”). The brain is wired to enable this kind of communication between different areas, so that these sorts of associations can be learned by repeated exposure to contingent stimuli, for example as we learn the alphabet or our numbers.

Synaesthesia is characterised by the incorporation of additional attributes into the schema of an object – ones not reflecting the characteristics of the object itself but some internal associations triggered by it. I propose a model to account for synaesthesia based on innate differences in wiring between cortical areas, which lead to additional percepts (such as colours) being triggered by an object during learning. These repeated patterns eventually result in stable synaesthetic associations, despite the lack of reinforcement from the external world. This model can account for the heritability of the condition and evidence of cortical hyperconnectivity, but also for the learned nature of many of the inducing stimuli and observed trends in letter-colour or word-taste pairings.

In this conceptual framework, a natural contrast emerges between synaesthesia, on the one hand, and a number of other conditions, collectively known as “agnosias”, on the other. These include dyslexia and dyscalculia, face blindness, tone deafness, colour agnosia and others. These conditions seem to reflect an inability to incorporate all the attributes of an object into a schema, resulting in a “lack of knowledge” of particular types of objects. There is evidence that these result from decreased connectivity between brain regions. It is hoped that the study of synaesthesia may also inform on the mechanisms underlying these less benign conditions.


Synaesthesia is often described as a cross-sensory phenomenon, where, for example, particular sounds (such as words or musical notes) will induce a secondary percept (such as a colour or taste), which is specific for each stimulus (Bargary and Mitchell, 2008; Hubbard and Ramachandran, 2005). While these florid types of synaesthesia involve very vivid perceptual experiences, the more common manifestation is associative (Simner, 2012). These cases involve the certain knowledge that some object, such as a letter or number, has, in addition to its normal attributes (shape, sound, value, etc.), some extra traits associated with it, such as spatial position, colour, texture, even gender and personality. These associated characteristics are stable, idiosyncratic and have typically formed an intrinsic part of the person’s schema of that object for as long as they can remember.

A recurrent question in relation to synaesthesia is whether it represents a truly distinct phenomenon, qualitatively different from typical perception, or reflects instead an amplification or exaggeration of normal processes of multisensory integration (Deroy and Spence, 2013b; Ward et al., 2006). A related question concerns the extent to which particular synaesthetic associations arise arbitrarily through intrinsic neural mechanisms or are driven instead by experience and learning in ways that may be common to all people (Watson et al., 2014). Both these questions bear on what is arguably the central question in the field: why do some people develop synaesthesia while most do not? The answers to these questions thus determine fundamentally how we conceive of synaesthesia.

On the one hand, synaesthesia represents a dichotomous phenotype – people are relatively easily categorised as synaesthetes or non-synaesthetes. Moreover, the condition is clearly genetic in origin, often running in families with a Mendelian pattern of inheritance (some members clearly having the condition, others clearly not) (Asher et al., 2009; Barnett et al., 2008; Baron-Cohen et al., 1996; Galton, 1883; Rich et al., 2005; Ward and Simner, 2005). The primary answer to the question of why some people develop synaesthesia is therefore that they inherit a genetic variant that strongly predisposes to the condition. This argues for some intrinsic difference as a necessary starting point in explaining the condition and against a model where general processes are sufficient to explain it.

On the other hand, a number of lines of evidence suggest that whatever is happening in synaesthesia, it relies on or at least interacts with processes of multisensory integration that are common across all people. These include both acute cross-sensory activation as well as longer-term cross-modal learning.

First, there is strong evidence that most areas of what has been deemed unisensory cortex are in fact essentially multisensory, with extensive anatomical cross-connectivity and at least some modulatory inputs from other modalities providing credible substrates for cross-sensory interactions (Ghazanfar and Schroeder, 2006; Qin and Yu, 2013). The idea that such interactions may be always present but not always consciously accessible is reinforced by the activation of visual areas in blind or blindfolded people (Bavelier and Neville, 2002) and by the phantasmagoric audiovisual synaesthetic experiences associated with certain hallucinogens, such as lysergic acid diethylamide (LSD), psilocybin or mescaline (Schmid et al., 2014; Sinke et al., 2012). Such experiences indicate that the barriers between the senses are certainly not as rigid as typical experience suggests.

However, drug-induced synaesthetic experiences have quite a different phenomenology, tending to involve florid, detailed and complex visual experiences induced by sound, especially music (Deroy and Spence, 2013b; Sinke et al., 2012). By contrast, developmental synaesthesia is characterised by more stable and sedate cross-sensory pairings of particular stimuli with particular additional percepts or conceptual attributes. These tend to be quite simple in nature, involving perceptual primitives rather than complex forms. The relevance of drug-induced synaesthesia to the mechanisms underlying developmental forms thus remains unproven.

There is another line of evidence, however, which supports the idea that normal multisensory integration processes are involved in synaesthesia. In particular, these are processes involved in categorical perception, which integrate information about objects across sensory domains and generate conceptual and supramodal representations.

For any form of synaesthesia, the particular pairings that emerge between inducers and concurrents are idiosyncratic and tend to be dominated by apparent arbitrariness in any individual. However, by looking across many synaesthetes, it is possible to discern clear trends in such pairings, for example between particular letters and their synaesthetic colours. In English speakers, the letter B may be more commonly blue than other colours (perhaps 30% of the time) and the letter Y more commonly yellow (as high as 50% of the time) (Barnett et al., 2008; Rich et al., 2005). It is even apparent that, for some synaesthetes, all of their colour-letter pairings are derived from experience with childhood toys, such as refrigerator magnets (Witthoft and Winawer, 2006). Similarly, for many synaesthetes with number forms, the numbers 1 to 12 are arranged in a circle like a clock face (Galton, 1883). Many word-taste pairings can also be explained by semantic associations, such as “Cincinnati” tasting of cinnamon and “Barbara” tasting of rhubarb (Simner, 2007).

There are thus clear cultural and semantic influences on the particular pairings that emerge in developmental synaesthesia. Some theorists have argued that such trends demonstrate that synaesthesia is induced by learning and, even further, that the purpose of synaesthesia is to aid in learning the inducing categories (Asano and Yokosawa, 2013; Mroczko-Wasowicz and Nikolic, 2014; Watson et al., 2012; Yon and Press, 2014).

How can the idea that synaesthesia reflects innate, genetic differences be reconciled with models that suggest it is driven by learning? Here I develop a theoretical framework showing that these two models are quite compatible (previously sketched out in (Mitchell, 2013)). I argue: (i) that the predisposition to develop synaesthesia at all is genetic and innate; (ii) that the particular form and the pairings that emerge are driven largely by idiosyncratic connectivity differences; but (iii) that because the processes through which such pairings consolidate over time involve normal mechanisms of multisensory learning and categorical perception, the outcome can also be influenced by experience.

Synaesthesia as an associative phenomenon

Developmental synaesthesia can be present in diverse forms and experienced in qualitatively distinct ways. One important distinction is between synaesthetes who are “projectors” and those who are “associators”. For the former, the concurrent percept is actively and vividly perceived, either out in the world or “in the mind’s eye”, while for the latter it is merely conceptually activated in the way that saying the word “banana” activates the concept of yellow, possibly even prompting a visual image of the object in that colour, but is unlikely to induce a veridical percept of yellow out in the world.

There is another important phenomenological distinction between lower-level, truly cross-sensory synaesthesia, and higher-level, more conceptual forms. In the former, taking coloured hearing as an example, any sound may induce a visual percept, whether the person has ever heard it before or not. The same sound will tend to induce the same visual percept, but this does not seem to require prior experience.

For many associative forms, however, the synaesthetic associations arise only with a particular set of stimuli. Crucially, these are almost exclusively stimuli that are (i) categorical, and (ii) learned (often over-learned), such as letters, numbers, days of the week, months of the year, musical notes, words, etc. The emergence of these associative forms of synaesthesia must thus necessarily involve learning at some level, and indeed clearly interacts with normal processes through which the multisensory attributes of objects are learned.

Cross-modal learning and categorical perception

As we develop perceptual expertise, we come to categorise objects into types and to recognise particular instances as tokens of such types (Binder and Desai, 2011; Kourtzi and Connor, 2011). Moreover, for any particular object, we develop a conceptual framework that incorporates its many properties – a schema linking its various attributes. Thus, the concept of a banana includes its typical shape, colour, taste, and other semantic associations. While such representations incorporate attributes from multiple sensory domains, they are essentially conceptual and supramodal. There is good evidence that such supramodal representations involve activity in specific associative areas of the brain, located in anterior inferotemporal cortex (AIT), which can be thought of as “knowledge areas” (Chiou et al., 2014; Kourtzi and Connor, 2011).

Even for recent inventions like written alphabets, these areas tend to develop in the same positions across people (Dehaene and Cohen, 2007). This suggests that the specialisation of cortical areas for particular classes of objects relies on an evolutionarily programmed pattern of connectivity that places them at a convergence point of multiple, parallel hierarchies, enabling them to integrate information across the relevant sensory modalities.

Such areas are differentially activated by tasks that tap into semantic knowledge and lesions or temporary inactivation of these areas can result in object agnosias (De Renzi, 2000) – the inability to access all the attributes of an object when the concept is activated (Chiou et al., 2014; Kourtzi and Connor, 2011; Mitchell, 2011). For example, prosopagnosia, or face blindness, represents an inability to recognise people’s identity from their faces, though other aspects of face processing may remain intact. Similarly, colour agnosia refers to the inability to link characteristic colours into the schemas of objects, despite normal colour perception and discrimination.

Both these conditions can be caused by injuries to associative areas, but, intriguingly, both can also be innate and inherited (Behrmann and Avidan, 2005; Duchaine et al., 2007; Nijboer et al., 2007; van Zandvoort et al., 2007). The fact that such conditions can be genetic highlights the fact that differences in brain wiring (either structural or functional) can impact, very selectively, on higher-order conceptual processes, presumably by altering the anatomical substrates through which perceptual information is processed (Mitchell, 2011). Indeed, structural and functional neuroimaging studies have highlighted reduced connectivity within extended networks of cortical areas in these conditions. By contrast, synaesthesia may be caused by hyperconnectivity, linking additional areas into the conceptual schemas of learned categories of objects (Mitchell, 2011).

The processes by which such schemas emerge can be illustrated by considering how letters are learned. As children are learning to read they must learn to recognise and distinguish the various graphemes of the alphabet and also link them to the appropriate phonemes of their native language (Blomert and Froyen, 2010). In the visual domain, recognising graphemes requires, firstly, extraction of increasingly complex visual features across the hierarchy of areas in the ventral visual stream (Kravitz et al., 2013). Though it is a simplification, it is roughly true that each level in the visual hierarchy extracts more complex features by integrating inputs from multiple neurons at the level below, eventually enabling representation of shapes and objects across the visual field.

Areas that are higher still become specialised for processing specific types of visual information that correspond to various categories (letters, faces, objects, scenes). This kind of categorical perception enables the recognition of various instances of an object – different sizes, views or versions (such as of the letter “A” (A, A, A, a)) – all of which can activate the representation of the concept of the object (Kravitz et al., 2013). Similar processes arise as children learn spoken language – they develop expertise in recognising the typical speech sounds of their native language, but become deficient in distinguishing between uncommonly used phonemes.

Neuronal networks in general can learn in the following fashion. Any given stimulus will activate a distinct subset of neurons across an area of cortex, which can be thought of, reasonably accurately, as a two-dimensional sheet of highly interconnected cells. Due to their coincident activation, the connections between these neurons will be slightly strengthened (Hebb, 1949). If a particular stimulus is seen over and over again, this subset of neurons will become a functional unit, primed to respond en masse to similar stimuli. (More realistically, the properties of the stimulus may be represented not by one static pattern but by the dynamic trajectory of firing patterns across some time period (Daelli and Treves, 2010)).

The patterns of neuronal activity that represent any one object can be thought of as attractor states – any stimulus that occupies a nearby spot in perceptual space (e.g., that has a similar shape or sound) will be “pulled into the attractor”, with the network state ultimately converging on a pattern that represents that object. This “perceptual magnet” effect can be observed in psychophysical results, which show that discrimination between stimuli that fall within a boundary is less than that between two stimuli that are equally distant in stimulus parameters but that span a categorical boundary (Daelli and Treves, 2010). It is important to note that this perceptual categorisation, especially of ambiguous stimuli, is also sensitive to context and top-down influences (Feldman et al., 2009)  – indeed, all perception involves the comparison of bottom-up signals with top-down expectations, or prior probabilities, so as to allow active inference of the objects in the world that are responsible for the pattern of sensory stimulation (Friston, 2010; Gilbert and Li, 2013).
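The idea of Hebbian strengthening producing attractor states can be illustrated with a toy network. The sketch below is a minimal Hopfield-style model, offered only as an illustration and not as the model used in the cited work; the unit count, the two stored patterns and the function names are all assumptions made for the example. Connections between co-active units are strengthened, after which a degraded version of a stored pattern is "pulled into the attractor".

```python
# Toy Hopfield-style network: Hebbian learning creates attractor states.
# All names, sizes and patterns are illustrative assumptions.

def hebbian_weights(patterns):
    """Strengthen the connection between any two units that take the same state."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w

def settle(w, state, max_sweeps=10):
    """Update units one at a time until the network state stops changing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w[i][j] * state[j] for j in range(len(state)))
            new = 1 if total >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

# Two stored "objects", each a pattern of activity (+1 active, -1 silent) over 8 units.
pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = hebbian_weights([pattern_a, pattern_b])

# A stimulus close to pattern_a in "perceptual space" (one unit flipped).
noisy_a = [1, 1, 1, -1, -1, -1, -1, -1]
recovered = settle(weights, noisy_a)
print(recovered == pattern_a)
```

In this toy case the nearby stimulus converges on the stored pattern, which is the sense in which within-category differences are compressed relative to differences that span a category boundary.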

Linking the visual and auditory attributes of an object requires yet a higher level of integration and abstraction (Kourtzi and Connor, 2011). It is driven by the statistical regularities of experience – when the letter A is seen, it is typically accompanied by the sound ā, as in hay, or ă, as in cat. The representations of these shapes and sounds are thus reliably contingent and can lead to the development of a higher-level representation, incorporating both elements. The brain thus builds up cognitive representations of letters as audiovisual objects with characteristic attributes, despite the fact that these are essentially arbitrary pairings between symbols and sounds, determined purely by convention (Blomert and Froyen, 2010).

These processes reinforce each other. Though spoken language is learned much earlier and much more easily than written language, learning to read nevertheless increases the ability to distinguish between phonemes (which are not the natural basic units of speech) (Blomert and Froyen, 2010; Dehaene et al., 2010). Conversely, phonetic representations are involved in strengthening categorical representations of graphemes (Brem et al., 2010). The process of forming these associations is quite protracted, taking years to reach a level of perceptual expertise that is effectively automatic, as demonstrated by cross-modal mismatch negativity signals (Blomert, 2010). The development of such automaticity is lacking in dyslexia, presumably contributing to the fact that reading remains effortful despite extensive training.

In this model of schema formation, any areas that are reliably co-activated will be incorporated into the schema of an object, as supramodal areas monitor patterns of co-activation across many lower areas and represent the statistical regularities of such contingencies. This leads to a model of synaesthesia whereby innate differences in brain wiring produce internally generated percepts, which, over time, are incorporated through the normal processes of cross-modal learning into the schemas of the inducing objects.

A model unifying innate differences with learning processes

This model of associative synaesthesia requires in the first place some intrinsic cross-activation of additional areas not normally activated by particular objects (an innate difference between synaesthetes and non-synaesthetes). For letters, this might be an area representing colour, for example (but the idea can be readily extended to other forms). If such an area is topographically interconnected with, say, the grapheme area, so that nearby neurons in the first area project to nearby neurons in the second area (Bargary and Mitchell, 2008), then activation of the pattern of neuronal firing that represents any particular letter will necessarily cross-activate some (arbitrary) pattern of neuronal firing in the colour area (Brouwer and Heeger, 2009; Li et al., 2014).

Given that the colour areas mature much earlier than the grapheme area (Batardiere et al., 2002; Bourne and Rosa, 2006; Dehaene and Cohen, 2007), such patterns will likely evolve towards a set of attractor states that already represent specific colours (Brouwer and Heeger, 2009; Li et al., 2014). This may or may not lead to a conscious and vivid percept of colour, but should at least lead to the activation of the concept of a colour. Over time, with extensive repetition, this internally generated sensory property will come to be incorporated into the schema of that letter, becoming as much a part of the concept of the letter as its shape(s) and sound(s).

In this model, without any other influences, the particular pairings that emerge would be expected to be largely arbitrary – dependent on the particular cross-connectivity at the anatomical level. Such a model could explain observed second-order trends, whereby similarly shaped letters tend to have similar colours within individual synaesthetes, even though these colours differ across synaesthetes (Watson et al., 2012). The letters E and F, for example, are often similarly coloured, which would be expected from an arbitrary topographic mapping between areas representing shape space and colour space (Brang et al., 2011). 
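A toy calculation may help make the second-order prediction concrete. In the sketch below, shape space is reduced to a single axis and colour space to a hue angle, and each hypothetical synaesthete has their own arbitrary but smooth mapping between the two. The letter positions, offsets and gain are invented for illustration and do not come from any data.

```python
# Toy topographic mapping from a 1-D "shape space" to a hue circle (0-359 degrees).
# Positions, offsets and gain are hypothetical values chosen for illustration.

def hue_distance(h1, h2):
    """Shortest angular distance between two hues on the colour circle."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)

def topographic_map(offset, gain):
    """Return a smooth mapping: nearby shape positions land on nearby hues."""
    return lambda position: (offset + gain * position) % 360

# E and F are assumed to lie close together in shape space; O lies far from both.
shape_position = {"E": 10, "F": 15, "O": 80}

# Two individuals with the same kind of mapping but different arbitrary offsets.
synaesthete_1 = topographic_map(offset=20, gain=3)
synaesthete_2 = topographic_map(offset=200, gain=3)

hues_1 = {letter: synaesthete_1(p) for letter, p in shape_position.items()}
hues_2 = {letter: synaesthete_2(p) for letter, p in shape_position.items()}

for name, hues in (("synaesthete 1", hues_1), ("synaesthete 2", hues_2)):
    print(name, hues, "E-F distance:", hue_distance(hues["E"], hues["F"]))
print("E across individuals:", hue_distance(hues_1["E"], hues_2["E"]))
```

Within each individual, E and F receive similar hues, while the hue assigned to E differs widely between the two individuals, which is the pattern described above.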

In addition, one can imagine how the pairings might be influenced by characteristics that affect the types of neural patterns representing specific letters or colours (Chiou and Rich, 2014; Deroy and Spence, 2013a; Ward et al., 2006; Watson et al., 2012). For example, there could be some correspondences between the states representing higher frequency letters and those representing higher intensity colours (such as the proportion of neurons within the area that are recruited to the representational pattern or some dynamical property of the pattern), which would make pairings between members of those two types more likely to emerge. While speculative, this kind of scenario may provide an explanation of cross-sensory trends that do not rely on semantic information but reflect some currently unknown representational parameters that hold across modalities.

As with the pairings between graphemes and phonemes that emerge during learning to read, synaesthetic pairings also take a long time to coalesce. Simner and colleagues found in a longitudinal study of children an increase in both the number and stability of synaesthetic correspondences between letters and colours at age 7/8 compared to age 6/7 (Simner et al., 2009) and more again at age 10/11 (Simner and Bain, 2013).

Importantly for this model, this protracted period of consolidation leaves the opportunity for semantic associations to influence the ultimate outcome, in the same way that they do when learning the attributes of any object. Such influences could explain the observed trends in specific pairings mentioned above. For example, when a synaesthete is learning the letter B, and their colour area is being cross-activated in some arbitrary pattern, the semantic relationship between B and “blue” may prime the neuronal pattern representing blue, making it more likely for the network to move toward that attractor state, which will in turn be reinforced by each such co-activation. This can be interpreted in a Bayesian context as top-down signals conveying a prior probability of “blue”, in the context of the letter B, which, to a greater or lesser extent across individuals, will tend to over-ride the bottom-up sensory information, biasing the incorporation of a consistent association with this colour into the emerging schema of the letter. It is not difficult to see how such semantic influences could act in other kinds of synaesthesia – for example, if the numbers 1 to 12 are regularly seen in clock face arrangement, that may influence the spatial pattern that emerges in an individual’s number line. (It will be interesting to see whether this trend changes as clock faces become rarer).
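The Bayesian reading of this bias can be written out as a small worked example. The numbers below are invented purely for illustration: the "likelihood" stands in for the arbitrary bottom-up cross-activation of the colour area, and the "prior" for the top-down semantic link between B and "blue". The posterior is simply proportional to prior times likelihood.

```python
# Toy Bayesian combination of bottom-up cross-activation with a top-down semantic prior.
# All probabilities are hypothetical values chosen for illustration.

def posterior(prior, likelihood):
    """Posterior over colours, proportional to prior * likelihood."""
    unnormalised = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalised.values())
    return {c: v / total for c, v in unnormalised.items()}

# Bottom-up evidence when the letter B is seen: arbitrary wiring slightly favours green.
likelihood = {"red": 0.30, "green": 0.40, "blue": 0.30}

# Without semantic influence, every colour is equally expected.
flat_prior = {"red": 1 / 3, "green": 1 / 3, "blue": 1 / 3}

# With semantic influence, "B is for blue" raises the expectation of blue.
semantic_prior = {"red": 0.2, "green": 0.2, "blue": 0.6}

without_semantics = posterior(flat_prior, likelihood)
with_semantics = posterior(semantic_prior, likelihood)

print(max(without_semantics, key=without_semantics.get))
print(max(with_semantics, key=with_semantics.get))
```

With a flat prior the most probable colour follows the arbitrary cross-activation, whereas a sufficiently strong semantic prior shifts it to blue. A weaker prior would leave the bottom-up pattern dominant, consistent with these being trends rather than rules.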

In this way, the particular synaesthetic associations that emerge through this protracted process can be biased by top-down semantic processes, without being entirely determined by them, reflecting observed trends rather than rules of associations. Fundamentally, this means that while such semantic processes may affect the outcome, they are not the prime drivers of the phenomenon of synaesthesia. The condition of synaesthesia is thus not caused by learning, and there is certainly no reason to think of it as having a purpose in facilitating learning. It may have that effect, but it is a conceptual mistake to interpret that as a reason for its existence.

The difference that makes a difference, in determining why some people get synaesthesia and others do not, is genetic (clearly so in many cases and likely so in others). The model of cross-activation at some level of the perceptual hierarchy remains a parsimonious one for the mechanism through which these genetic differences mediate their primary effects (Bargary and Mitchell, 2008; Hubbard et al., 2011), and has at least general support from many neuroimaging studies (Rouw et al., 2011). Whether this involves primary changes in structural or functional connectivity remains an open question, but in either case, the outcome is an altered sensory experience, with some internally generated percept that gets assimilated into the schemas of the inducing objects through normal processes of multisensory learning.

This model is subtly but importantly different from ones that explain synaesthesia on the basis of only an acute difference between the brains of synaesthetes and non-synaesthetes. For some synaesthetes, such as audiovisual synaesthetes for whom new sounds induce a vivid visual percept, there may – perhaps must – be some ongoing sensory cross-talk. But for associators, their synaesthetic experiences may arise not because their brain wiring is slightly different, but because that difference existed over development, while the person was learning various categories of objects. An interesting possibility that may link the phenomenologically distinct forms of projector and associator synaesthesia is that an early state of vivid synaesthesia, which may fade in conscious experience over time (as reported at least anecdotally by some adults), could nevertheless lead to long-lasting conceptual associations if it were present during this learning period.

The theoretical framework presented here is consistent with both an innate difference as the fundamental driver of the condition of synaesthesia, and with semantic and experiential influences on the eventual phenotype that emerges. In particular, it proposes that the internally generated synaesthetic percepts are treated similarly to other sensory information as the brain is learning the sensory attributes of objects and developing schemas to conceptually link them.


De Renzi, M. (2000). Disorders of visual recognition. Semin Neurol 20, 479-485.


  1. This is great information explaining how the brain is working. Are there any researchers in the practical application for teachers and parents? I have yet to find anyone that can help with my 13-year-old.

