Conor Houghton

Old news

(2024-10-22) George wins funding for: Collecting and analysing multilingual EEG data

George Sains is a doctoral teaching associate in my lab; this means he is partly a PhD student and partly involved in teaching, something like an American-style PhD student with TA duties. He recently applied to the Jean Golding Institute for seedcorn funding to run an EEG study; unusually for a student with a computational background, George has from the start wanted to run an experiment. We have just heard that his application was successful.

George has been working on the analysis of EEG data, building on earlier work by Davide Turco: Bayesian modeling of language-evoked event-related potentials. We have had some interesting results that tell a fascinating story about grammar and the brain. The next stage demands multilingual data: recording from speakers of different languages. To this end George will record EEG data from English and Chinese speakers, in collaboration with the Bristol Digital Futures Institute: www.bristol.ac.uk/bristol-digital-futures-institute/

The award will pay experimental costs and participant fees, and also includes some help from the Jean Golding Institute data scientists to improve our data workflow, which is very cool. We are excited to see what George finds!

(2024-09-24) A paper about evolution!
Cooperation as well as learning: A commentary on 'How learning can guide evolution' by Hinton and Nowlan

According to the Baldwin Effect, learning can guide evolution: being able to learn a trait can help a species evolve that same trait. At first this feels like Lamarckian nonsense, but it isn't. Let's consider a very artificial example: imagine it is useful for a hen to be able to recognize a snake and then stand up tall and dinosaur-like to scare it off. Imagine further, and implausibly, that each of these two traits can be produced by a single mutation. Every so often a hen is hatched that can recognize a snake. This trait does it no good, since it doesn't know how to scare off that same snake. Similarly, imagine on other occasions a hen is hatched that can scare off snakes; again this trait is useless, since it is no good knowing how to scare off a snake if you don't know how to spot one. Sadly, it is much, much rarer that a hen is hatched with both traits at once, and so the useless individual traits disappear from the population and evolutionary change does not produce snake-safe hens.

Now imagine that a hen can also learn, through experience, to scare off snakes, learning the required tall stance after a few close calls. In this case, inheriting the mutation that allows a hen to recognize a snake is useful: when it recognizes the snake, then with a few frightening encounters and a little luck it can learn to scare snakes off. Thus the potential to learn how to scare snakes makes the recognizing-snakes mutation useful, and so this mutation will confer fitness and, in the usual Darwinian way, spread through the population. Furthermore, once the hens can recognize snakes, the mutation that makes them instinctively stand tall to scare snakes becomes useful too: it saves them the risky encounters required to learn the trait by experience. In this way, the fact that the hens can learn the traits makes the species more likely to evolve them. This is the Baldwin Effect.

In 1987 Hinton and Nowlan wrote a very elegant paper describing the Baldwin Effect in a clear way and illustrating it with a nice mathematical simulation: How learning can guide evolution. This paper was influential; for example, it formed part of Pinker and Bloom's powerful argument that language evolved through normal Darwinian mechanisms: Natural language and natural selection.

In my paper I point out that there is a similar effect with cooperation. Perhaps in a flock of hens one hen gets the mutation that allows it to recognize a snake, and when it sees one it squawks in alarm; another hen has the mutation that makes standing tall when under threat instinctive, and so it scares away the snake, for the benefit of all. Both traits are beneficial, and the usual Darwinian, survivalist principles mean that they will become established. The two mutations are much more likely to occur at the same time in a flock than in one individual animal. My paper uses the same sort of simulations described by Hinton and Nowlan to illustrate this effect. Thus, in addition to the obvious benefits of social behaviour in animals, cooperation broadens the evolutionary path to complex behaviours.
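To give a flavour of Hinton and Nowlan's simulation, here is a minimal sketch in Python. The setup follows their paper as I understand it (twenty loci, allele ratios of 1/4 correct, 1/4 wrong and 1/2 learnable, a thousand learning trials, and a fitness of 1 + 19n/1000 for an organism that succeeds with n trials remaining), but the code is an illustrative reconstruction, not the original, and the population size is scaled down.

```python
import random

L, POP, TRIALS, GENS = 20, 200, 1000, 30   # scaled down from the paper's population of 1000

def random_genome():
    # allele ratios from the paper: 1/4 correct ('1'), 1/4 wrong ('0'), 1/2 learnable ('?')
    return random.choices('10??', k=L)

def fitness(genome):
    # the good phenotype needs all twenty loci set to '1'; a hard-wired '0' rules it out
    if '0' in genome:
        return 1.0
    p = genome.count('?')
    for t in range(TRIALS):
        # each learning trial sets the learnable loci at random
        if all(random.random() < 0.5 for _ in range(p)):
            return 1.0 + 19.0 * (TRIALS - t) / TRIALS   # earlier success, higher fitness
    return 1.0

pop = [random_genome() for _ in range(POP)]
for g in range(GENS):
    scores = [fitness(ind) for ind in pop]
    parents = random.choices(pop, weights=scores, k=2 * POP)   # fitness-proportional selection
    new_pop = []
    for a, b in zip(parents[::2], parents[1::2]):
        cut = random.randrange(1, L)                           # single-point crossover
        new_pop.append(a[:cut] + b[cut:])
    pop = new_pop
    fixed = sum(ind.count('1') for ind in pop) / (POP * L)
    plastic = sum(ind.count('?') for ind in pop) / (POP * L)
    print(f"generation {g:2d}: fixed-correct {fixed:.2f}, learnable {plastic:.2f}")
```

Run over a few dozen generations, the learnable '?' alleles are gradually replaced by hard-wired correct alleles: the ability to learn the trait lets selection get a grip on it, which is the Baldwin Effect in miniature.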
(2024-09-09) New preprint: Residual stream analysis with multi-layer SAEs

I am excited and lucky to be involved in this work, which is almost totally due to my collaborators. Like everyone else we are interested in how transformers work and, in particular, in how they represent and manipulate the features of language. It turns out that sparse autoencoders are a useful way to find out what these features are. So far this approach has used a different autoencoder on each layer of a transformer. However, the residual stream is often thought of as a sort of scratch pad: a representation of the input which gets acted on and updated across successive processing steps. If this is true, then it should be possible to train a single autoencoder and see the same feature crop up, sometimes in one layer, sometimes in another. This is what we did, and it is more or less what we saw: while there seem to be some features that are layer-specific, some occur in different layers. Our preprint has graphs to show this and, more importantly, now that we know the approach works, we can look to understanding what language looks like to a transformer!
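For readers who have not met the technique, here is a minimal sketch of a sparse autoencoder trained on residual-stream activations pooled across all layers, which is the one change from the usual one-autoencoder-per-layer setup. This is a generic illustration in PyTorch: the sizes are hypothetical, the stand-in activations are random, and the preprint's actual architecture and training details may differ.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """One autoencoder shared across every layer of the residual stream."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))       # sparse feature activations
        return self.decoder(f), f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # reconstruction error plus an L1 penalty that keeps the features sparse
    return ((x - x_hat) ** 2).sum(-1).mean() + l1_coeff * f.abs().sum(-1).mean()

d_model, d_hidden = 512, 4096                 # hypothetical sizes
sae = SparseAutoencoder(d_model, d_hidden)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

# Stand-in for real residual-stream activations: (n_tokens, n_layers, d_model).
resid = torch.randn(1024, 12, d_model)
batch = resid.reshape(-1, d_model)            # pool every layer into one dataset
x_hat, f = sae(batch)
loss = sae_loss(batch, x_hat, f)
loss.backward()
opt.step()
```

Because the same encoder sees activations from every layer, a feature that reappears at different depths is represented by the same hidden unit, which is what makes it possible to ask whether the residual stream really is a scratch pad.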
(2024-09-06) Poster from CCS24 (doi.org/10.5281/zenodo.13709968) about my attempt to find the simplest possible model of language evolution, one that includes only our wish to communicate, our propensity towards playful innovation in language, and our inclination to communicate only with those whose language resembles our own. The result is a sort of Ising model. I am not sure how useful this model is; it contains nothing about the mechanics of language, and so perhaps it is too abstract to tell us anything useful. However, one thing I did find, while working on the simulations and wondering how to compare the model to real data, is that the populations of the world's languages follow a lognormal distribution.

(2024-09-05) I was at the super COMPILA2024 workshop at CCS2024 this week; I really enjoyed the workshop, it was my first time meeting other people keen to see what modelling can tell us about language change. My talk was about a new Iterated Learning Model I'm proposing with Seth Bullock and Jack Bunyan. I made a recording of the talk: youtube, with slides at doi.org/10.5281/zenodo.13692349. In the talk, more than in the paper, I'm trying to sell the idea that the model suggests something exciting: that the key to language evolution, and to the use of language specifically by humans, is the use humans make of our utterances as a component of our thought. The paper is here: arxiv: 2405.20818

(2024-08-20) My student Davide Turco has made a great poster for our Conference on Cognitive Computational Neuroscience paper "Investigating the timescales of language processing with EEG and language models": zenodo.org

(2024-08-15) A new paper under review describing a hierarchical Bayesian workflow for analysing cell count data.

These days experimentalists can mark neurons, slice up the brain and then count the marked cells. To use the two examples in our paper, this might mean marking all the cells that are active during some behaviour, or all the cells with a particular developmental lineage. For each animal the number of marked cells is counted for lots of different brain regions, in some experiments as many as a hundred. These data are super cool: they give information across the whole brain. They are, however, very time-consuming and expensive to collect, and often there are only ten animals for each experimental condition. Now, the whole cool thing about the data is their high dimension, with cell counts for each brain region, but this combination of high dimension and small sample means the data are under-sampled.

Clearly Bayesian analysis can help here, but for a Bayesian analysis you need to decide on a model and a set of priors. New samplers mean Bayesian approaches can be used for data like this, but setting up the analysis can be intimidating. Bayesian methods predate t-tests and the like, but in the last century they were not as widely used as the classic mixture of hypothesis tests, and as such there is not a lot of lore and tradition about what choices to make when doing a Bayesian analysis. In our paper we try to help by suggesting a 'standard' Bayesian workflow for cell count data. We test our workflow on two datasets and in both cases it works well, producing clearer results than a more classical approach. Bayesian models are less familiar, but they are actually very transparent and clear.
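To make the shape of such an analysis concrete, here is a minimal sketch of a partially pooled count model in PyMC. To be clear, this is not the model from our paper: the priors, the negative-binomial likelihood, and the toy data are stand-ins invented for illustration. What it does show is the key idea of sharing statistical strength across brain regions when there are only a few animals per condition.

```python
import numpy as np
import pymc as pm

# Toy data: cell counts for n_regions brain regions in n_animals animals,
# with a binary experimental condition per animal.
rng = np.random.default_rng(1)
n_regions, n_animals = 20, 10
counts = rng.poisson(50, size=(n_animals, n_regions))
condition = rng.integers(0, 2, size=n_animals)    # 0 = control, 1 = treatment

with pm.Model() as model:
    # Region baselines are partially pooled: each region's log-rate is drawn
    # from a shared population distribution, which is what rescues the
    # many-regions / few-animals situation.
    mu = pm.Normal("mu", 4.0, 2.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    log_rate = pm.Normal("log_rate", mu, sigma, shape=n_regions)

    # A per-region condition effect, also partially pooled around zero.
    tau = pm.HalfNormal("tau", 0.5)
    effect = pm.Normal("effect", 0.0, tau, shape=n_regions)

    rate = pm.math.exp(log_rate + effect * condition[:, None])
    alpha = pm.HalfNormal("alpha", 10.0)          # overdispersion
    pm.NegativeBinomial("counts", mu=rate, alpha=alpha, observed=counts)

    idata = pm.sample()                            # NUTS, one of the new samplers
```

The partial pooling means a region with noisy counts borrows information from the other regions, and the posterior for each region's condition effect comes with honest uncertainty, which is what makes the results clearer than the classical hypothesis-test-per-region alternative.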