Conor Houghton


Old news

(2024-09-09) New preprint:

arXiv: 2409.04185

Residual stream analysis with multi-layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison

I am excited and lucky to be involved in this work, which is almost entirely due to my collaborators.

Like everyone else, we are interested in how transformers work and, in particular, in how they represent and manipulate the features of language. It turns out that sparse autoencoders are a useful way to find out what these features are. So far, this approach has used a different autoencoder for each layer of a transformer. However, the residual stream is often thought of as a sort of scratch pad: a representation of the input which gets acted on and updated across successive processing steps. If this is true, then it should be possible to train a single autoencoder and see the same feature crop up, sometimes in one layer, sometimes in another. This is what we did, and it is more or less what we saw: while some features seem to be layer specific, others occur in different layers. Our preprint has graphs to show this and, more importantly, now that we know the approach works, we can look towards understanding what language looks like to a transformer!
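
If you like code, here is a minimal sketch of the basic idea; it is illustrative only, not our actual implementation, and the names and hyperparameters are made up. A single sparse autoencoder is trained on residual-stream activations pooled across all layers, so every layer shares the same dictionary of features.

    # Illustrative sketch only, not the code from the preprint.
    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        def __init__(self, d_model: int, n_features: int):
            super().__init__()
            self.encoder = nn.Linear(d_model, n_features)
            self.decoder = nn.Linear(n_features, d_model)

        def forward(self, x):
            codes = torch.relu(self.encoder(x))   # non-negative sparse codes
            return self.decoder(codes), codes

    def train_step(sae, resid_by_layer, optimiser, l1_coeff=1e-3):
        # resid_by_layer: one tensor per layer, each (batch * seq_len, d_model).
        # Concatenating them means one shared feature dictionary for all layers.
        x = torch.cat(resid_by_layer, dim=0)
        recon, codes = sae(x)
        loss = ((recon - x) ** 2).mean() + l1_coeff * codes.abs().mean()
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        return loss.item()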

(2024-09-06) Poster from CCS24

doi.org/10.5281/zenodo.13709968

The poster is about my attempt to find the simplest possible model of language evolution, one that includes only our wish to communicate, our propensity towards playful innovation in language and our inclination to communicate only with those whose language resembles our own. The result is a sort of Ising model.
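
Purely to illustrate those ingredients, and not as the model on the poster, here is a toy sketch of the sort of dynamics involved: agents on a grid whose bit-string 'languages' drift under copying and playful innovation.

    # Toy sketch only, not the model on the poster.
    import numpy as np

    rng = np.random.default_rng(0)
    N, L = 32, 16                                # grid size, language length
    lang = rng.integers(0, 2, size=(N, N, L))    # each site holds a bit-string language

    def step(lang, innovate_p=0.01, tolerance=4):
        i, j = rng.integers(N, size=2)                     # pick a random speaker
        ni, nj = (i + rng.choice([-1, 1])) % N, j          # and a neighbour
        speaker, hearer = lang[i, j], lang[ni, nj]
        # Communicate only with those whose language resembles your own.
        if np.sum(speaker != hearer) <= tolerance:
            k = rng.integers(L)
            hearer[k] = speaker[k]                         # hearer copies one 'word'
        # Playful innovation: occasionally flip a random bit.
        if rng.random() < innovate_p:
            lang[i, j, rng.integers(L)] ^= 1

    for _ in range(100_000):
        step(lang)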

I amn't sure how useful this model is: it contains nothing about the mechanics of language, so perhaps it is too abstract to tell us anything useful. However, one thing I did find, while working on the simulations and wondering how to compare them to real data, is that the populations of the world's languages follow a log-normal distribution.
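
For what it's worth, the log-normal claim is easy to play with in a few lines; the numbers below are placeholders, not the real speaker-population data.

    # Fit a log-normal to language populations; placeholder data only.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    populations = rng.lognormal(mean=10.0, sigma=2.5, size=7000)  # stand-in data

    shape, loc, scale = stats.lognorm.fit(populations, floc=0)
    print(f"fitted sigma = {shape:.2f}, median population = {scale:,.0f}")
    print(stats.kstest(populations, "lognorm", args=(shape, loc, scale)))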

(2024-09-05) I was at the super COMPILA2024 workshop at CCS2024 this week. I really enjoyed the workshop; it was my first time meeting other people keen to see what modelling can tell us about language change. My talk was about a new Iterated Learning Model I'm proposing with Seth Bullock and Jack Bunyan. I made a recording of the talk:

YouTube, with slides at doi.org/10.5281/zenodo.13692349

In the talk, more than in the paper, I'm trying to sell the idea that the model suggests something exciting: that the key to language evolution, and to the use of language specifically by humans, is the use we humans make of our utterances as a component of our thought. The idea is

  • simple animals: stimulus → action
  • more complex animals: stimulus → thought → action
  • social animals: stimulus → thought → action or utterance
  • our ancestors: stimulus → (thought ↔ internal utterances) → action or utterance
  • humans: stimulus → (thought ↔ internal language) → action or language

The paper is here: arXiv: 2405.20818
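
For readers unfamiliar with iterated learning, the general flavour of the framework is that each generation learns a language from the utterances of the previous generation and then produces the utterances the next generation learns from. The sketch below is the generic loop, not the specific model proposed in the paper.

    # Generic iterated-learning loop, for flavour only; not the paper's model.
    import random

    MEANINGS = list(range(8))

    def learn(observations):
        # Build a meaning -> signal lexicon from observed pairs.
        return {meaning: signal for meaning, signal in observations}

    def produce(lexicon, n_utterances=6):
        # Speak about random meanings, inventing a signal when none is known.
        utterances = []
        for _ in range(n_utterances):
            meaning = random.choice(MEANINGS)
            signal = lexicon.get(meaning, f"s{random.randrange(100)}")
            utterances.append((meaning, signal))
        return utterances

    language = {}
    for generation in range(20):
        language = learn(produce(language))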

(2024-08-20) My student Davide Turco has made a great poster for our Conference on Cognitive Computational Neuroscience paper "Investigating the timescales of language processing with EEG and language models": zenodo.org

The paper is
Investigating the timescales of language processing with EEG and language models.
Davide Turco and Conor Houghton
Conference on Cognitive Computational Neuroscience (CCN 2024)
arXiv: 2406.19884

(2024-08-15) A new paper under review describing a hierarchical Bayesian workflow for analysing cell count data:
Hierarchical Bayesian modeling of multi-region brain cell count data.
Sydney Dimmock, Benjamin M.S. Exley, Gerald Moore, Lucy Menage, Alessio Delogu, Simon R Schultz, E Clea Warburton, Conor J Houghton and Cian O'Donnell
bioRxiv doi: 10.1101/2024.07.20.603979

These days experimentalists can mark neurons, slice up the brain and then count the marked cells. To use the two examples in our paper, this might mean marking all the cells that are active during some behaviour or all the cells with a particular developmental lineage. For each animal, the number of marked cells is counted for lots of different brain regions, in some experiments as many as a hundred.

These data are super cool: they give information across the whole brain. They are, however, very time-consuming and expensive to collect, and often there are only ten animals for each experimental condition. Now, the whole cool thing about the data is their high dimension, with cell counts for each brain region, but this combination of high dimension and small sample size means the data are under-sampled.

Clearly Bayesian analysis can help here, but for a Bayesian analysis you need to decide on a model and a set of priors. New samplers mean Bayesian approaches can be used for data like this, but setting up the analysis can be intimidating. Bayesian methods predate t-tests and the like, but in the last century they were not as widely used as the classic mixture of hypothesis tests. As such, there is not a lot of lore and tradition about what choices to make when doing a Bayesian analysis.

In our paper we try to help by suggesting a 'standard' Bayesian workflow for cell count data. We test the workflow on two datasets and in both cases it works really well, producing clearer results than a more classical approach. Bayesian models are less familiar, but they are actually very transparent.
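
To give a flavour of what a hierarchical model for these data can look like, here is a minimal sketch with made-up data and illustrative prior and likelihood choices; it is not the model in the paper, just a negative-binomial model of counts per animal and per region written in PyMC.

    # Minimal sketch, not the paper's workflow: hierarchical negative-binomial
    # model of cell counts per animal and brain region.
    import numpy as np
    import pymc as pm

    n_animals, n_regions = 10, 20
    rng = np.random.default_rng(0)
    counts = rng.poisson(50, size=(n_animals, n_regions))   # toy counts[animal, region]
    condition = np.repeat([0, 1], n_animals // 2)            # which group each animal is in

    with pm.Model() as model:
        mu_region = pm.Normal("mu_region", 3.0, 1.0, shape=n_regions)   # baseline log-rate
        effect = pm.Normal("effect", 0.0, 0.5, shape=n_regions)         # condition effect per region
        animal_offset = pm.Normal("animal_offset", 0.0, 0.3, shape=n_animals)

        log_rate = (mu_region[None, :]
                    + effect[None, :] * condition[:, None]
                    + animal_offset[:, None])
        alpha = pm.HalfNormal("alpha", 10.0)                 # over-dispersion
        pm.NegativeBinomial("obs", mu=pm.math.exp(log_rate),
                            alpha=alpha, observed=counts)

        idata = pm.sample(1000, tune=1000)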

Back to main page