Conor Houghton
     

Contact

conor.houghton@bristol.ac.uk

School of Engineering Mathematics and Technology
University of Bristol
Michael Ventris Building
Woodland Road
Bristol
BS8 1UB
England

official home page.


Papers


The latest news

(2024-09-09) New preprint:

arXiv: 2409.04185

Residual stream analysis with multi-layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison

I am excited and lucky to be involved in this work, which is almost entirely due to my collaborators.

Like everyone else, we are interested in how transformers work and, in particular, in how they represent and manipulate the features of language. It turns out that sparse autoencoders are a useful way to find out what these features are. So far, this approach has used a different autoencoder for each layer of a transformer. However, the residual stream is often thought of as a sort of scratch pad: a representation of the input which gets acted on and updated across successive processing steps. If this is true, then it should be possible to train a single autoencoder across all the layers and see the same feature crop up, sometimes in one layer, sometimes in another. This is what we did and, more or less, what we saw: while some features seem to be layer specific, others occur in different layers. Our preprint has graphs to show this and, more importantly, now that we know the approach works, we can look to understanding what language looks like to a transformer!
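
To give a rough feel for the idea, here is a toy sketch in plain Python. It is not the code from the preprint. It only shows the data layout: residual-stream vectors from every layer are pooled into one set, passed through a single shared encoder, and then each latent is summarised by how its activation is spread across layers. Everything in it is an assumption made for illustration: the names `pool_layers`, `encode` and `layer_profile`, the top-k sparsity rule, and the random, untrained encoder weights and random "activations". A real autoencoder would be trained to reconstruct activations taken from an actual transformer.

```python
import random


def pool_layers(residuals):
    """Flatten residuals[layer][token] into (layer, vector) pairs,
    so that one autoencoder sees vectors from every layer."""
    return [(layer, vec)
            for layer, tokens in enumerate(residuals)
            for vec in tokens]


def encode(vec, enc_weights, k):
    """Toy sparse encoder: ReLU of a linear map, keeping the k largest latents."""
    pre = [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in enc_weights]
    keep = set(sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(pre)]


def layer_profile(pooled, enc_weights, k, n_layers):
    """For each latent, the fraction of its total activation found at each layer."""
    mass = [[0.0] * n_layers for _ in enc_weights]
    for layer, vec in pooled:
        for j, a in enumerate(encode(vec, enc_weights, k)):
            mass[j][layer] += a
    profiles = []
    for row in mass:
        total = sum(row)
        profiles.append([m / total if total > 0 else 0.0 for m in row])
    return profiles


# Hypothetical sizes and random stand-in data (assumptions, not real activations).
random.seed(0)
n_layers, n_tokens, d_model, n_latents, k = 4, 10, 8, 16, 3
residuals = [[[random.gauss(0, 1) for _ in range(d_model)]
              for _ in range(n_tokens)]
             for _ in range(n_layers)]
W = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_latents)]

pooled = pool_layers(residuals)
profiles = layer_profile(pooled, W, k, n_layers)
```

In this picture, a latent whose profile is concentrated on one layer is layer specific, while one whose profile is spread out is a feature that turns up in different layers. With random weights the profiles mean nothing, of course; the point is only the shape of the computation.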


Old news