Now
I’ll be in New Orleans for NeurIPS and visiting Berkeley afterwards (till Christmas).
Things I’m currently working on (links go to project docs):
Training curve nonconvexities / phase changes :
train LMMs on a large generated dataset of math expressions; track loss/acc on subtasks; check if distinct phase transitions happen.
testing the hypothesis that “phase changes are everywhere”
Scalable interpretability (with Neel Alex)
train an “interp-net” to predict various attributes, given as input the trained model weights of a base network.
Convergent instrumental goals in LLM agents , aka demonstrating deceptive alignment:
Goal: demonstrate a level of situational awareness in “LLM agents” that is sufficient for a model to ‘play along’ with some training scheme in order to remain unmodified and thus better able to pursue goals later on.
A paper on ‘Speculative concerns in AI safety’
Grokking and double descent (with Xander Davies)
Last updated December 2022.
What is a now page?
About
Blog
Google Scholar
Code
Twitter
Now