I'm seeing a lot of people confused about this - asking: what exactly is the problem here? That's a great question! Let's use this as a learning opportunity and dig in. đź§µ
First, one of the most common responses I've seen is that anyone criticising the original post clearly doesn't understand it and is ignorant of how language models work. Aidan Gomez is an author of the Transformers paper, and is CEO of Cohere. I think he understands fine.
So why haven't we seen clear explanations of why the "checking for sudden drops in the loss function and suspending training" comment is so ludicrous? Well, the problem is that it's such a bizarre idea that it's not even wrong. It's nonsensical, which makes it hard to refute.
To understand why, we need to understand how these models work. There are two key steps: training, and inference. Training a model involves calculating the derivatives of the loss function with respect to the weights, and using those to update the weights to decrease loss.
Inference involves taking a model that has gone through the above process (called "backpropagation") many times, and then calculating activations from that trained model using new data.
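To make those two steps concrete, here's a minimal sketch in plain NumPy: a one-weight linear model trained by gradient descent, then used for inference. The data, learning rate, and weight are all made up for illustration — no real system looks this simple, but the mechanics are the same.

```python
import numpy as np

# Toy data: y = 3x plus a little noise (entirely synthetic)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

# --- Training: compute dLoss/dWeight, step the weight downhill ---
w = 0.0        # single weight, arbitrary starting point
lr = 0.1       # learning rate, chosen arbitrarily
for step in range(200):
    pred = w * x                          # forward pass
    loss = np.mean((pred - y) ** 2)       # mean squared error
    grad = np.mean(2 * (pred - y) * x)    # derivative of loss w.r.t. w
    w -= lr * grad                        # gradient descent update

# --- Inference: apply the trained weight to new data ---
new_x = np.array([1.0, 2.0])
new_pred = w * new_x   # just arithmetic on numbers in memory
```

Note that both phases are nothing but arithmetic: training produces a number (here `w`), and inference multiplies new inputs by it. Neither line of the "inference" section reaches outside the process.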
Neither training nor inference can have any immediate impact on the world. They are simply calculating the parameters of a mathematical function, or using those parameters to calculate the result of a function.
Therefore, we don't need to check for sudden drops in the loss function and suspend training, because the training process has no immediate impact on the outside world.
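For what it's worth, here is roughly what "checking for sudden drops in the loss" would even mean in practice — a hypothetical monitor I've made up for illustration (the function name, window size, and threshold are all arbitrary, not anyone's actual proposal). It makes the point above concrete: the check is just a comparison between logged numbers.

```python
# Hypothetical "sudden loss drop" monitor. Everything here is
# illustrative -- there is no standard check like this.

def sudden_drop(loss_history, window=10, ratio=0.5):
    """Return True if the latest loss is below `ratio` times the
    average of the previous `window` losses (an arbitrary rule)."""
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):-1]
    baseline = sum(recent) / len(recent)
    return loss_history[-1] < ratio * baseline

# A smooth decline trips nothing; an abrupt drop does.
smooth = [1.0 - 0.01 * i for i in range(50)]
abrupt = smooth + [0.05]
```

Whatever you do with the boolean it returns — pause training, page an engineer — the check itself is arithmetic on a list of floats. It touches nothing outside the training process.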
The only time that a model can impact anything is when it's *deployed* - that is, it's made available to people or directly to external systems, being provided data, making calculations, and then those results being used in some way.
So in practice, the way that models are *always* deployed is that after training, they are tested, to see how they operate on new data, and how their outputs work when used in some process.
Now of course, if we'd seen during training that our new model has much lower loss than we've seen before, whilst we wouldn't "suspend training", we would of course check the model's practical performance extra carefully. After all, maybe it was a bug? Or maybe it's more capable?
But saying "we should test our trained models before deploying them" is telling no-one anything new whatsoever. We all know that, and we all do that. Figuring out better ways to test models before deployment is an active and rich research area.
OTOH, "check for sudden drops in the loss function and suspend training" sounds much more exciting. Problem is, it's not connected with the real world at all.
Some folks have pointed out that "drops in the loss function" is a pretty odd way to phrase things. It's actually just "drops in the loss". An AI researcher saying "drops in the loss function" is a bit like a banker saying "ATM machine" - maybe a slip, maybe incompetence.
PS: please don't respond to this thread with "OK the exact words don't make sense, but if we wave our hands we can imagine he really meant some different set of words that if we squint kinda do make sense". I don't know why some folks respond like this *every* *single* *time*.
PPS: None of this is to make any claim as to the urgency or importance of working on AI alignment. However, if you believe AI alignment is important work, I hope you'll agree that it's worth discussing with intellectual rigor and with a firm grounding of basic principles.
@jeremyphoward To my previous statements, I suppose I can add the further point that - while, yes, stuff could be deadlier at inference time, especially if the modern chain-of-thought paradigm lasts - anyone with any security mindset would check training too. https://x.com/ESYudkowsky/stat...
@jeremyphoward Just for the record, this is an extraordinarily bad/weak critique, and I expect that within a few months people who have very strong classical ML credentials will also disavow this recent set of critiques.
@jeremyphoward You're not wrong Jeremy, but technically EY isn't either. It's possible (unlikely), but _possible_, that a super-AGI emerges during training, realizes it is in a training loop, and optimizes for affecting some kind of goal outside of it (i.e. the real world). @robertskmiles has some great
@jeremyphoward Would you mind steelmanning Yudkowsky a bit? Self-instruct is already a thing. People use outputs of the current model to train the next model. Toolformer is already a thing. People can send model's outputs to Python or Linux shell. Researchers might want to run
@jeremyphoward @ESYudkowsky’s post is fine. He’s talking about training an AGI. Yes training is just backprop, but that doesn’t mean there isn’t an emergent higher level intelligence. (The whole point is that there would be.) That intelligence can execute during training. (It has to, for
@jeremyphoward Outstanding thread Jeremy.
@jeremyphoward Preach
@jeremyphoward But people have AIs that train automatically as new data comes in. They act in the world, collect new data, and train in a loop
@jeremyphoward The problem was he used jargon incorrectly, and that really raises the ire of the gatekeepers. He actually is, if you take a moment, in theory quite correct. If you replace 'loss' with 'broad array of capability metrics', it makes perfect sense. I get that you shouldn't use
@jeremyphoward Dude watched too many superhero movies where the villain is injected with a super strength serum and then immediately escapes from the hypersecure underground government lab.
@jeremyphoward Not to mention, we are all watching the loss like a hawk, it's our favorite pastime! https://x.com/CSProfKGD/status...
@jeremyphoward the underline in img is for the “atm machines” part of the post. the people retweeting are just dunking on this. this thread is just a lot of cope defending the dunkers. i say this as an anti-yud #fairspeech
@jeremyphoward He is generally not even wrong, and bizarrely he's followed by many computer scientists because they share the same kind of radically fascist alt-right political ideology.
@jeremyphoward Those working in AI and those who run AI companies aren't necessarily experts on AI, or even computing or stats. I believe his reasoning is that unexpected loss may indicate unexpected behavior. It makes little sense. Just like the idea of babysitting. It's a program; stop execution
@jeremyphoward @karpathy has a tweet where he hints a model can understand it is being trained
@jeremyphoward I have to admit I would definitely suspend training if my log loss suddenly drops and ... becomes negative.
@jeremyphoward good thread!
@jeremyphoward What is loss? I only know LOSS 🤔
@jeremyphoward See, computer devil materializes during training, and like an edgy anime protagonist, uses its intelligence superpower to escape by manipulating humans and breaks the laws of physics. You think you're explaining ML, but really you're just disrespecting EY's religious beliefs.
@jeremyphoward Isn't a key misunderstanding here that sudden drops in the loss are not any more indicative of "emergent" behavior than a steady and consistent decrease in loss?
