3e-4 is the best learning rate for Adam, hands down.
@karpathy (i just wanted to make sure that people understand that this is a joke...)
@karpathy The prophet hath spoken! Follow the gourd! Brian > Adam.
@karpathy But I had a ragetweet all typed up and ready to go!
@karpathy you must remember to do a variant of this on April Fool's Day 😂
@karpathy We got the joke all right hahaha
@karpathy I had just changed my lr to 3e-4 to train my network and was waiting for great results, then I saw this tweet😂
@karpathy lol i thought this was legit?!😋😂
@karpathy 1e-3 is actually too large for some models on some data sets... But not always.
@karpathy Interesting. In fact, as we speak (er ... tweet) I am running a ConvNet hyper-parameter search w/ epsilons = [1e-08,0.1,1]