Published: November 24, 2016

3e-4 is the best learning rate for Adam, hands down.

@karpathy (i just wanted to make sure that people understand that this is a joke...)

@karpathy The prophet hath spoken! Follow the gourd! Brian > Adam.

Image in tweet by Andrej Karpathy

@karpathy But I had a ragetweet all typed up and ready to go!

@karpathy you must remember to do a variant of this on April Fool's Day 😂

@karpathy We got the joke all right hahaha

@karpathy I changed my lr to 3e-4 to train my network and was waiting for great results, then I saw this tweet 😂

@karpathy lol i thought this was legit?!😋😂

@karpathy 1e-3 is actually too large for some models on some data sets... But not always.

@karpathy Interesting. In fact, as we speak (er ... tweet) am running a ConvNet hyper-parameter search w/ epsilons = [1e-08,0.1,1]
