@alisawuffles
PhD student at @uwcse @uwnlp
We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale, SuperBPE models consistent...